Facebook's Duckling Cannot Identify Time Dimension Correctly
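I'm using Facebook's Duckling to parse text. When I pass in the text 13h 47m, it correctly classifies the whole text as a DURATION (= 13 hours 47 minutes).
However, when I pass in the text 13h 47m 13s, it fails to recognize the 13s part of the string as part of the DURATION. I expected it to parse this as 13 hours, 47 minutes and 13 seconds, but it essentially ignores the 13s part, treating it as not belonging to the DURATION.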
Command: curl -XPOST http://127.0.0.1:0000/parse --data 'locale=en_US&text=13h 47m 13s'
JSON array:
[
  {
    "latent": false,
    "start": 0,
    "dim": "duration",
    "end": 7,
    "body": "13h 47m",
    "value": {
      "unit": "minute",
      "normalized": {
        "unit": "second",
        "value": 49620
      },
      "type": "value",
      "value": 827,
      "minute": 827
    }
  },
  {
    "latent": false,
    "start": 8,
    "dim": "number",
    "end": 10,
    "body": "13",
    "value": {
      "type": "value",
      "value": 13
    }
  }
]
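For reference, the numbers in this output can be checked by hand. The sketch below recomputes them and shows what a full parse of 13h 47m 13s would be expected to normalize to. The helper toSeconds is hypothetical and written only for this check; it is not part of Duckling.

```haskell
-- Hypothetical helper (not part of Duckling): convert hours, minutes and
-- seconds into a total number of seconds.
toSeconds :: Int -> Int -> Int -> Int
toSeconds h m s = h * 3600 + m * 60 + s

main :: IO ()
main = do
  -- What Duckling returned for "13h 47m": 827 minutes, normalized to 49620 s.
  print (13 * 60 + 47 :: Int)   -- 827
  print (toSeconds 13 47 0)     -- 49620
  -- What a full parse of "13h 47m 13s" would be expected to normalize to.
  print (toSeconds 13 47 13)    -- 49633
```

The difference between the two normalized values is exactly the 13 seconds that end up in the separate number entity.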
Is this a bug? How can I update Duckling so that it parses the text as described above?
The documentation seems fairly clear on this:
To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:
Duckling/<Dimension>/<Lang>/Rules.hs
Duckling/<Dimension>/<Lang>/Corpus.hs
Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
Duckling/Rules/<Lang>.hs
Looking in Duckling/Duration/Rules.hs, I see:
ruleIntegerUnitofduration = Rule
  { name = "<integer> <unit-of-duration>"
  , pattern =
    [ Predicate isNatural
    , dimension TimeGrain
    ]
  -- ...
So next I looked in Duckling/TimeGrain/EN/Rules.hs (because Duckling/TimeGrain/Rules.hs doesn't exist), and saw:
grains :: [(Text, String, TG.Grain)]
grains = [ ("second (grain) ", "sec(ond)?s?", TG.Second)
-- ...
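Presumably this means 13h 47m 13sec would parse the way you want. To make 13h 47m 13s parse the same way, the first thing I'd try is making the regex above a bit more permissive, perhaps something like s(ec(ond)?s?)?, and then see whether that fixes the problem without breaking anything else you care about.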
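To get a feel for what the relaxed pattern changes, here is a minimal sketch that enumerates the spellings each pattern accepts. It deliberately uses no regex library and does not touch Duckling. The lists are written out by hand from the two patterns, and it assumes the pattern is compared against the whole unit string, which may not be how Duckling applies it.

```haskell
-- Spellings accepted by the existing pattern "sec(ond)?s?":
-- "sec", then an optional "ond", then an optional "s".
original :: [String]
original = [ "sec" ++ ond ++ s | ond <- ["", "ond"], s <- ["", "s"] ]

-- Spellings accepted by the relaxed pattern "s(ec(ond)?s?)?":
-- a bare "s", or "s" followed by "ec(ond)?s?".
relaxed :: [String]
relaxed =
  "s" : [ "s" ++ "ec" ++ ond ++ s | ond <- ["", "ond"], s <- ["", "s"] ]

main :: IO ()
main = do
  print original  -- ["sec","secs","second","seconds"]
  print relaxed   -- ["s","sec","secs","second","seconds"]
  -- The only spelling the relaxed pattern adds is the bare "s".
  print (filter (`notElem` original) relaxed)  -- ["s"]
```

Since the only new spelling is a bare s, the inputs to watch are those where a number is followed by some unrelated word beginning with s. Adding 13h 47m 13s and a few such counterexamples to the Corpus.hs file mentioned in the documentation excerpt would presumably be the way to check this.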