Facebook's Duckling Cannot Identify Time Dimension Correctly

I'm using Facebook's Duckling to parse text. When I pass in the text 13h 47m, it correctly classifies the whole string as a DURATION (= 13 hours 47 minutes).

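However, when I pass in the text 13h 47m 13s, it fails to recognize the 13s part as part of the DURATION. I expected it to parse as 13 hours, 47 minutes and 13 seconds, but it leaves the 13s out of the DURATION and returns the 13 as a separate number.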

Command: curl -XPOST http://127.0.0.1:0000/parse --data 'locale=en_US&text=13h 47m 13s'
JSON Array: 
[
  {
    "latent": false,
    "start": 0,
    "dim": "duration",
    "end": 7,
    "body": "13h 47m",
    "value": {
      "unit": "minute",
      "normalized": {
        "unit": "second",
        "value": 49620
      },
      "type": "value",
      "value": 827,
      "minute": 827
    }
  },
  {
    "latent": false,
    "start": 8,
    "dim": "number",
    "end": 10,
    "body": "13",
    "value": {
      "type": "value",
      "value": 13
    }
  }
]

Is this a bug? How can I update Duckling so that it parses the text as described above?

The documentation seems pretty clear on this:

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:

  • Duckling/<Dimension>/<Lang>/Rules.hs
  • Duckling/<Dimension>/<Lang>/Corpus.hs
  • Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
  • Duckling/Rules/<Lang>.hs

Looking in Duckling/Duration/Rules.hs, I see:

ruleIntegerUnitofduration = Rule
  { name = "<integer> <unit-of-duration>"
  , pattern =
    [ Predicate isNatural
    , dimension TimeGrain
    ]
  -- ...

So next I looked in Duckling/TimeGrain/EN/Rules.hs (because Duckling/TimeGrain/Rules.hs doesn't exist), and saw:

grains :: [(Text, String, TG.Grain)]
grains = [ ("second (grain) ", "sec(ond)?s?",      TG.Second)
         -- ...

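Presumably this means that 13h 47m 13sec would parse the way you want. To make 13h 47m 13s parse the same way, the first thing I'd try is making the regex above a bit more permissive, perhaps something like s(ec(ond)?s?)?, and then seeing whether that fixes the problem without breaking anything else you care about.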
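
Before rebuilding Duckling, the loosened pattern can be sanity-checked on its own. The sketch below uses grep -E as a stand-in for Duckling's regex engine. That is an assumption: the two engines may differ in details, and Duckling decides for itself where a match may start and end inside a sentence. So this only shows which whole-word spellings each pattern accepts, not how Duckling will parse the full string. The variable names are made up for illustration.

```shell
# Candidate spellings of the seconds unit, one per line.
units='s
sec
secs
second
seconds'

old_pattern='sec(ond)?s?'      # pattern quoted from Duckling/TimeGrain/EN/Rules.hs
new_pattern='s(ec(ond)?s?)?'   # loosened pattern suggested above

# -x requires the whole line to match; -c counts the matching lines.
old_count=$(printf '%s\n' "$units" | grep -cxE "$old_pattern")
new_count=$(printf '%s\n' "$units" | grep -cxE "$new_pattern")

echo "old pattern accepts $old_count of 5 spellings"
echo "new pattern accepts $new_count of 5 spellings"
```

The old pattern should reject only the bare s, while the new one accepts all five. Whether a bare s then collides with other rules is something only Duckling's own corpus tests can tell you.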