How to make Microsoft LUIS case sensitive?

Question

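I have an Azure LUIS instance for NLP and am trying to extract alphanumeric values with a regular expression entity. Extraction works, but the extracted value comes back in lowercase.

For example:

Case 1

My input: "run job for AE0002" RegExCode = [a-zA-Z]{2}\d+

Output: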

{
  "query": " run job for AE0002",
  "topScoringIntent": {
    "intent": "Run Job",
    "score": 0.7897274
  },
  "intents": [
    {
      "intent": "Run Job",
      "score": 0.7897274
    },
    {
      "intent": "None",
      "score": 0.00434472738
    }
  ],
  "entities": [
    {
      "entity": "ae0002",
      "type": "Alpha Number",
      "startIndex": 15,
      "endIndex": 20
    }
  ]
}

I need the original casing of the input to be preserved.

Case 2

My input: "Extract only abreaviations like HP and IBM" RegExCode = [A-Z]{2,}

Output:

{
  "query": "extract only abreaviations like hp and ibm", // Query accepted by LUIS test window
  "query": "extract only abreaviations like HP and IBM", // Query accepted as an endpoint url
  "prediction": {
    "normalizedQuery": "extract only abreaviations like hp and ibm",
    "topIntent": "None",
    "intents": {
      "None": {
        "score": 0.09844558
      }
    },
    "entities": {
      "Abbre": [
        "extract",
        "only",
        "abreaviations",
        "like",
        "hp",
        "and",
        "ibm"
      ],
      "$instance": {
        "Abbre": [
          {
            "type": "Abbre",
            "text": "extract",
            "startIndex": 0,
            "length": 7,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [
              "model"
            ]
          },
          {
            "type": "Abbre",
            "text": "only",
            "startIndex": 8,
            "length": 4,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [
              "model"
            ]
          },....          
          {
            "type": "Abbre",
            "text": "ibm",
            "startIndex": 39,
            "length": 3,
            "modelTypeId": 8,
            "modelType": "Regex Entity Extractor",
            "recognitionSources": [
              "model"
            ]
          }
        ]
      }
    }
  }
}

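This makes me wonder whether all training is done in lowercase. What surprised me is that every word that had originally been trained to its own entity was retrained as Abbre.

Any input would be a great help :)

Thanks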

Answer 1

You can simply use the indices provided in the output to take the value from the input string, exactly as it was typed.

{
  "query": " run job for AE0002",
  ...
  "entities": [
    {
      "entity": "ae0002",
      "type": "Alpha Number",
      "startIndex": 15,
      "endIndex": 20
    }
  ]
}

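After receiving this response, call a substring method on the query with startIndex and endIndex to obtain the value you are looking for. Note that endIndex in this response is inclusive (15 through 20 covers the six characters of "ae0002"). If your method takes an exclusive end index, pass endIndex + 1; if it takes a length rather than an end index, use endIndex - startIndex + 1.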
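A minimal sketch of that lookup in JavaScript, assuming a v2-style entity object with an inclusive endIndex like the one shown above. The function name originalEntityText and the sample query are hypothetical; the indices in the sample were computed for that exact string.

```javascript
// Recover the original-cased entity text from the raw query.
// Assumption: endIndex is inclusive, as in the response shown above.
function originalEntityText(query, entity) {
  // String.prototype.slice takes an exclusive end, hence the + 1.
  return query.slice(entity.startIndex, entity.endIndex + 1);
}

// Hypothetical sample: "AE0002" occupies positions 12 through 17 of this string.
const query = "run job for AE0002";
const entity = { entity: "ae0002", type: "Alpha Number", startIndex: 12, endIndex: 17 };

console.log(originalEntityText(query, entity)); // AE0002
```

Slice the exact string that was sent to LUIS; if the query is trimmed or otherwise altered before the lookup, the indices no longer line up.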

Answer 2

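For Case 1, do you need to preserve the case in order to query the job on your system? As long as job identifiers always use uppercase characters, you can just call toUpperCase(), e.g. var jobName = step._info.options.entities.Alpha_Number.toUpperCase() (I'm not sure about the underscore in Alpha Number; I've never had an entity with a space in its name).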

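For Case 2, this is a shortcoming of LUIS. You can force case sensitivity in a regular expression with (?-i) (e.g. /(?-i)[A-Z]{2,}/g). However, LUIS appears to convert everything to lowercase first, so you will never get any match with that pattern (which is better than matching every word, but that isn't saying much!). I don't know of a way to make LUIS recognize the entity the way you are asking.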
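The two behaviours described here can be reproduced outside LUIS with plain JavaScript regular expressions. This is only an illustration of the suspected lowercase-first behaviour, not LUIS's actual engine; JavaScript patterns are case sensitive unless the i flag is set, so no (?-i) prefix is used.

```javascript
const original = "Extract only abreaviations like HP and IBM";
const lowered = original.toLowerCase();

// Case-sensitive pattern on lowercased text: nothing can match, so match() returns null.
const sensitiveOnLowered = lowered.match(/[A-Z]{2,}/g);

// Case-insensitive pattern: every run of two or more letters matches,
// which mirrors the "every word is Abbre" output in the question.
const insensitive = original.match(/[A-Z]{2,}/gi);

// Case-sensitive pattern on the original text: only the abbreviations match.
const sensitiveOnOriginal = original.match(/[A-Z]{2,}/g);

console.log(sensitiveOnLowered);  // null
console.log(insensitive.length);  // 7
console.log(sensitiveOnOriginal); // [ 'HP', 'IBM' ]
```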

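You could create a list entity containing every abbreviation you expect, but depending on the inputs you anticipate, that may be too much to maintain. In addition, abbreviations that are also ordinary words would be picked up as false positives (e.g. CAT and cat). You could also write a function that does this for you outside LUIS, essentially building your own manual entity detection. There may be other solutions depending on what you are trying to do once you have identified the abbreviations.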
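One way such a function outside LUIS might look, as a rough sketch: run a case-sensitive pattern over the raw user text before or after calling LUIS. The name findAbbreviations, the shape of the result objects, and the optional ignore list are all hypothetical choices made for illustration.

```javascript
// Hypothetical helper: find standalone runs of two or more capital letters
// in the raw, un-lowercased user text.
function findAbbreviations(text, ignore = []) {
  const skip = new Set(ignore); // uppercase words that should not count as abbreviations
  const results = [];
  for (const m of text.matchAll(/\b[A-Z]{2,}\b/g)) {
    if (skip.has(m[0])) continue;
    results.push({ text: m[0], startIndex: m.index, length: m[0].length });
  }
  return results;
}

const found = findAbbreviations("Extract only abreaviations like HP and IBM");
console.log(found.map((f) => f.text)); // [ 'HP', 'IBM' ]
```

The word boundaries keep mixed tokens such as AE0002 from matching, and the ignore list is a crude guard against words that users simply type in capitals. Whether that is enough depends on what is done with the abbreviations afterwards.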

nlp

microsoft-cognitive

botframework

azure-language-understanding