Entity annotation has whitespaces in RASA NLU

Question

{
  "text": "show me chinese restaurants",
  "intent": "restaurant_search",
  "entities": [
    {
      "start": 8,
      "end": 15,
      "value": "chinese",
      "entity": "cuisine"
    }
  ]
}

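The substring "chinese" is annotated as an entity running from index 8 to index 15 of the utterance.

I wrote a small C# program to verify the character indices in the utterance.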

using System;

public class Program
{
    public static void Main(string[] args)
    {
        string s = "show me chinese restaurants";
        int i = 0;

        // Print each character next to its zero-based index.
        foreach (var item in s.ToCharArray())
            Console.WriteLine("{0} - {1}", item, i++);
    }
}

But when I run the program, I get the following output:

s - 0
h - 1
o - 2
w - 3
  - 4
m - 5
e - 6
  - 7
c - 8
h - 9
i - 10
n - 11
e - 12
s - 13
e - 14
  - 15
r - 16
e - 17
s - 18
t - 19
a - 20
u - 21
r - 22
a - 23
n - 24
t - 25
s - 26

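Note the odd behaviour of the text annotation: the substring "chinese" starts at index 8 and ends at index 15, which is a space.

I would expect the substring "chinese" to start at index 8 and end at index 14.

When I train with the same text and annotate "chinese" as starting at 8 and ending at 14, I get a Misaligned Entity Annotation warning from RASA, details here.

Can anyone explain this odd behaviour?

Thanks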

Answer 1

After reading the link provided, I think I have found a possible explanation:

which together make a python style range to apply to the string, e.g. in the example below, with text="show me chinese restaurants", then text[8:15] == 'chinese'

That set me on the following train of thought:

Hmm, that is weird. I wonder if Python does indexing weirdly.

I put together a quick script to check this:

text = "show me chinese restaurants"
print(text[8:15])

At first this does not seem to make sense, because the character at position 15 of the string is actually a space. That led me to this article:

https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/

It seems that the operator used in text[8:15] slices the array. The article uses this example:

a = [1, 2, 3, 4, 5, 6, 7, 8]

a[1:4] outputs: [2, 3, 4]

and explains it like this:

Let me explain it. The 1 means to start at second element in the list (note that the slicing index starts at 0). The 4 means to end at the fifth element in the list, but not include it. The colon in the middle is how Python's lists recognize that we want to use slicing to get objects in the list.
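The quoted explanation can be checked directly. This is a minimal sketch using the same list as the article; the variable names start, stop and part are introduced here only for illustration.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8]

start, stop = 1, 4
part = a[start:stop]

# The element at index `stop` (a[4] == 5) is not part of the slice.
print(part)     # [2, 3, 4]
print(a[stop])  # 5

# Because the stop index is excluded, the slice length is stop - start.
print(len(part) == stop - start)  # True
```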

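So it seems the second argument of the slice is exclusive: text[8:15] covers indices 8 through 14, and the space at index 15 is not included.

Hope this helps.

P.S. I had to learn and set up some Python stuff for this :D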
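Applied to the utterance from the question, this means "end" is one past the last character of the entity. The sketch below is plain Python and does not call Rasa; entity_span is a hypothetical helper name made up for this illustration.

```python
def entity_span(text, value):
    """Return (start, end) for the first occurrence of value in text.

    end is exclusive, so text[start:end] == value.
    Hypothetical helper for illustration; not part of Rasa.
    """
    start = text.find(value)
    if start == -1:
        raise ValueError(f"{value!r} not found in {text!r}")
    return start, start + len(value)


text = "show me chinese restaurants"
start, end = entity_span(text, "chinese")

print(start, end)       # 8 15
print(text[start:end])  # chinese
print(repr(text[end]))  # ' ' (index 15 is the space, and it is excluded)
print(text[8:14])       # chines (ending at 14 cuts off the final "e")
```

With end set to 14 the span only covers "chines", which no longer lines up with a whole word. That is presumably what the Misaligned Entity Annotation warning is complaining about.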

c#

nlp

rasa-nlu