Lookup table not working in training data of Rasa NLU
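I have examples for a specific intent that are annotated with entities. I want the model to recognize other words that could be entities for that intent, but it fails to recognize them.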
## intent: frequency
* what is the frequency of [region](field)?
* what's the frequency of[region](field)?
* frequency of [region](field)?
* [region](field)s frequency?
* [region](field) frequency?
* frequency [region](field)?
## lookup: field
* price
* phone type
* region
So when I enter the text "What is the frequency of region?" I get the output
{'intent': {'name': 'frequency', 'confidence': 0.9517087936401367},
'entities': [{'start': 17, 'end': 23, 'value': 'region',
'entity': 'field', 'confidence': 0.9427971487440825,
'extractor': 'CRFEntityExtractor'}], 'text': 'What is the frequency of region?'}
But when I enter the text "What is the frequency of price?" I get the output
{'intent': {'name': 'frequency', 'confidence': 0.9276150465011597},
'entities': [], 'text': 'What is the frequency of price?'}
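According to the Rasa NLU documentation, for a lookup table to work you need to include some examples from the lookup table in your training data.
You also need to understand that "phone type" and "region" are different patterns, because "phone type" is two words while "region" is one. With that in mind, I have expanded your dataset to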
## intent: frequency
* what is the frequency of [region](field)?
* what is the frequency of [city](field)?
* what is the frequency of [work](field)?
* what's the frequency of [phone type](field)?
* what is the frequency of [phone type](field)?
* frequency of [region](field)?
* frequency of [phone type](field)?
* [region](field)s frequency?
* [region](field) frequency?
* frequency [region](field)?
Now, when I try all the examples you mentioned, they work, even though "price" is not included in the dataset, because its pattern is covered.
Enter a message: What is the frequency of price?
{
"intent": {
"name": "frequency",
"confidence": 0.966820478439331
},
"entities": [
{
"start": 25,
"end": 30,
"value": "price",
"entity": "field",
"confidence": 0.7227365687405007,
"extractor": "CRFEntityExtractor"
}
]
}
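To make the role of the lookup table more concrete, here is a rough Python sketch of the general idea: the lookup entries are turned into a single case-insensitive regex, and each token gets a flag saying whether it falls inside a match. This is a simplified illustration, not Rasa's actual code; the function names `lookup_to_pattern` and `token_lookup_flags` are hypothetical. The flag is only one feature among others, so the CRF still has to see annotated examples before it learns to rely on it.

```python
import re

# Entries copied from the "## lookup: field" table in the question.
LOOKUP_FIELD = ["price", "phone type", "region"]


def lookup_to_pattern(entries):
    """Build one case-insensitive regex matching any entry as a whole word or phrase."""
    # Longest entries first so multi-word phrases win over shorter overlaps.
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(r"\b" + re.escape(e) + r"\b" for e in ordered)
    return re.compile("(" + alternatives + ")", re.IGNORECASE)


def token_lookup_flags(text, pattern):
    """Return (token, flag) pairs; flag is True when the token overlaps a lookup match."""
    spans = [m.span() for m in pattern.finditer(text)]
    flags = []
    for tok in re.finditer(r"\S+", text):  # whitespace tokenization
        inside = any(tok.start() < end and tok.end() > start for start, end in spans)
        flags.append((tok.group(), inside))
    return flags


pattern = lookup_to_pattern(LOOKUP_FIELD)
print(token_lookup_flags("What is the frequency of price?", pattern))
```

Note that "phone type" flags two consecutive tokens while "region" flags one, which is why both shapes need to appear in the annotated examples.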
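I suggest using https://github.com/rodrigopivi/Chatito to generate simple datasets. It will make things easier for you and can generate synonyms and the like automatically.
Also, in case you didn't know, you can point to a file for large lookup tables, e.g.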
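If you just want to see what such a generator does, the sketch below expands a few sentence templates with lookup values and prints training lines in the same Markdown format as above. It is a plain Python stand-in and does not use Chatito or its DSL; `generate_examples`, `TEMPLATES`, and the `{slot}` placeholder are names made up for this illustration.

```python
from itertools import product

# Templates mirror the sentence shapes in the question; "{slot}" marks the entity position.
TEMPLATES = [
    "what is the frequency of {slot}?",
    "frequency of {slot}?",
    "{slot} frequency?",
]
FIELD_VALUES = ["price", "phone type", "region"]


def generate_examples(intent, templates, values, entity="field"):
    """Return Markdown training lines for every template/value combination."""
    lines = [f"## intent: {intent}"]
    for template, value in product(templates, values):
        annotated = f"[{value}]({entity})"  # Rasa Markdown entity annotation
        lines.append("* " + template.replace("{slot}", annotated))
    return lines


print("\n".join(generate_examples("frequency", TEMPLATES, FIELD_VALUES)))
```

Three templates and three values give nine examples, covering both the one-word and two-word entity shapes.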
## lookup:city
data/lookups/city_lookup.txt
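A lookup file like this is plain text with one entry per line. As a small sketch, assuming you already have the values in a Python list, the hypothetical helper `write_lookup_file` below cleans them up and writes them out. The city names are placeholder values, and the demo writes to a temporary directory rather than your project.

```python
import tempfile
from pathlib import Path


def write_lookup_file(path, entries):
    """Write one lookup entry per line, dropping blanks and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for entry in entries:
        value = entry.strip()
        if value and value.lower() not in seen:
            seen.add(value.lower())
            cleaned.append(value)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/lookups/ if missing
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Demo in a temporary directory; the city names are placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "lookups" / "city_lookup.txt"
    written = write_lookup_file(target, ["Berlin", "berlin", " Paris ", "", "New York"])
    print(target.read_text(encoding="utf-8"))
```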
In config.yml, use the following pipeline:
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    entity_recognition: False
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100