LUIS.ai JSON migrated to Rasa format returns the correct intent but no entities
I have migrated the JSON downloaded from my LUIS app to RASA format using the following command: python -m rasa_nlu.train -c config_spacy.json
My config file looks like this:
{
"path" : "./models",
"data" : "./data/examples/rasa/BookACab.json",
"pipeline" : ["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy",
"ner_crf", "ner_synonyms", "intent_classifier_sklearn",
"ner_duckling"]
}
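A model was generated from the JSON in RASA format shown below. When I query this model with
http://localhost:5000/parse?q=book a ride later
it returns the correct intent with a high score, along with all the entities associated with the text I entered. But when I try another text:
http://localhost:5000/parse?q=I want to go ride today 5pm
the returned intent is correct, but its entities object is empty. As you can see in the JSON below, this utterance also has an entity mapped to it, just like the working example.
Please help me understand whether this is an issue that everyone using RASA runs into, or whether I am doing something wrong. Thanks!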
{
"rasa_nlu_data": {
"common_examples": [
{
"entities": [
{
"entity": "RideTime",
"value": "later",
"start": 0,
"end": 5
}
],
"intent": "None",
"text": "later"
},
{
"entities": [],
"intent": "ServiceRequestEnquiry",
"text": "wake up"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "no not now"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "not sure"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "no bot"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "no goride bot"
},
{
"entities": [
{
"entity": "RideTime",
"value": "later",
"start": 12,
"end": 17
}
],
"intent": "BookCab",
"text": "book a ride later"
},
{
"entities": [
{
"entity": "RideTime",
"value": "now",
"start": 21,
"end": 24
}
],
"intent": "BookCab",
"text": "i want go for a ride now"
},
{
"entities": [
{
"entity": "RideTime",
"value": "today",
"start": 12,
"end": 17
}
],
"intent": "BookCab",
"text": "book a ride today"
},
{
"entities": [
{
"entity": "RideTime",
"value": "today 5pm",
"start": 18,
"end": 27
}
],
"intent": "BookCab",
"text": "I want to go ride today 5pm"
},
{
"entities": [
{
"entity": "RideTime",
"value": "today",
"start": 12,
"end": 17
}
],
"intent": "BookCab",
"text": "book a ride today 5pm"
},
{
"entities": [
{
"entity": "RideTime",
"value": "later",
"start": 13,
"end": 18
}
],
"intent": "BookCab",
"text": "book shuttle later"
},
{
"entities": [
{
"entity": "RideTime",
"value": "now",
"start": 15,
"end": 18
}
],
"intent": "None",
"text": "i want to book now"
},
{
"entities": [
{
"entity": "RideTime",
"value": "booknow",
"start": 10,
"end": 17
}
],
"intent": "None",
"text": "i want to booknow"
},
{
"entities": [
{
"entity": "RideTime",
"value": "book later",
"start": 10,
"end": 20
}
],
"intent": "None",
"text": "i want to book later"
}
],
"regex_features": []
}
}
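One quick sanity check on data like this is whether each entity's start and end offsets actually slice out its value. The sketch below assumes that end is exclusive (which matches the data above, where text[12:17] of "book a ride later" is "later") and that no synonyms are in use, so the value should equal the slice. The function name find_misaligned_entities and the broken third sample are made up for illustration.

```python
def find_misaligned_entities(training_data):
    """Return (text, entity) pairs where text[start:end] differs from the entity value."""
    problems = []
    for example in training_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            span = text[entity["start"]:entity["end"]]
            if span != entity["value"]:
                problems.append((text, entity))
    return problems


# Two examples copied from the training data above, plus one with a
# deliberately wrong offset (hypothetical, not part of the original data).
sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"entities": [{"entity": "RideTime", "value": "later", "start": 12, "end": 17}],
             "intent": "BookCab", "text": "book a ride later"},
            {"entities": [{"entity": "RideTime", "value": "today 5pm", "start": 18, "end": 27}],
             "intent": "BookCab", "text": "I want to go ride today 5pm"},
            {"entities": [{"entity": "RideTime", "value": "now", "start": 14, "end": 17}],
             "intent": "BookCab", "text": "i want to book now"},
        ],
        "regex_features": [],
    }
}

for text, entity in find_misaligned_entities(sample):
    print("offset mismatch in %r: %r" % (text, entity))
```

If this reports nothing for the real file, the offsets are at least internally consistent and the empty result is more likely a matter of how much the model has seen.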
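It would be helpful if you could include the pipeline you are using with Rasa. You can find this in your configuration file. Assuming you haven't changed the default pipeline in config_spacy.json, then you're using ner_crf for entity recognition.
Most likely, because of differences between the libraries, Rasa simply needs more training data than LUIS does. Qualitatively, the mitie pipeline usually requires less training data, at the cost of longer training times.
So the basic answer to your question is: if you want to use ner_crf, then you need to increase the amount of training data you provide for entity recognition.
That said: is RideTime your only entity? If so, you should consider adding ner_duckling to your pipeline, which can recognize dates. This will work better than trying to train date recognition yourself.
So, using the training data above and this pipeline: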
["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf", "ner_synonyms", "intent_classifier_sklearn", "ner_duckling"]
the result is as follows:
{
"entities": [
{
"additional_info": {
"grain": "hour",
"others": [
{
"grain": "hour",
"value": "2017-07-26T17:00:00.000Z"
}
],
"value": "2017-07-26T17:00:00.000Z"
},
"end": 27,
"entity": "time",
"extractor": "ner_duckling",
"start": 18,
"text": "today 5pm",
"value": "2017-07-26T17:00:00.000Z"
}
],
"intent": {
"confidence": 0.5469262356494486,
"name": "BookCab"
},
"intent_ranking": [
{
"confidence": 0.5469262356494486,
"name": "BookCab"
},
{
"confidence": 0.2812606328712321,
"name": "None"
},
{
"confidence": 0.08727531874740564,
"name": "ConfirmationNo"
},
{
"confidence": 0.0845378127319134,
"name": "ServiceRequestEnquiry"
}
],
"text": "I want to go ride today 5pm"
}
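Note that the entity in this response is named time and was produced by ner_duckling, not RideTime from ner_crf. When testing several utterances it can help to see which extractor produced what. The following is a minimal sketch, assuming the /parse endpoint and response shape shown in this thread; the helper names build_parse_url and entities_by_extractor are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_parse_url(text, base="http://localhost:5000/parse"):
    """Build the /parse URL with the utterance percent-encoded in the q parameter."""
    return base + "?" + urlencode({"q": text}, quote_via=quote)


def entities_by_extractor(response):
    """Group (entity name, matched text) pairs by the extractor that produced them."""
    grouped = {}
    for entity in response.get("entities", []):
        extractor = entity.get("extractor", "unknown")
        grouped.setdefault(extractor, []).append((entity["entity"], entity["text"]))
    return grouped


# Trimmed copy of the response shown above.
response = {
    "entities": [
        {"end": 27, "entity": "time", "extractor": "ner_duckling",
         "start": 18, "text": "today 5pm", "value": "2017-07-26T17:00:00.000Z"}
    ],
    "intent": {"confidence": 0.5469262356494486, "name": "BookCab"},
    "text": "I want to go ride today 5pm",
}

print(build_parse_url("I want to go ride today 5pm"))
print(entities_by_extractor(response))
```

An empty dictionary from entities_by_extractor means no extractor in the pipeline found anything, which is the situation described in the question.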
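This full training set works well for me. It is just a matter of adding more training examples: as you do more testing, whenever you come across an example that doesn't work as expected, add it to your training data and retrain. That way you teach your model to handle a wider variety of requests.
https://gist.github.com/wrathagom/7f05fbda75c785977bd07cd89e62ddd7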
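Adding examples by hand means computing start and end offsets each time, which is easy to get wrong. Below is a rough sketch of a helper that builds an entry in the same shape as the common_examples above and computes the offsets from the text. The names make_example and add_example and the utterance "book a cab tomorrow 9am" are made up for illustration; the helper assumes the entity value occurs verbatim in the text and uses its first occurrence.

```python
def make_example(text, intent, entity_name=None, entity_value=None):
    """Build one common_examples entry, computing entity offsets from the text."""
    entities = []
    if entity_name is not None:
        start = text.find(entity_value)  # first occurrence only
        if start == -1:
            raise ValueError("%r does not occur in %r" % (entity_value, text))
        entities.append({
            "entity": entity_name,
            "value": entity_value,
            "start": start,
            "end": start + len(entity_value),  # end is exclusive
        })
    return {"entities": entities, "intent": intent, "text": text}


def add_example(training_data, example):
    """Append an example to the rasa_nlu_data structure and return it."""
    training_data["rasa_nlu_data"]["common_examples"].append(example)
    return training_data


data = {"rasa_nlu_data": {"common_examples": [], "regex_features": []}}
add_example(data, make_example("book a cab tomorrow 9am", "BookCab",
                               "RideTime", "tomorrow 9am"))
add_example(data, make_example("no thanks", "ConfirmationNo"))
print(data["rasa_nlu_data"]["common_examples"])
```

After writing the updated structure back to the data file, retrain with the same python -m rasa_nlu.train -c config_spacy.json command used in the question.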