Virtual Assistant -> LUIS, QnA, Dispatcher best practice

We have run into some issues while using LUIS and QnA Maker, and I have a few "best practice" questions about them, especially about Dispatcher:

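1) Is there any best practice for when we have more than 15k utterances in Dispatcher? This looks like a limitation of the LUIS app, but it makes the long-term scalability of the model questionable.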

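2) The Bing spell check in LUIS changes first names and last names, for example. How can we avoid this? I think Bing spell check is necessary for chatbots, because spelling mistakes are always around the corner, but applying it to names is dangerous.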

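3) Cross-validation is not supported out of the box. You can use custom code to split the data into folds (not hard), use the command line to train and publish your model on k-1 of the k folds, and then send the utterances of the held-out fold to the API one by one. Batch testing is only supported through the UI (there is a feature request to add an API for it: https://cognitive.uservoice.com/forums/551524-language-understanding-luis/suggestions/20082157-add-api-to-batch-test-model) and is limited to a test set of 1,000 utterances. If we use the one-by-one approach, we pay ,50 per 1k transactions (https://azure.microsoft.com/de-de/pricing/details/cognitive-services/language-understanding-intelligent-services/). This means that getting 5-fold cross-validation metrics, for example, could cost us about $20 for a single experiment on our current data, and more if we add more data.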
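The fold-splitting step described above can be sketched as follows. This is a minimal illustration, assuming each labeled example is a hypothetical `(text, intent)` tuple; it stratifies by intent so small intents appear in every fold. Training/publishing each k-1 split (via the LUIS CLI or authoring API) and calling the prediction endpoint are left out because they depend on your tooling. The price per 1,000 calls is a parameter you must fill in from the current Azure pricing page.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (utterance text, intent name)
Split = Tuple[List[Example], List[Example]]  # (train, test)


def stratified_k_fold(examples: List[Example], k: int, seed: int = 0) -> List[Split]:
    """Assign every example to exactly one of k test folds, spreading each intent evenly."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_intent: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_intent[ex[1]].append(ex)
    folds: List[List[Example]] = [[] for _ in range(k)]
    offset = 0
    for intent in sorted(by_intent):
        group = by_intent[intent]
        rng.shuffle(group)
        for idx, ex in enumerate(group):
            folds[(idx + offset) % k].append(ex)
        offset += len(group)  # continue the round-robin where the previous intent stopped
    splits: List[Split] = []
    for i in range(k):
        # Train on the other k-1 folds, test on fold i.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        splits.append((train, folds[i]))
    return splits


def prediction_calls(splits: List[Split]) -> int:
    """One endpoint call per held-out utterance, summed over all folds."""
    return sum(len(test) for _, test in splits)


def estimated_cost(n_calls: int, price_per_1k: float) -> float:
    """Transaction cost at a caller-supplied price per 1,000 calls (check current Azure pricing)."""
    return n_calls / 1000 * price_per_1k


if __name__ == "__main__":
    data = [(f"utterance {i}", "BookFlight" if i % 3 else "Cancel") for i in range(15000)]
    splits = stratified_k_fold(data, k=5)
    print([len(test) for _, test in splits])
    print(prediction_calls(splits))
```

Because every utterance is held out exactly once, a full cross-validation run needs one prediction call per utterance regardless of k; a larger k only increases the number of train/publish cycles.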

4) The model is a black box, so we cannot use custom features even when we need them.

I'll do my best to address your concerns as follows:

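1) According to the LUIS documentation, the training set of a LUIS app is limited to 15,000 utterances.

So you cannot exceed that limit. For a Dispatch app, if the total number of utterances exceeds 15k, Dispatch downsamples the utterances to keep them under 15k. The CLI has an optional parameter (--doAutoActiveLearning) that runs automatic active learning, which downsamples intelligently by removing irrelevant utterances.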

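--doAutoActiveLearning: (Optional) Defaults to false. LUIS limits the training set size to 15000. When a LUIS app has more training utterances than that, Dispatch's auto active learning process can intelligently downsample the utterances.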
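If you want to downsample yourself before running Dispatch, a simple baseline is random, intent-proportional sampling. The sketch below is NOT Dispatch's auto active learning (which picks utterances intelligently); it is a hypothetical `downsample` helper, assuming `(text, intent)` tuples, that keeps each intent's share roughly constant and never drops an intent entirely.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (utterance text, intent / dispatch label)

UTTERANCE_LIMIT = 15000  # LUIS training-set limit quoted in the --doAutoActiveLearning help text


def downsample(examples: List[Example], limit: int = UTTERANCE_LIMIT, seed: int = 0) -> List[Example]:
    """Randomly keep at most `limit` utterances, roughly preserving each intent's share."""
    if len(examples) <= limit:
        return list(examples)
    by_intent: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_intent[ex[1]].append(ex)
    ratio = limit / len(examples)
    # Proportional quota per intent, but never drop an intent entirely.
    quotas = {intent: max(1, int(len(group) * ratio)) for intent, group in by_intent.items()}
    # The one-per-intent floor can overshoot the limit; trim the largest quotas.
    while sum(quotas.values()) > limit:
        largest = max(quotas, key=quotas.get)
        if quotas[largest] == 1:
            raise ValueError("more intents than the utterance limit allows")
        quotas[largest] -= 1
    rng = random.Random(seed)
    kept: List[Example] = []
    for intent in sorted(by_intent):
        kept.extend(rng.sample(by_intent[intent], quotas[intent]))
    return kept
```

Random sampling may discard informative utterances, which is exactly what the active-learning option tries to avoid, so treat this only as a fallback.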

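2) Bing Spell Check helps correct misspelled words in an utterance before LUIS predicts the score and entities of the utterance. However, if you want to avoid using the Bing Spell Check API service, you will need to add both the correct and the incorrect spellings. This can be done in two ways: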

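Label example utterances that contain all the different spellings, so that LUIS can learn both the proper spellings and the typos. This option requires more labeling effort than using a spell checker.
Create a phrase list that contains all variations of the word. With this solution, you don't need to label the word variations in the example utterances.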
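Collecting "all variations" of a name by hand is tedious. As a starting point, a hypothetical `spelling_variants` helper can generate simple one-edit typos (single deletions and adjacent swaps) to paste into a phrase list. This is only a sketch: it does not cover phonetic misspellings or keyboard-neighbour substitutions, and you would still review the output before adding it in LUIS.

```python
from typing import List


def spelling_variants(word: str) -> List[str]:
    """Return the word plus simple one-edit typos: single deletions and adjacent swaps."""
    w = word.lower()  # lower-cased for simplicity
    variants = {w}
    for i in range(len(w)):
        variants.add(w[:i] + w[i + 1:])  # drop one character
    for i in range(len(w) - 1):
        variants.add(w[:i] + w[i + 1] + w[i] + w[i + 2:])  # swap two neighbouring characters
    variants.discard("")  # a one-letter word would otherwise yield an empty string
    return sorted(variants)


if __name__ == "__main__":
    print(spelling_variants("Anna"))
```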

3) According to the current documentation, a maximum of 1,000 utterances is allowed per test. The data set is a JSON-formatted file containing at most 1,000 labeled, non-duplicate utterances. You can test up to 10 data sets in an app; if you need to test more, delete a data set and then add a new one. I would suggest reporting this as a feature request in the feedback forum.
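To stay within these limits, you can de-duplicate your labeled utterances and split them into files of at most 1,000 items, then upload them in rounds of 10. The sketch below assumes the LUIS batch-test layout (a JSON array of objects with `text`, `intent`, and `entities`, each entity having `entity`, `startPos`, `endPos`); the helper names are hypothetical, and the upload itself still happens through the UI.

```python
import json
from typing import Dict, List

MAX_UTTERANCES_PER_DATASET = 1000  # limit quoted from the LUIS batch-testing docs
MAX_DATASETS_PER_APP = 10          # data sets an app can hold at once


def build_batch_test_files(labeled: List[Dict]) -> List[str]:
    """Drop duplicate utterances and split the rest into JSON payloads of <= 1,000 items."""
    seen = set()
    unique: List[Dict] = []
    for item in labeled:
        # Exact-text duplicate check; LUIS may normalize text differently (assumption).
        if item["text"] in seen:
            continue
        seen.add(item["text"])
        unique.append({
            "text": item["text"],
            "intent": item["intent"],
            # Entity spans as {"entity": ..., "startPos": ..., "endPos": ...}; empty if unlabeled.
            "entities": item.get("entities", []),
        })
    size = MAX_UTTERANCES_PER_DATASET
    chunks = [unique[i:i + size] for i in range(0, len(unique), size)]
    return [json.dumps(chunk, ensure_ascii=False, indent=2) for chunk in chunks]


def upload_rounds(files: List[str], per_round: int = MAX_DATASETS_PER_APP) -> List[List[str]]:
    """Group payloads so at most 10 data sets sit in the app at once; delete a round before the next."""
    return [files[i:i + per_round] for i in range(0, len(files), per_round)]
```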

Hope this helps.

