Machine learning: predict text fields based on text fields

Question

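I have been working with machine learning and prediction for about a month. I have tried IBM Watson with Bluemix, Amazon Machine Learning and PredictionIO. What I want to do is predict one text field based on the other fields. My CSV file has four text fields named Question, Summary, Description and Answer, and about 4500 lines/records. The uploaded dataset contains no numeric fields. A typical record looks like this: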

{'Question':'sys down','Summary':'does not boot after OS update','Description':'Desktop does not boot','Answer':'Switch to safemode and rollback last update'}
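As a point of reference, a record like this can be turned into an (input text, target) pair by concatenating the three input columns. The following is a minimal sketch using Python's standard `csv` module; the inline sample CSV and the names `load_records`, `INPUT_FIELDS` and `TARGET_FIELD` are hypothetical, and only the four column names come from the question.

```python
import csv
import io

# Hypothetical one-row sample with the same columns as the question's CSV.
SAMPLE_CSV = """Question,Summary,Description,Answer
sys down,does not boot after OS update,Desktop does not boot,Switch to safemode and rollback last update
"""

INPUT_FIELDS = ("Question", "Summary", "Description")  # predictors
TARGET_FIELD = "Answer"                                # field to predict


def load_records(fileobj):
    """Read each CSV row into an (input_text, answer) pair."""
    pairs = []
    for row in csv.DictReader(fileobj):
        # Join the three input fields into a single document string.
        text = " ".join(row[f] for f in INPUT_FIELDS)
        pairs.append((text, row[TARGET_FIELD]))
    return pairs


records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0])
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`.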

On IBM Watson I found a question in their forums with a reply saying that uploading a custom corpus is not possible right now. Then I moved to Amazon Machine Learning. I followed their documentation and was able to implement prediction in a custom app using the API. I tested it on the MovieLens data, where everything is numerical: I successfully uploaded the data and got movie recommendations with their python-boto library. When I tried to upload my own CSV file, the problem was that no text field could be selected as the target. I then added a numeric value corresponding to each value in the CSV. This approach made prediction work, but the accuracy is poor. Perhaps the CSV has to be formatted in a better way.
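The workaround described above amounts to label encoding: each distinct answer string gets an integer ID, and predicted IDs are mapped back to text. A minimal sketch follows; `encode_labels` and the second answer string are hypothetical. One possible (unconfirmed) reason for poor accuracy is that an integer target may be treated as a numeric quantity to regress on rather than as unordered class labels, so the IDs should be declared categorical if the service allows it.

```python
def encode_labels(answers):
    """Assign each distinct answer string a stable integer class ID."""
    to_id = {}
    for answer in answers:
        # setdefault keeps the ID first assigned to a repeated answer.
        to_id.setdefault(answer, len(to_id))
    # Reverse mapping, used to turn a predicted ID back into answer text.
    to_text = {i: a for a, i in to_id.items()}
    return to_id, to_text


answers = [
    "Switch to safemode and rollback last update",
    "Replace the power supply",  # hypothetical second answer
    "Switch to safemode and rollback last update",
]
to_id, to_text = encode_labels(answers)
encoded = [to_id[a] for a in answers]
print(encoded)     # [0, 1, 0]
print(to_text[1])  # Replace the power supply
```

The IDs carry no order or magnitude; 1 is not "more" than 0, which is why a regression model over them would not be meaningful.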

A record from the MovieLens data is pasted below. It says that userID 196 gave movieID 242 a three-star rating at time 881250949 (a Unix timestamp).

196 242 3   881250949
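To make the four columns concrete, here is a small sketch that parses such a line into typed fields and converts the Unix timestamp to a UTC date. The `Rating` tuple and `parse_rating` are hypothetical names; `split()` with no argument accepts both tabs and runs of spaces.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    timestamp: datetime


def parse_rating(line: str) -> Rating:
    """Parse one whitespace-separated record: user, movie, rating, Unix time."""
    user, movie, stars, ts = line.split()
    return Rating(int(user), int(movie), int(stars),
                  datetime.fromtimestamp(int(ts), tz=timezone.utc))


r = parse_rating("196 242 3   881250949")
print(r.user_id, r.movie_id, r.rating, r.timestamp.isoformat())
# 196 242 3 1997-12-04T15:55:49+00:00
```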

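Currently I am trying PredictionIO. A test on the MovieLens data ran successfully, without issues, using the recommendation template as described in the documentation. But it is still unclear whether a text field can be predicted from other text fields.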

Can prediction only run on numeric fields, or is it possible to predict a text field based on other text fields?

Answer 1

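No, prediction does not only run on numeric fields. The data can be anything, including text. My guess is that the MovieLens data uses IDs instead of actual user names and movie titles because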

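it saves storage space (this dataset has been around for a long time, and back then storage space was definitely a concern), and
there is no need to know the real user names (privacy concerns).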

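For your case, you may want to look at the text classification template: https://docs.prediction.io/demo/textclassification/. You need to model how each record should be classified, i.e. which class (here, which Answer) a given Question, Summary and Description belong to.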
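To illustrate what "predicting a text field from text fields" means as a classification problem, below is a self-contained sketch using multinomial naive Bayes with add-one smoothing. This is not the PredictionIO template's code; it is a swapped-in, standard technique shown in plain Python, and the training examples and class names are hypothetical. Each distinct Answer is treated as a class, and the concatenated input fields are the document to classify.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesText:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(token | label) over known tokens
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.word_counts[label][tok] + 1
                score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training pairs: (Question + Summary + Description, Answer).
train = [
    ("sys down does not boot after OS update Desktop does not boot",
     "Switch to safemode and rollback last update"),
    ("no boot after windows update laptop hangs at logo",
     "Switch to safemode and rollback last update"),
    ("no power machine dead power light off", "Replace the power supply"),
    ("pc will not turn on no power at all", "Replace the power supply"),
]
model = NaiveBayesText().fit([t for t, _ in train], [a for _, a in train])
print(model.predict("desktop does not boot after update"))
```

A classifier like this only works well when each Answer occurs in several records; if most of the ~4500 answers are unique free text, they would likely need to be grouped into a smaller set of answer categories first.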


Tags: amazon, machine-learning, prediction, ibm-watson, predictionio