Predicting a continuous variable using text in R

Question

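I have a task where I need to predict a continuous variable, the odometer reading, from a text field describing the problem the customer is facing. The field is not a drop-down menu; it holds the customer's verbatim description. For example: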

**Text**                     **Odometer Reading**
Clutch problem               20,000 
Axle Issue                   150,000

Edit:

I am building a linear model using unigrams, but I get this warning while preprocessing the data:

    > corp <- Corpus(VectorSource(ISSUES$CUSTOMER_VOICE))
    > corp <- tm_map(corp,tolower)
    Warning message:
    In tm_map.SimpleCorpus(corp, tolower) : transformation drops documents
    > corp <- tm_map(corp,removePunctuation)
    Warning message:
    In tm_map.SimpleCorpus(corp, removePunctuation) :
      transformation drops documents
    > corp <- tm_map(corp,removeWords,stopwords('english'))
    Warning message:
    In tm_map.SimpleCorpus(corp, removeWords, stopwords("english")) :
      transformation drops documents
    > corp <- tm_map(corp,stemDocument)
    Warning message:
    In tm_map.SimpleCorpus(corp, stemDocument) : transformation drops documents

Can someone tell me how to fix this warning?

Answer 1

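This is just one approach and may not be the best solution: run text mining on the Text column to extract unigrams and bigrams, convert them into a document-term matrix (DTM), and then use any linear model to predict the Odometer Reading.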
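A minimal sketch of that pipeline, assuming the `tm` package and unigrams only (bigrams would need a custom tokenizer and are left out). The data frame `issues`, its texts and odometer values, and the helper `clean_corpus` are invented for illustration; only the column name `CUSTOMER_VOICE` comes from the question. Building the corpus with `VCorpus` and wrapping `tolower` in `content_transformer` is a commonly suggested way to avoid the "transformation drops documents" warning, but treat that as an assumption to check against your `tm` version.

```r
library(tm)

# Hypothetical toy data standing in for ISSUES; texts and values are invented.
issues <- data.frame(
  CUSTOMER_VOICE = c(
    "There is a clutch noise", "Axle noise", "The clutch has a leak",
    "Axle leak", "Noise and a leak", "Clutch and axle noise",
    "Leak at the axle and the clutch", "Clutch!"
  ),
  ODOMETER = c(30000, 140000, 45000, 160000, 90000, 120000, 150000, 20000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build and clean a corpus from a character vector.
# stemDocument could be added as a further step (it needs SnowballC).
clean_corpus <- function(text) {
  corp <- VCorpus(VectorSource(text))
  corp <- tm_map(corp, content_transformer(tolower))
  corp <- tm_map(corp, removePunctuation)
  corp <- tm_map(corp, removeWords, stopwords("english"))
  tm_map(corp, stripWhitespace)
}

# Unigram document-term matrix: one row per document, one column per term.
dtm <- DocumentTermMatrix(clean_corpus(issues$CUSTOMER_VOICE))
X <- as.data.frame(as.matrix(dtm))
names(X) <- make.names(names(X))  # keep column names formula-safe

# Ordinary least squares on the term counts.
train <- cbind(ODOMETER = issues$ODOMETER, X)
fit <- lm(ODOMETER ~ ., data = train)

# Score new text, restricted to the vocabulary seen during training.
new_dtm <- DocumentTermMatrix(
  clean_corpus("Clutch leak"),
  control = list(dictionary = Terms(dtm))
)
new_X <- as.data.frame(as.matrix(new_dtm))
names(new_X) <- make.names(names(new_X))
pred <- predict(fit, newdata = new_X)
```

On real data the number of distinct terms usually approaches or exceeds the number of documents, so plain `lm` becomes rank-deficient; dropping rare terms or switching to a regularized linear model is worth considering.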

Hope this solves your problem.

Tags: r, text-analytics-api