Machine Learning Naive Bayes Classifier in Python

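I have been experimenting with machine learning and need to develop a model that makes a prediction based on several variables. The simplest way I can explain it is with the "play golf" example below: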

train.csv

Outlook,Temperature,Humidity,Windy,Play
overcast,hot,high,FALSE,yes
overcast,cool,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
rainy,mild,normal,FALSE,yes
rainy,mild,high,TRUE,no
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
sunny,mild,normal,TRUE,yes

The program needs to insert the prediction into the makeprediciton.csv file:

Outlook,Temperature,Humidity,Windy,Play
rainy,hot,normal,TRUE,

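I have already been able to apply this classifier in Excel. I would like to know whether there is a simple Python library that can help me group the frequencies and do the calculations, without having to code everything by hand.

You can see how I did it in the Excel file at this link: http://www.filedropper.com/playgolf

Any help would be appreciated.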

It depends. If you don't want to write code, try RapidMiner. It is very simple to learn and experiment with, and its documentation is very good and clear. You can look at this example of a Naive Bayes classifier and get the results.


Also, if you are willing to do some coding in Python, try scikit-learn, which is a more advanced Python library. It builds on SciPy and NumPy and has very powerful implementations of data mining algorithms. For your example, you must first use one-hot encoding to turn your categorical features into high-dimensional sparse vectors, and then use a classifier such as Naive Bayes.
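A minimal sketch of that one-hot-encoding-then-Naive-Bayes approach, assuming a reasonably recent scikit-learn and pandas. The training data is inlined as a string so the snippet runs on its own. The choice of `MultinomialNB` with `alpha=1.0` (Laplace smoothing) is my assumption, not something the approach above prescribes. Other Naive Bayes variants would also work, and their probabilities will differ slightly from an unsmoothed spreadsheet calculation.

```python
import io

import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Inline copy of train.csv from the question, so the example is self-contained.
TRAIN_CSV = """Outlook,Temperature,Humidity,Windy,Play
overcast,hot,high,FALSE,yes
overcast,cool,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
rainy,mild,normal,FALSE,yes
rainy,mild,high,TRUE,no
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
"""

# dtype=str keeps TRUE/FALSE as plain categories instead of parsing them as booleans.
train = pd.read_csv(io.StringIO(TRAIN_CSV), dtype=str)
features = ["Outlook", "Temperature", "Humidity", "Windy"]

# One-hot encode every categorical column, then fit Naive Bayes on the 0/1 columns.
# MultinomialNB with alpha=1.0 is an assumed choice for this sketch.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    MultinomialNB(alpha=1.0),
)
model.fit(train[features], train["Play"])

# The row from makeprediciton.csv whose Play value is missing.
new_row = pd.DataFrame(
    [{"Outlook": "rainy", "Temperature": "hot", "Humidity": "normal", "Windy": "TRUE"}]
)
prediction = model.predict(new_row)[0]
probabilities = dict(zip(model.classes_, model.predict_proba(new_row)[0]))

print(prediction)
print(probabilities)
```

Putting the encoder and the classifier in one pipeline means the new row is encoded with exactly the same columns that were learned from the training data.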


Likewise, to read the CSV files, you can use Pandas.
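As a rough illustration of the Pandas side, the sketch below reads both CSV files, groups the frequencies with `pd.crosstab`, and fills in the missing `Play` value using a hand-rolled, unsmoothed Naive Bayes calculation. The inlined strings stand in for the real files, and the helper `predict_row` is a hypothetical name introduced for this example. It is only my guess at what the spreadsheet version does, since the Excel file is not reproduced here.

```python
import io

import pandas as pd

# Stand-ins for the real files. With files on disk you would call
# pd.read_csv("train.csv", dtype=str) and pd.read_csv("makeprediciton.csv", dtype=str).
TRAIN_CSV = """Outlook,Temperature,Humidity,Windy,Play
overcast,hot,high,FALSE,yes
overcast,cool,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
rainy,mild,normal,FALSE,yes
rainy,mild,high,TRUE,no
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
"""
PREDICT_CSV = """Outlook,Temperature,Humidity,Windy,Play
rainy,hot,normal,TRUE,
"""

train = pd.read_csv(io.StringIO(TRAIN_CSV), dtype=str)
to_predict = pd.read_csv(io.StringIO(PREDICT_CSV), dtype=str)
features = ["Outlook", "Temperature", "Humidity", "Windy"]

# Frequency table for one feature: rows are Outlook values, columns are Play labels.
outlook_counts = pd.crosstab(train["Outlook"], train["Play"])
print(outlook_counts)

# Class priors P(Play), taken from relative frequencies.
priors = train["Play"].value_counts(normalize=True)


def predict_row(row):
    """Hypothetical helper: unsmoothed Naive Bayes score for each Play label."""
    scores = {}
    for label, prior in priors.items():
        subset = train[train["Play"] == label]
        score = prior
        for col in features:
            # P(feature value | label), estimated as a relative frequency.
            score *= (subset[col] == row[col]).mean()
        scores[label] = score
    return max(scores, key=scores.get)


# Fill in the empty Play column and serialise back to CSV text.
to_predict["Play"] = to_predict.apply(predict_row, axis=1)
output_csv = to_predict.to_csv(index=False)
print(output_csv)
```

To write the result to disk instead of producing a string, pass a path to `to_csv`, for example `to_predict.to_csv("makeprediciton.csv", index=False)`.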