Unsupervised Naive Bayes - how does it work?

As I understand it, to implement unsupervised naive Bayes we assign random class probabilities to each instance and then run it through the ordinary naive Bayes algorithm. I know that with each iteration the random estimates get better, but I can't for the life of me figure out how that works.

Can anyone shed some light on this?

The variant of naive Bayes I have seen used in unsupervised learning is essentially a Gaussian mixture model (GMM), fitted with the Expectation-Maximization (EM) algorithm, to determine clusters in the data.
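As a concrete illustration, here is a minimal sketch of that idea using scikit-learn's GaussianMixture; the toy data and the choice of two components are made up for the example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: two hidden "classes", each drawn from its own Gaussian.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),  # hidden class 0
    rng.normal(loc=5.0, scale=1.0, size=(100, 2)),  # hidden class 1
])

# Fit a 2-component mixture with EM; no labels are used anywhere.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.predict(X[:5]))        # hard cluster assignments
print(gmm.predict_proba(X[:5]))  # soft class probabilities per instance
```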

In this setting, the data is assumed to be classifiable, but the classes are hidden. The problem is to determine the most probable classes by fitting a Gaussian distribution per class. The naive Bayes assumption defines the particular probabilistic model to use, in which the attributes are conditionally independent given the class.
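To connect this to the procedure described in the question (random per-instance class probabilities, refined iteratively), here is a minimal from-scratch sketch of EM for a naive Bayes model with one Gaussian per attribute per class; the function name, variable names, and iteration count are my own choices for illustration, not from any particular paper:

```python
import numpy as np

def naive_bayes_em(X, n_classes, n_iter=100, seed=0):
    """EM for a naive Bayes mixture: one Gaussian per attribute per class."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Step 1 (as in the question): random class probabilities per instance.
    resp = rng.dirichlet(np.ones(n_classes), size=n)  # shape (n, n_classes)

    for _ in range(n_iter):
        # M-step: ordinary naive Bayes parameter estimates, weighted by
        # the current soft class assignments.
        weight = resp.sum(axis=0)                      # effective class sizes
        prior = weight / n                             # P(class)
        mean = (resp.T @ X) / weight[:, None]          # per-class attribute means
        var = (resp.T @ X**2) / weight[:, None] - mean**2 + 1e-6

        # E-step: recompute P(class | x) under the naive Bayes assumption --
        # log P(x | class) is a sum of independent per-attribute log densities.
        log_lik = -0.5 * (((X[:, None, :] - mean) ** 2 / var)
                          + np.log(2 * np.pi * var)).sum(axis=2)
        log_post = np.log(prior) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

    return resp, prior, mean, var
```

This is exactly the loop from the question: the M-step fits ordinary naive Bayes parameters to the current soft labels, and the E-step relabels the instances using the fitted model. EM guarantees that each such iteration does not decrease the data likelihood, which is why the initially random estimates keep improving.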

From the paper "Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials" by Jose A. Gamez:

From the previous setting, probabilistic model-based clustering is modeled as a mixture of models (see e.g. (Duda et al., 2001)), where the states of the hidden class variable correspond to the components of the mixture (the number of clusters), and the multinomial distribution is used to model discrete variables while the Gaussian distribution is used to model numeric variables. In this way we move to a problem of learning from unlabeled data and usually the EM algorithm (Dempster et al., 1977) is used to carry out the learning task when the graphical structure is fixed and structural EM (Friedman, 1998) when the graphical structure also has to be discovered (Pena et al., 2000). In this paper we focus on the simplest model with fixed structure, the so-called Naive Bayes structure (fig. 1) where the class is the only root variable and all the attributes are conditionally independent given the class.
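Written out (a standard formulation, with symbols chosen here: hidden class $C$ with $K$ states and attributes $x_1, \dots, x_d$), the naive Bayes mixture the quote describes is

$$p(\mathbf{x}) = \sum_{c=1}^{K} p(C=c)\,\prod_{j=1}^{d} p(x_j \mid C=c),$$

and the EM E-step computes the posterior class probabilities that get refined at every iteration:

$$p(C=c \mid \mathbf{x}) = \frac{p(C=c)\,\prod_{j=1}^{d} p(x_j \mid C=c)}{\sum_{c'=1}^{K} p(C=c')\,\prod_{j=1}^{d} p(x_j \mid C=c')}.$$

Here each $p(x_j \mid C=c)$ is multinomial for discrete attributes and Gaussian for numeric ones, exactly as the quote says.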

See also this discussion on CV.SE.