Why do we choose the Beta distribution as a prior on the hypothesis?

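I watched the video of CMU's 10-701 Machine Learning course taught by Tom Mitchell in 2011. He was covering maximum likelihood estimation when he used a Beta distribution as the prior on theta. Why did he choose that distribution in particular?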

In this lecture, Prof. Mitchell gives an example of flipping a coin and estimating its fairness, i.e. the probability of heads, theta. He reasonably chose a binomial distribution to model this experiment.
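To make the MLE step concrete, here is a minimal Python sketch, assuming hypothetical data of 7 heads in 10 flips (not taken from the lecture). Setting the derivative of the binomial log-likelihood to zero gives theta_hat = heads / flips, and a brute-force grid search over theta should land on the same value. The function names are illustrative only.

```python
import math


def binomial_log_likelihood(theta: float, heads: int, flips: int) -> float:
    """Log of P(heads | theta) under a binomial model, including the binomial coefficient."""
    return (math.log(math.comb(flips, heads))
            + heads * math.log(theta)
            + (flips - heads) * math.log(1.0 - theta))


def mle_theta(heads: int, flips: int) -> float:
    """Closed-form MLE: d/dtheta of the log-likelihood is zero at heads / flips."""
    return heads / flips


# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10
theta_hat = mle_theta(heads, flips)

# Sanity check: brute-force search over a grid of theta values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: binomial_log_likelihood(t, heads, flips))

print(theta_hat, theta_grid)
```

Note that the MLE uses the data alone; a prior on theta only enters once we move to the posterior or the MAP estimate.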

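The reason for choosing the beta distribution as the prior is to simplify the math when computing the posterior. This works well because the beta is a conjugate prior for the binomial: with a beta prior and a binomial likelihood, the posterior is again a beta distribution. The professor mentions this at the very end of the same lecture. This does not mean one cannot use any other prior, e.g. a normal distribution truncated to [0, 1] or any other density on [0, 1]. But other priors lead to complicated posterior distributions that are hard to optimize, integrate, and so on.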
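The conjugacy claim can be checked in a few lines. Writing k for the number of heads in n flips and Beta(alpha, beta) for the prior (notation chosen for this sketch, not necessarily the lecture's), the posterior is proportional to likelihood times prior:

```latex
\begin{aligned}
p(\theta \mid D) &\propto p(D \mid \theta)\, p(\theta) \\
&\propto \theta^{k}(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
&= \theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1}
\end{aligned}
```

The last line is the kernel of a Beta(alpha + k, beta + n - k) distribution, so the posterior stays in the Beta family and its normalizing constant is known without computing any integral. Its mode, the MAP estimate, is (k + alpha - 1) / (n + alpha + beta - 2) when both posterior parameters exceed 1.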

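This is a general principle: prefer a conjugate prior over a more complex distribution, even if it does not fit the data perfectly, because the math is simpler.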
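A minimal sketch of this trade-off, assuming hypothetical data (7 heads in 10 flips), a Beta(2, 2) prior as the conjugate choice, and a normal(0.5, 0.1) truncated to [0, 1] as the non-conjugate alternative. All helper names such as `grid_posterior_mean` are made up for this example. The conjugate posterior mean comes from two additions; the truncated normal needs a numerical approximation of the normalizing integral, done here with a simple midpoint grid.

```python
import math
from typing import Callable


def beta_posterior(alpha: float, beta: float, heads: int, flips: int) -> tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior parameters."""
    return alpha + heads, beta + (flips - heads)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution, available in closed form."""
    return a / (a + b)


def grid_posterior_mean(prior_pdf: Callable[[float], float], heads: int, flips: int,
                        points: int = 10_000) -> float:
    """Posterior mean for an arbitrary prior, computed numerically on a grid over [0, 1].

    The prior need not be normalized: dividing by the summed weights plays the
    role of the normalizing integral that conjugacy lets us skip.
    """
    thetas = [(i + 0.5) / points for i in range(points)]  # midpoints of [0, 1]
    weights = [t**heads * (1 - t) ** (flips - heads) * prior_pdf(t) for t in thetas]
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total


def beta22_pdf(t: float) -> float:
    """Beta(2, 2) density, used to check the grid against the closed form."""
    return 6 * t * (1 - t)


def truncated_normal_pdf(t: float, mu: float = 0.5, sigma: float = 0.1) -> float:
    """Unnormalized normal(mu, sigma) density; the [0, 1] grid does the truncation."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2)


# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10

# Conjugate prior: closed-form posterior, no integration needed.
a_post, b_post = beta_posterior(2, 2, heads, flips)
conj_mean = beta_mean(a_post, b_post)

# Same Beta(2, 2) prior pushed through the numerical route: should match conj_mean.
grid_beta_mean = grid_posterior_mean(beta22_pdf, heads, flips)

# Non-conjugate prior: only the numerical route is available.
grid_tn_mean = grid_posterior_mean(truncated_normal_pdf, heads, flips)

print(f"Beta({a_post}, {b_post}) posterior mean: {conj_mean:.4f}")
print(f"grid mean with Beta(2, 2) prior:        {grid_beta_mean:.4f}")
print(f"grid mean with truncated normal prior:  {grid_tn_mean:.4f}")
```

In one dimension the grid is cheap, but the number of grid points grows exponentially with the number of parameters, which is where a closed-form conjugate update pays off.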

