How to get log-likelihood for each iteration in sklearn GMM?
I am trying to fit a GMM in sklearn, and I see that the model converges at around epoch 3, but I can't seem to access the log-likelihood score computed at each epoch.
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4, tol=1e-8).fit(data)
Is there any way to access the log-likelihood score for each epoch?
If you only want to look at the log-likelihood, you can set verbose=2 to print the change in log-likelihood, and set verbose_interval=1 to report it at every step:
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, tol=1e-8, verbose=2, verbose_interval=1)
gmm.fit(data)
Initialization 0
Iteration 1 time lapse 0.00560s ll change inf
Iteration 2 time lapse 0.00134s ll change 0.03655
Iteration 3 time lapse 0.00119s ll change 0.00867
Iteration 4 time lapse 0.00118s ll change 0.00619
Iteration 5 time lapse 0.00116s ll change 0.00612
Iteration 6 time lapse 0.00125s ll change 0.00647
Iteration 7 time lapse 0.00128s ll change 0.00700
Iteration 8 time lapse 0.00127s ll change 0.00727
Iteration 9 time lapse 0.00126s ll change 0.00673
Iteration 10 time lapse 0.00117s ll change 0.00604
Iteration 11 time lapse 0.00109s ll change 0.00530
Iteration 12 time lapse 0.00125s ll change 0.00431
Iteration 13 time lapse 0.00121s ll change 0.00366
Iteration 14 time lapse 0.00123s ll change 0.00404
Iteration 15 time lapse 0.00130s ll change 0.00361
Iteration 16 time lapse 0.00118s ll change 0.00157
Iteration 17 time lapse 0.00124s ll change 0.00048
Iteration 18 time lapse 0.00126s ll change 0.00015
Iteration 19 time lapse 0.00115s ll change 0.00005
Iteration 20 time lapse 0.00116s ll change 0.00001
Iteration 21 time lapse 0.00124s ll change 0.00000
Iteration 22 time lapse 0.00122s ll change 0.00000
Iteration 23 time lapse 0.00142s ll change 0.00000
Iteration 24 time lapse 0.00126s ll change 0.00000
Iteration 25 time lapse 0.00124s ll change 0.00000
Iteration 26 time lapse 0.00122s ll change 0.00000
Iteration 27 time lapse 0.00120s ll change 0.00000
Initialization converged: True time lapse 0.03765s ll -1.20124
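If the values are needed as numbers rather than printed text, one possible alternative is to run EM one iteration at a time and read `lower_bound_` after each call. The sketch below assumes `warm_start=True` together with `max_iter=1`, so that each `fit` call continues from the previous parameters. The synthetic `data`, the `random_state`, and the cap of 30 iterations are illustrative assumptions, not part of the original question.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for `data` (an assumption; the original data is not shown).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (-3, 0, 3)])

# max_iter=1 with warm_start=True: each fit() call runs a single EM iteration,
# starting from the parameters left by the previous call.
gmm = GaussianMixture(n_components=3, max_iter=1, warm_start=True, random_state=0)

log_liks = []
with warnings.catch_warnings():
    # Stopping after one iteration usually triggers a ConvergenceWarning; it is expected here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for _ in range(30):
        gmm.fit(data)
        # For GaussianMixture, lower_bound_ is the mean per-sample log-likelihood.
        log_liks.append(gmm.lower_bound_)

print(log_liks[:5])
```

This trades the built-in stopping rule for an explicit loop, so a convergence check (for example on the difference between consecutive entries of `log_liks`) would have to be added by hand.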
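To actually capture the printed values, depending on your environment, you could write the output to a log (for example with logging), or, in a Jupyter notebook, something like the following may work: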
%%capture cap --no-stderr
gmm.fit(data)
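Outside a notebook, the same text can probably be captured with the standard library, since the verbose messages are written to standard output. The following is a minimal sketch using `contextlib.redirect_stdout`; the synthetic `data`, the `random_state`, and the names `buffer` and `log_text` are assumptions made for illustration.

```python
import contextlib
import io

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for `data` (an assumption; the original data is not shown).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))

gmm = GaussianMixture(n_components=3, tol=1e-8, verbose=2,
                      verbose_interval=1, random_state=0)

# Redirect everything printed during fit() into an in-memory buffer.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    gmm.fit(data)

log_text = buffer.getvalue()
print(log_text.splitlines()[:3])  # first few captured lines
```

Here `log_text` plays the same role as `cap.stdout` in the notebook version.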
Then we load the captured output into a data frame and try to back-calculate the log-likelihood:
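import numpy as np
import pandas as pd

# Split each captured line on whitespace; column 1 is the iteration number, column 7 the last token.
res = pd.DataFrame([i.split() for i in cap.stdout.split("\n")]).iloc[:,[1,7]]
res.columns = ['iteration','change']
res.change = res.change.astype('float64')
# Drop rows without a finite value (the header line, the first "inf" change, blank lines).
res = res[np.isfinite(res.change)].copy()
# The last remaining row is the "converged" line, whose value is the final log-likelihood.
res['logLik'] = res['change'].values[-1]
# Subtract the reversed cumulative sum of the changes from the final log-likelihood.
res.loc[:len(res),['logLik']] = -res.change[:-1][::-1].cumsum()[::-1] + res.change.values[-1]
res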
iteration change logLik
2 2 0.03655 -1.31546
3 3 0.00867 -1.27891
4 4 0.00619 -1.27024
5 5 0.00612 -1.26405
6 6 0.00647 -1.25793
7 7 0.00700 -1.25146
8 8 0.00727 -1.24446
9 9 0.00673 -1.23719
10 10 0.00604 -1.23046
11 11 0.00530 -1.22442
12 12 0.00431 -1.21912
13 13 0.00366 -1.21481
14 14 0.00404 -1.21115
15 15 0.00361 -1.20711
16 16 0.00157 -1.20350
17 17 0.00048 -1.20193
18 18 0.00015 -1.20145
19 19 0.00005 -1.20130
20 20 0.00001 -1.20125
21 21 0.00000 -1.20124
22 22 0.00000 -1.20124
23 23 0.00000 -1.20124
24 24 0.00000 -1.20124
25 25 0.00000 -1.20124
26 26 0.00000 -1.20124
27 27 0.00000 -1.20124
28 converged: -1.20124 -1.20124
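The same back-calculation can be sketched without pandas, which makes the indexing easier to follow. The example below parses the log text shown earlier with regular expressions and walks backwards from the final value, using ll(k-1) = ll(k) - change(k). The patterns assume the exact line format printed above, which may differ between scikit-learn versions; the names `iter_re`, `final_re`, and `log_lik` are illustrative.

```python
import re

# Verbose output copied from the run above.
log_text = """\
Initialization 0
Iteration 1 time lapse 0.00560s ll change inf
Iteration 2 time lapse 0.00134s ll change 0.03655
Iteration 3 time lapse 0.00119s ll change 0.00867
Iteration 4 time lapse 0.00118s ll change 0.00619
Iteration 5 time lapse 0.00116s ll change 0.00612
Iteration 6 time lapse 0.00125s ll change 0.00647
Iteration 7 time lapse 0.00128s ll change 0.00700
Iteration 8 time lapse 0.00127s ll change 0.00727
Iteration 9 time lapse 0.00126s ll change 0.00673
Iteration 10 time lapse 0.00117s ll change 0.00604
Iteration 11 time lapse 0.00109s ll change 0.00530
Iteration 12 time lapse 0.00125s ll change 0.00431
Iteration 13 time lapse 0.00121s ll change 0.00366
Iteration 14 time lapse 0.00123s ll change 0.00404
Iteration 15 time lapse 0.00130s ll change 0.00361
Iteration 16 time lapse 0.00118s ll change 0.00157
Iteration 17 time lapse 0.00124s ll change 0.00048
Iteration 18 time lapse 0.00126s ll change 0.00015
Iteration 19 time lapse 0.00115s ll change 0.00005
Iteration 20 time lapse 0.00116s ll change 0.00001
Iteration 21 time lapse 0.00124s ll change 0.00000
Iteration 22 time lapse 0.00122s ll change 0.00000
Iteration 23 time lapse 0.00142s ll change 0.00000
Iteration 24 time lapse 0.00126s ll change 0.00000
Iteration 25 time lapse 0.00124s ll change 0.00000
Iteration 26 time lapse 0.00122s ll change 0.00000
Iteration 27 time lapse 0.00120s ll change 0.00000
Initialization converged: True time lapse 0.03765s ll -1.20124
"""

iter_re = re.compile(r"Iteration\s+(\d+)\s+time lapse\s+\S+\s+ll change\s+(\S+)")
final_re = re.compile(r"converged:\s+\S+\s+time lapse\s+\S+\s+ll\s+(\S+)")

# Map iteration number -> reported change in log-likelihood.
changes = {int(n): float(c) for n, c in iter_re.findall(log_text)}
final_ll = float(final_re.search(log_text).group(1))

# Walk backwards: the value after the last iteration is the final one,
# and each earlier value is the later value minus the later change.
log_lik = {}
current = final_ll
for n in sorted(changes, reverse=True):
    log_lik[n] = current
    current -= changes[n]

for n in sorted(log_lik):
    print(n, round(log_lik[n], 5))
```

Under this reading, iteration 1 gets -1.31546 and iteration 2 gets -1.27891, so each logLik entry in the table above appears to be the value before that row's change is applied, that is, the log-likelihood after the previous iteration. Because the printed changes are rounded to five decimals, the reconstructed values are only accurate to roughly that precision.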