seaborn: Selected KDE bandwidth is 0. Cannot estimate density
import pandas as pd
import seaborn as sns
ser_test = pd.Series([1,0,1,4,6,0,6,5,1,3,2,5,1])
sns.kdeplot(ser_test, cumulative=True)
The code above produces a CDF plot as expected.
But when the elements of the series are changed to:
ser_test = pd.Series([1,0,1,1,6,0,6,1,1,0,2,1,1])
sns.kdeplot(ser_test, cumulative=True)
I get the following error:
ValueError: could not convert string to float: 'scott'
RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.
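What does this error mean, and how can I resolve it so that a CDF is still produced (even if it is very skewed)?
Edit: I am using seaborn version 0.9.0.
The full traceback is below: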
ValueError: could not convert string to float: 'scott'
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-93-7cee594b4526> in <module>
1 ser_test = pd.Series([1,0,1,1,6,0,6,1,1,0,2,1,1])
----> 2 sns.kdeplot(ser_test, cumulative=True)
~/.local/lib/python3.5/site-packages/seaborn/distributions.py in kdeplot(data, data2, shade, vertical, kernel, bw, gridsize, cut, clip, legend, cumulative, shade_lowest, cbar, cbar_ax, cbar_kws, ax, **kwargs)
689 ax = _univariate_kdeplot(data, shade, vertical, kernel, bw,
690 gridsize, cut, clip, legend, ax,
--> 691 cumulative=cumulative, **kwargs)
692
693 return ax
~/.local/lib/python3.5/site-packages/seaborn/distributions.py in _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative, **kwargs)
281 x, y = _statsmodels_univariate_kde(data, kernel, bw,
282 gridsize, cut, clip,
--> 283 cumulative=cumulative)
284 else:
285 # Fall back to scipy if missing statsmodels
~/.local/lib/python3.5/site-packages/seaborn/distributions.py in _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative)
353 fft = kernel == "gau"
354 kde = smnp.KDEUnivariate(data)
--> 355 kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
356 if cumulative:
357 grid, y = kde.support, kde.cdf
~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kde.py in fit(self, kernel, bw, fft, weights, gridsize, adjust, cut, clip)
138 density, grid, bw = kdensityfft(endog, kernel=kernel, bw=bw,
139 adjust=adjust, weights=weights, gridsize=gridsize,
--> 140 clip=clip, cut=cut)
141 else:
142 density, grid, bw = kdensity(endog, kernel=kernel, bw=bw,
~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
451 bw = float(bw)
452 except:
--> 453 bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
454 bw *= adjust
455
~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/bandwidths.py in select_bandwidth(x, bw, kernel)
172 # eventually this can fall back on another selection criterion.
173 err = "Selected KDE bandwidth is 0. Cannot estimate density."
--> 174 raise RuntimeError(err)
175 else:
176 return bandwidth
RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.
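What is happening here is that seaborn (or rather the library it relies on to compute the KDE, scipy or statsmodels) could not work out the "bandwidth", a scaling parameter used in the calculation. You can pass it manually. I tried a few values and found that 1.5 gives a plot on the same scale as your previous one: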
sns.kdeplot(ser_test, cumulative=True, bw=1.5)
It is also worth installing statsmodels if you do not have it.
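To see why the automatic choice fails for the second series, here is a minimal sketch of a Scott-style rule of thumb. It assumes the rule takes the smaller of the sample standard deviation and the interquartile range divided by 1.349 as its spread estimate, which is my understanding of what the statsmodels version in the traceback does; the function name scott_bandwidth_sketch is made up for illustration and is not part of any library.

```python
import numpy as np

def scott_bandwidth_sketch(values):
    """Rule-of-thumb bandwidth: 1.059 * spread * n**(-1/5).

    Assumption: spread = min(sample std, IQR / 1.349). This mirrors my
    reading of the statsmodels rule and is not copied from its source.
    """
    x = np.asarray(values, dtype=float)
    std = np.std(x, ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    scaled_iqr = (q75 - q25) / 1.349
    spread = min(std, scaled_iqr)
    return 1.059 * spread * len(x) ** (-0.2)

spread_out = [1, 0, 1, 4, 6, 0, 6, 5, 1, 3, 2, 5, 1]  # first series in the question
skewed = [1, 0, 1, 1, 6, 0, 6, 1, 1, 0, 2, 1, 1]      # series that triggers the error

print(scott_bandwidth_sketch(spread_out))  # a positive bandwidth
print(scott_bandwidth_sketch(skewed))      # 0.0, because the 25th and 75th percentiles are both 1
```

In the skewed series more than half of the values equal 1, so the interquartile range collapses to zero and takes the bandwidth with it. Passing bw explicitly sidesteps the rule entirely.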
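pip uninstall statsmodels
solved a similar problem with the same error. (As the comment in the traceback shows, seaborn 0.9.0 falls back to scipy when statsmodels is missing.)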
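If you do not want to wait for the fix in the seaborn git repository to reach a stable release, you can try one of the solutions from the issue page. In particular, henrymartin1's suggestion is to pass a small bandwidth manually inside a try/except block (suggested by ahartikainen) that checks the text of this specific error, so that other errors are still raised: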
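try:
    sns.distplot(df)
except RuntimeError as err:
    # Retry with an explicit small bandwidth only for this specific error.
    if str(err).startswith("Selected KDE bandwidth is 0. Cannot estimate density."):
        sns.distplot(df, kde_kws={'bw': 0.1})
    else:
        raise  # re-raise any other RuntimeError unchanged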
This worked for me.
You can try three options.
First: show the KDE lumps with the default settings
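If the same fallback is needed in several places, the pattern can be wrapped in a small helper. This is only a sketch: call_with_bandwidth_fallback and fake_plot are hypothetical names, and the helper assumes the plotting function accepts a bw keyword, as sns.kdeplot does in seaborn 0.9.0. The stand-in function imitates the failure so that the example runs without seaborn installed.

```python
BANDWIDTH_ERROR = "Selected KDE bandwidth is 0"

def call_with_bandwidth_fallback(plot_fn, data, fallback_bw=0.1, **kwargs):
    """Call plot_fn(data, **kwargs); retry with bw=fallback_bw on the zero-bandwidth error.

    Assumption: plot_fn accepts a `bw` keyword argument.
    """
    try:
        return plot_fn(data, **kwargs)
    except RuntimeError as err:
        if not str(err).startswith(BANDWIDTH_ERROR):
            raise  # unrelated RuntimeErrors are not swallowed
        return plot_fn(data, bw=fallback_bw, **kwargs)

def fake_plot(data, bw=None, **kwargs):
    """Stand-in for a plotting call: fails unless a bandwidth is given."""
    if bw is None:
        raise RuntimeError("Selected KDE bandwidth is 0. Cannot estimate density.")
    return ("plotted", bw)

result = call_with_bandwidth_fallback(fake_plot, [1, 0, 1, 1, 6])
print(result)
```

With seaborn 0.9.0 the call would presumably look like call_with_bandwidth_fallback(sns.kdeplot, ser_test, cumulative=True). For sns.distplot the bandwidth goes inside kde_kws instead, so the helper would need adapting.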
sns.distplot(ser_test, hist = False, rug = True, rug_kws = {'color' : 'r'})
Second: use a KDE with a narrow bandwidth to show the individual probability lumps
sns.distplot(ser_test, hist = False, rug = True, rug_kws = {'color' : 'r'}, kde_kws = {'bw' : 1})
Third: choose a different kernel function, the triangular one (which changes the lump shape)
sns.distplot(ser_test, hist = False, rug = True, rug_kws = {'color' : 'r'}, kde_kws = {'bw' : 1.5, 'kernel' : 'tri'})
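The problem is caused by statsmodels.
In any case, to work around it in seaborn versions from 0.10.0 onward, just pass diag_kws={'bw': 1} as an argument (diag_kws is a parameter of sns.pairplot and is forwarded to the plots on the diagonal).
Try different values to find the bandwidth that works best for your data.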
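To get a feel for what different bandwidths do before settling on one, the smoothed CDF can be computed by hand. The sketch below assumes a Gaussian kernel and averages one normal CDF per sample; the function name gaussian_kde_cdf is hypothetical, and seaborn's own curve may differ in details such as the grid and cut-off.

```python
import numpy as np
from math import erf, sqrt

def gaussian_kde_cdf(samples, grid, bw):
    """Cumulative Gaussian KDE: the mean of normal CDFs centred on each sample."""
    if bw <= 0:
        raise ValueError("bandwidth must be positive")
    samples = np.asarray(samples, dtype=float)

    def normal_cdf(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    return np.array([
        np.mean([normal_cdf((g - s) / bw) for s in samples])
        for g in grid
    ])

ser_values = [1, 0, 1, 1, 6, 0, 6, 1, 1, 0, 2, 1, 1]
grid = np.linspace(-10, 16, 200)

# Smaller bandwidths give a step-like curve, larger ones a smoother curve.
for bw in (0.1, 1.0, 1.5):
    cdf = gaussian_kde_cdf(ser_values, grid, bw)
    print(bw, round(float(cdf[0]), 3), round(float(cdf[-1]), 3))
```

Plotting grid against cdf for each bandwidth (for example with matplotlib) makes the trade-off visible without relying on the automatic bandwidth selection.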