scipy.interpolate.make_interp_spline gives "x and y are incompatible" error

Question

I am trying to create a smoothed frequency distribution plot. The code works for one dataset but gives the following error message for another:

spl1 = make_interp_spline(bins1, data1['Frequency'].values)

File "/<path_to_anaconda3>/envs/mlpy37/lib/python3.7/site-packages/scipy/interpolate/_bsplines.py", line 805, in make_interp_spline
    raise ValueError('x and y are incompatible.')
ValueError: x and y are incompatible.

Here is the code, with the dataset that works:

import math
import numpy as np
import pandas as pd
import statistics
from scipy.stats import skew
from matplotlib import pyplot as plt
from scipy.interpolate import make_interp_spline

raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
min_value1 = min(raw_data1)
max_value1 = max(raw_data1)
step1 = math.ceil((max_value1 - min_value1) / 10)
bin_edges1 = [i for i in range(min_value1 - 1, max_value1 + 1, step1)]
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]
if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    bins1.append(max(bins1) + step1)
data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})
x1 = np.linspace(min(bins1), max(bins1), 250)
spl1 = make_interp_spline(bins1, data1['Frequency'].values)
smooth_curve1 = spl1(x1)

print(data1)
mean1 = statistics.mean(raw_data1)
median1 = statistics.median(raw_data1)
print('Mean: {:.2f}'.format(mean1))
print('Median: {:.2f}'.format(median1))
try:
    print('Mode: {:.2f}'.format(statistics.mode(raw_data1)))
except Exception as e:
    print(e)
skewness1 = skew(raw_data1)
if mean1 > median1:
    print('Positive Skewness: ' + str(skewness1))
elif mean1 < median1:
    print('Negative Skewness: ' + str(skewness1))
else:
    print('No skewness: ' + str(skewness1))

plt.figure()

plt.subplot(111)
plt.plot(x1, smooth_curve1)
plt.title('Numerical Variables Exercise Skewness')
plt.xlabel('Data')
plt.ylabel('Frequency')

plt.show()

If I replace the dataset in the code above with the following one, it does not work:

raw_data1 = [586, 760, 495, 678, 559, 415, 370, 659, 119, 288, 241, 787, 522, 207, 160, 526, 656, 848, 720, 676, 581, 929, 653, 661, 770, 800, 529, 975, 995, 947]

The full error message I get is:

Traceback (most recent call last):
  File "/<path_to_file>/NumericalVariablesExercise_Skewness.py", line 20, in <module>
    spl1 = make_interp_spline(bins1, data1['Frequency'].values)
  File "/<path_to_anaconda3>/envs/mlpy37/lib/python3.7/site-packages/scipy/interpolate/_bsplines.py", line 805, in make_interp_spline
    raise ValueError('x and y are incompatible.')
ValueError: x and y are incompatible.

Can anyone help identify the error in my code or logic?

Answer 1

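Commenting out one line actually fixes the problem (or at least the code runs; I could not verify the output). The error message is informative: x and y must have the same length. With the second dataset, the if block appends an extra value to bins1, so bins1 ends up with 11 entries while pd.cut produces only 10 frequencies.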

if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    # bins1.append(max(bins1) + step1) <-- this one
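
To see where the mismatch comes from, the bin construction from the question can be re-run on both datasets while counting the points. This is only a sketch: build_bins is a hypothetical helper name, not part of the original code, and it relies on pd.cut producing one interval per pair of adjacent edges.

```python
import math

def build_bins(raw_data, n_steps=10):
    """Reproduce the question's bin construction (hypothetical helper)."""
    lo, hi = min(raw_data), max(raw_data)
    step = math.ceil((hi - lo) / n_steps)
    bin_edges = list(range(lo - 1, hi + 1, step))
    bins = list(range(lo, hi + 1, step))
    if max(bin_edges) < hi:
        bin_edges.append(max(bin_edges) + step)
        bins.append(max(bins) + step)  # the line the fix comments out
    return bin_edges, bins

working = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870,
           366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
failing = [586, 760, 495, 678, 559, 415, 370, 659, 119, 288, 241, 787, 522, 207, 160,
           526, 656, 848, 720, 676, 581, 929, 653, 661, 770, 800, 529, 975, 995, 947]

for name, data in (("working", working), ("failing", failing)):
    edges, bins = build_bins(data)
    n_intervals = len(edges) - 1  # one frequency per pair of adjacent edges
    print(name, "x points:", len(bins), "y points:", n_intervals)

# working x points: 10 y points: 10
# failing x points: 11 y points: 10
```

In the working dataset the last edge already reaches the maximum, so the if block never runs. In the failing one, bin_edges1 is one edge short and needs the append, but bins1 already has one entry per interval, so appending to it as well creates the extra x value.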

Also, your code is hard to follow because you are mixing tools. You define raw_data1 as a Python list and build bins1 with a list comprehension.

raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
..
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]

Then you use numpy.linspace for x1.

x1 = np.linspace(min(bins1), max(bins1), 250)

pandas is involved as well:

data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})

I suggest sticking mainly to one tool and reaching for the others only when necessary.
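
As one possible illustration of that advice, here is a minimal sketch that does the binning with NumPy alone. It is not the original code: it swaps pd.cut for np.histogram, uses 10 equal-width bins, and takes the bin midpoints as x values, so the bins differ slightly from those in the question. The names counts, edges, and centers are chosen here for illustration.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

raw_data = np.array([586, 760, 495, 678, 559, 415, 370, 659, 119, 288, 241, 787,
                     522, 207, 160, 526, 656, 848, 720, 676, 581, 929, 653, 661,
                     770, 800, 529, 975, 995, 947])

# np.histogram returns the counts and the bin edges together,
# so len(edges) == len(counts) + 1 by construction.
counts, edges = np.histogram(raw_data, bins=10)

# Use bin midpoints as x so there is exactly one x value per count.
centers = (edges[:-1] + edges[1:]) / 2

spline = make_interp_spline(centers, counts)  # cubic spline (k=3) by default
x_smooth = np.linspace(centers[0], centers[-1], 250)
y_smooth = spline(x_smooth)
```

Because x and y are derived from the same np.histogram call, their lengths cannot drift apart the way the two separately built lists did. Note that an interpolating spline can overshoot between points, so the smoothed curve may dip below zero where the counts are small.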
