Curve fitting with cubic spline

Question

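I am trying to interpolate a cumulative distribution, for example of (i) the share of people against (ii) the share of cars they own, showing for instance that the top 20% of people own much more than 20% of all cars. Of course, 100% of the people own 100% of the cars. I also know the totals, for example 100 million people and 200 million cars.

Now for my code: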

#import libraries (more than required here)
import pandas as pd
from scipy import interpolate
from scipy.interpolate import interp1d
from sympy import symbols, solve, Eq
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px

curve=pd.read_excel('inputs.xlsx',sheet_name='inputdata')

Input data: Curveplot (cumulated people (x) on the left // cumulated cars (y) on the right)

#Input data in list form (I am not sure how to interpolate from a list for the moment)
cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars= [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

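# `points` is not defined in the original post; assumed here to be the two-column array read from the sheet
points = curve.to_numpy()
x, y = points[:, 0], points[:, 1]
interpolation = interp1d(x, y, kind='cubic')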

number_of_people_mn= 100000000

oneperson = 1 / number_of_people_mn
dataset = pd.DataFrame(range(number_of_people_mn + 1))
dataset.columns = ["nr_of_one_person"]
dataset.drop(dataset.index[:1], inplace=True)

#calculating the position of every single person on the cumulated x-axis (between 0 and 1)
dataset["cumulatedpeople"] = dataset["nr_of_one_person"] / number_of_people_mn

#finding the "cumulatedcars" to the "cumulatedpeople" via interpolation (between 0 and 1)
dataset["cumulatedcars"] = interpolation(dataset["cumulatedpeople"])

plt.plot(dataset["cumulatedpeople"], dataset["cumulatedcars"])
plt.legend(['Cubic interpolation'], loc = 'best')
plt.xlabel('Cumulated people')
plt.ylabel('Cumulated cars')
plt.title("People-to-car cumulated curve")
plt.show()

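However, when I look at the actual plot, I get the following wrong result: (figure: Cubic interpolation)

In fact, the curve should look almost like the one obtained by linear interpolation of exactly the same input data, but linear interpolation is not accurate enough for my purposes: (figure: Linear interpolation)

Am I missing a relevant step, or what would be the best way to get an accurate interpolation from this input that looks almost like the linear one?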

Answer 1

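Short answer: your code is doing the right thing, but the data is unsuitable for cubic interpolation.

Let me explain. Here is your code, which I have simplified for clarity: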

import numpy as np
from scipy.interpolate import interp1d
from matplotlib import pyplot as plt

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]
interpolation = interp1d(cumulatedpeople, cumulatedcars, kind='cubic')

number_of_people_mn = 100  # 100 instead of 100000000
# evenly spaced cumulated-people values between 0 and 1
cumppl = np.arange(number_of_people_mn + 1) / number_of_people_mn
cumcars = interpolation(cumppl)
plt.plot(cumppl, cumcars)  # interpolated curve
plt.plot(cumulatedpeople, cumulatedcars, 'o')  # input data points
plt.show()

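Note the last few lines: I am plotting both the interpolated result and the input data on the same graph. Here is the result:

The orange dots are the original data and the blue line is the cubic interpolation. The interpolator passes through all the points, so it is technically correct.

Clearly, it is not doing what you want.

The reason for this strange behaviour lies mostly at the right end, where several x-points are very close together. The interpolator produces massive wiggles while trying to fit these closely spaced points.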
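To make the wiggles concrete without a plot, here is a minimal sketch that evaluates the same cubic interpolant on a dense grid and checks whether it still behaves like a cumulative curve. The grid size of 1001 points and the variable names are arbitrary choices for illustration, not part of the original code.

```python
import numpy as np
from scipy.interpolate import interp1d

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

cubic = interp1d(cumulatedpeople, cumulatedcars, kind='cubic')

# Dense evaluation grid on [0, 1]; 1001 points is an arbitrary choice.
grid = np.linspace(0.0, 1.0, 1001)
values = cubic(grid)

# The interpolant reproduces the input points ...
passes_through_data = np.allclose(cubic(cumulatedpeople), cumulatedcars)
# ... but a cumulative share should never decrease between grid points.
is_monotone = bool(np.all(np.diff(values) >= 0))

print("passes through data:", passes_through_data)
print("monotone on the grid:", is_monotone)
print("range of interpolant:", values.min(), values.max())
```

A cumulative share that decreases or leaves the interval [0, 1] is a clear sign that the interpolant is overshooting between the data points.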

If I remove the two right-most points from the interpolator:

interpolation = interp1d(cumulatedpeople[:-2], cumulatedcars[:-2], kind = 'cubic')

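it looks somewhat more reasonable:

But one could still argue that linear interpolation is better. The wiggles on the left now arise because the gaps between the first x-points are too large.

The moral here is to use cubic interpolation only when the gaps between the x points are roughly equal.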
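A quick way to see how uneven the spacing is, sketched here with illustrative variable names, is to compare the largest and smallest gap between consecutive x values:

```python
import numpy as np

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]

# Distance between each pair of neighbouring x values.
gaps = np.diff(cumulatedpeople)
# How many times larger the widest gap is than the narrowest one.
gap_ratio = gaps.max() / gaps.min()

print("gaps:", gaps)
print("largest / smallest gap:", gap_ratio)
```

For this data the widest gap is tens of thousands of times larger than the narrowest one, which is far from "roughly equal".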

I think your best option is to use something like curve_fit.

A related discussion can be found here.
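As a rough illustration of the curve_fit route, the sketch below fits a one-parameter curve to the same data. The functional form `1 - (1 - x)**k`, the function name `cumulative_model`, the starting value and the bounds are all assumptions made for this example. They are not taken from the question, and another form may suit the data better. Unlike an interpolant, a fitted curve will generally not pass through the data points exactly.

```python
import numpy as np
from scipy.optimize import curve_fit

cumulatedpeople = np.array([0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1])
cumulatedcars = np.array([0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1])

def cumulative_model(x, k):
    # Hypothetical one-parameter model: 0 at x=0, 1 at x=1,
    # and non-decreasing on [0, 1] for any k > 0.
    return 1.0 - (1.0 - x) ** k

# Bounds keep k between 0 and 1; the starting value 0.2 is an arbitrary guess.
params, covariance = curve_fit(cumulative_model, cumulatedpeople, cumulatedcars,
                               p0=[0.2], bounds=(0.0, 1.0))
k_fit = params[0]

fitted = cumulative_model(cumulatedpeople, k_fit)
print("fitted k:", k_fit)
print("largest absolute residual:", np.abs(fitted - cumulatedcars).max())
```

The printed residual gives a first impression of how well this assumed form matches the data. If it is too large for your purposes, the model needs more flexibility.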

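As noted here, monotone interpolation in particular can give good results on your data. Copying the relevant bit here, you can replace the interpolator with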

from scipy.interpolate import pchip
interpolation = pchip(cumulatedpeople, cumulatedcars)

and get a decent-looking fit:
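The shape of the monotone interpolant can also be checked numerically. The sketch below uses `PchipInterpolator`, the class that `scipy.interpolate.pchip` is an alias for, and verifies on a dense grid that the result never decreases and stays within [0, 1]. The grid size and the small tolerance for floating-point rounding are arbitrary choices for this example.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

monotone = PchipInterpolator(cumulatedpeople, cumulatedcars)

# Dense evaluation grid on [0, 1]; 1001 points is an arbitrary choice.
grid = np.linspace(0.0, 1.0, 1001)
values = monotone(grid)

# Small tolerance to absorb floating-point rounding.
tolerance = 1e-12
stays_monotone = bool(np.all(np.diff(values) >= -tolerance))
stays_in_range = bool(values.min() >= -tolerance and values.max() <= 1.0 + tolerance)

print("monotone on the grid:", stays_monotone)
print("stays within [0, 1]:", stays_in_range)
```

Both checks should come out true here, because PCHIP is designed to preserve the monotonicity of the input data.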
