Pandas interpolation method definitions

The pandas documentation lists many methods that can be passed as the method argument of pandas.DataFrame.interpolate, including:

‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).

‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes

However, the scipy documentation lists the following options:

kind str or int, optional Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.

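The pandas documentation seems to be wrong, because scipy.interpolate.interp1d does not accept barycentric or polynomial as a valid method. I assume barycentric refers to scipy.interpolate.barycentric_interpolate, but what does polynomial refer to? I thought it might be equivalent to the piecewise_polynomial option, but the two give different results.

Also, method='cubicspline' and method='spline', order=3 give different results. What is the difference here?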

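The pandas interpolation methods are an amalgamation of interpolation methods from different places in the numpy and scipy libraries.

Currently all of the code lives in pandas/core/missing.py.

At a high level, it splits the interpolation methods into those handled by np.interp and the others, which are handled by functions spread across the scipy library.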

# interpolation methods that dispatch to np.interp
NP_METHODS = ["linear", "time", "index", "values"]

# interpolation methods that dispatch to _interpolate_scipy_wrapper
SP_METHODS = ["nearest", "zero", "slinear", "quadratic", "cubic",
              "barycentric", "krogh", "spline", "polynomial",
              "from_derivatives", "piecewise_polynomial", "pchip",
              "akima", "cubicspline"]

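As a rough check of the np.interp side of that split, the sketch below compares method="linear" and method="index" against direct np.interp calls. The Series and the variable names are made up for illustration. It assumes, as the pandas documentation describes, that "linear" treats the values as equally spaced while "index" uses the numeric index values as x coordinates.

```python
import numpy as np
import pandas as pd

# Illustrative data: interior gaps and a non-uniform numeric index.
s = pd.Series([1.0, np.nan, 4.0, np.nan, 10.0], index=[0, 1, 4, 5, 10])
y = s.to_numpy()
valid = ~np.isnan(y)

# method="linear": x coordinates are the positions 0, 1, 2, ...
positions = np.arange(len(s), dtype=float)
linear_expected = np.interp(positions, positions[valid], y[valid])
linear_result = s.interpolate(method="linear").to_numpy()

# method="index": x coordinates are the actual index values.
x = s.index.to_numpy(dtype=float)
index_expected = np.interp(x, x[valid], y[valid])
index_result = s.interpolate(method="index").to_numpy()

print(linear_result)
print(index_result)
```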
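Then, because the scipy methods are spread across different functions, you can see that missing.py contains a number of other wrappers that point to the scipy methods. Most methods are passed to scipy.interpolate.interp1d; for some of the others, however, there is a dict or a separate wrapper function that points to the specific scipy method.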

from scipy import interpolate

alt_methods = {
    "barycentric": interpolate.barycentric_interpolate,
    "krogh": interpolate.krogh_interpolate,
    "from_derivatives": _from_derivatives,
    "piecewise_polynomial": _from_derivatives,
}
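
To illustrate the dict above, here is a small sketch that calls the two SciPy functions directly and compares them with what pandas returns for the same gaps. The data (y = x**2 with two holes) and all variable names are assumptions for illustration, and the comparison assumes pandas calls the function as terp(x, y, new_x) on the non-missing points, which is how I read missing.py.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Illustrative data: y = x**2 on a default integer index, with two gaps.
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0, np.nan, 36.0])
x = s.index.to_numpy(dtype=float)
y = s.to_numpy()
valid = ~np.isnan(y)

scipy_funcs = {
    "barycentric": interpolate.barycentric_interpolate,
    "krogh": interpolate.krogh_interpolate,
}

pandas_values = {}
scipy_values = {}
for method, func in scipy_funcs.items():
    # Direct SciPy call: known x, known y, then the x positions to fill.
    scipy_values[method] = np.asarray(func(x[valid], y[valid], x[~valid]))
    # The same gaps filled through pandas.
    pandas_values[method] = s.interpolate(method=method).to_numpy()[~valid]
    print(method, pandas_values[method], scipy_values[method])
```

Both methods fit a single polynomial through all known points, so on this data they should recover the missing squares (4 and 25) up to floating point error.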

where the docstring of _from_derivatives in missing.py states:

def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False):
    """
    Convenience function for interpolate.BPoly.from_derivatives.
    ...
    """

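This also suggests why piecewise_polynomial and polynomial disagree. A minimal sketch, assuming the wrapper hands BPoly.from_derivatives only one value per breakpoint (the function value, no derivatives): in that case each piece is a straight line, so the result matches plain linear interpolation over the index rather than a higher-order polynomial. The data and variable names below are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import BPoly

# Illustrative data with a run of missing values.
ser = pd.Series([1.0, 3.0, np.nan, np.nan, np.nan, 11.0])
x = ser.index.to_numpy(dtype=float)
y = ser.to_numpy()
valid = ~np.isnan(y)

# Shape (n, 1): only the function value is given at each breakpoint,
# so every piece between two breakpoints is linear.
bpoly = BPoly.from_derivatives(x[valid], y[valid].reshape(-1, 1))
scipy_values = bpoly(x[~valid])

# The same gaps filled through pandas and through np.interp.
pandas_values = ser.interpolate(method="from_derivatives").to_numpy()[~valid]
linear_values = np.interp(x[~valid], x[valid], y[valid])

print(scipy_values, pandas_values, linear_values)
```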
So, TL;DR: depending on the method you specify, you end up directly using one of the following:

numpy.interp
scipy.interpolate.interp1d
scipy.interpolate.barycentric_interpolate
scipy.interpolate.krogh_interpolate
scipy.interpolate.BPoly.from_derivatives
scipy.interpolate.Akima1DInterpolator
scipy.interpolate.UnivariateSpline
scipy.interpolate.CubicSpline
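
To tie the list back to the original question, here is a minimal sketch that compares three of the SP_METHODS against the SciPy calls I believe they reach. The data and variable names are made up for illustration, and the mapping assumes the dispatch in missing.py works as described above. It also hints at why method='spline', order=3 and method='cubicspline' can differ: as far as I can tell, UnivariateSpline is called with its default smoothing factor s, so the fitted curve need not pass through every data point, whereas CubicSpline always interpolates the data exactly.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Illustrative data: a sampled sine wave with two interior gaps.
y = np.sin(np.arange(12) / 2.0)
y[[3, 7]] = np.nan
s = pd.Series(y)

x = s.index.to_numpy(dtype=float)
valid = ~np.isnan(y)
xv, yv, xm = x[valid], y[valid], x[~valid]

# Direct SciPy calls, evaluated at the missing positions.
scipy_values = {
    # 'polynomial' with order=3 is assumed to become interp1d(kind=3).
    "polynomial": interpolate.interp1d(xv, yv, kind=3)(xm),
    # 'spline' with order=3 is assumed to become UnivariateSpline(k=3),
    # using the default smoothing factor.
    "spline": interpolate.UnivariateSpline(xv, yv, k=3)(xm),
    # 'cubicspline' is assumed to become CubicSpline with its defaults.
    "cubicspline": interpolate.CubicSpline(xv, yv)(xm),
}

# The same gaps filled through pandas.
pandas_values = {
    "polynomial": s.interpolate(method="polynomial", order=3).to_numpy()[~valid],
    "spline": s.interpolate(method="spline", order=3).to_numpy()[~valid],
    "cubicspline": s.interpolate(method="cubicspline").to_numpy()[~valid],
}

for name in scipy_values:
    print(name, pandas_values[name], scipy_values[name])
```

Printing the values side by side shows where the three methods disagree on the same gaps.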