Pandas interpolation method definitions
The pandas documentation lists a number of methods that can be passed as the method argument of pandas.DataFrame.interpolate, including:
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes
However, the scipy documentation lists the following options for interp1d:
kind str or int, optional
Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.
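The pandas documentation appears to be wrong, since scipy.interpolate.interp1d does not accept barycentric or polynomial as a valid kind. I assume barycentric refers to scipy.interpolate.barycentric_interpolate, but what does polynomial refer to? I thought it might be equivalent to the piecewise_polynomial option, but the two give different results.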
In addition, method='cubicspline' and method='spline', order=3 give different results. What is the difference here?
The pandas interpolation methods are an amalgamation of interpolation routines from different places in the numpy and scipy libraries.
Currently all of the code lives in pandas/core/missing.py.
At a high level, it splits the interpolation methods into those handled by np.interp and those handled by routines from across the scipy library.
# interpolation methods that dispatch to np.interp
NP_METHODS = ["linear", "time", "index", "values"]
# interpolation methods that dispatch to _interpolate_scipy_wrapper
SP_METHODS = ["nearest", "zero", "slinear", "quadratic", "cubic",
"barycentric", "krogh", "spline", "polynomial",
"from_derivatives", "piecewise_polynomial", "pchip",
"akima", "cubicspline"]
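To make the np.interp side of that split concrete, here is a minimal sketch comparing two of the NP_METHODS with a direct np.interp call. The series and its unevenly spaced index are made up for illustration; the only assumption is the documented behaviour that "linear" ignores the index while "index" uses it as the x-axis.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an unevenly spaced index, so "linear" and "index" disagree.
s = pd.Series([1.0, np.nan, 4.0], index=[0, 1, 3])

by_position = s.interpolate(method="linear")  # treats the values as equally spaced
by_index = s.interpolate(method="index")      # uses the index values as x

# The same calculation done directly with np.interp on the known points.
known = s.dropna()
direct = np.interp(1, known.index.to_numpy(), known.to_numpy())

print(by_position.loc[1], by_index.loc[1], direct)
```

With this data, "linear" should fill the gap with 2.5 (the midpoint by position), while "index" and the direct np.interp call should both give 2.0.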
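Then, because the scipy methods are spread across different functions, you can see that missing.py contains a number of other wrappers that point to the scipy routines. Most methods are passed to scipy.interpolate.interp1d; for some others, however, there is a dict or a wrapper function that points to the specific scipy routine.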
from scipy import interpolate
alt_methods = {
"barycentric": interpolate.barycentric_interpolate,
"krogh": interpolate.krogh_interpolate,
"from_derivatives": _from_derivatives,
"piecewise_polynomial": _from_derivatives,
}
where the docstring of _from_derivatives in missing.py says:
def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False):
    """
    Convenience function for interpolate.BPoly.from_derivatives.
    ...
    """
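As a quick check of the alt_methods mapping, the sketch below fills one gap with method="barycentric" and method="krogh" and compares the result with calling the scipy functions directly. The data (three points on y = x**2) is hypothetical, chosen so that the expected value is easy to verify by hand.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Hypothetical data: three known points on y = x**2 and one gap at x = 2.
s = pd.Series([0.0, 1.0, np.nan, 9.0])

known = s.dropna()
x_known = known.index.to_numpy(dtype=float)
y_known = known.to_numpy()

# Going through pandas ...
via_pandas_bary = s.interpolate(method="barycentric").loc[2]
via_pandas_krogh = s.interpolate(method="krogh").loc[2]

# ... and calling the scipy functions named in alt_methods directly.
via_scipy_bary = float(interpolate.barycentric_interpolate(x_known, y_known, 2.0))
via_scipy_krogh = float(interpolate.krogh_interpolate(x_known, y_known, 2.0))

print(via_pandas_bary, via_scipy_bary, via_pandas_krogh, via_scipy_krogh)
```

Both routines fit a single polynomial through all known points, so all four values should be 4.0 up to floating-point error.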
So, TL;DR: depending on the method you specify, you end up directly using one of the following:
numpy.interp
scipy.interpolate.interp1d
scipy.interpolate.barycentric_interpolate
scipy.interpolate.krogh_interpolate
scipy.interpolate.BPoly.from_derivatives
scipy.interpolate.Akima1DInterpolator
scipy.interpolate.UnivariateSpline
scipy.interpolate.CubicSpline
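To tie this back to the two questions, the sketch below compares three pandas methods with the scipy routine each one appears to dispatch to. The pairings are my reading of missing.py and should be treated as assumptions: "cubicspline" to CubicSpline, "spline" with order=k to UnivariateSpline(k=k), and "polynomial" with order=n to interp1d(kind=n), that is, an integer kind specifying the spline order. The data is made up.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Hypothetical data with a single interior gap at index 3.
s = pd.Series([0.0, 0.8, 0.9, np.nan, -0.8, -1.0, -0.3, 0.7])

known = s.dropna()
x = known.index.to_numpy(dtype=float)
y = known.to_numpy()
gap = 3.0

# Each entry: (value filled by pandas, value from the assumed scipy counterpart).
results = {
    "cubicspline": (
        s.interpolate(method="cubicspline").loc[3],
        float(interpolate.CubicSpline(x, y)(gap)),
    ),
    "spline, order=3": (
        s.interpolate(method="spline", order=3).loc[3],
        float(interpolate.UnivariateSpline(x, y, k=3)(gap)),
    ),
    "polynomial, order=3": (
        s.interpolate(method="polynomial", order=3).loc[3],
        float(interpolate.interp1d(x, y, kind=3)(gap)),
    ),
}

for name, (from_pandas, from_scipy) in results.items():
    print(f"{name:>20}: pandas={from_pandas:.6f}  scipy={from_scipy:.6f}")
```

If the pairings hold, each pandas value matches its scipy counterpart. As far as I understand SciPy's defaults, UnivariateSpline is a smoothing spline when its s parameter is left unset, so it need not pass through the known points, whereas CubicSpline interpolates them exactly. That would explain why method='spline', order=3 and method='cubicspline' disagree.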