Linear 1D interpolation on multiple datasets using loops

Question

I want to perform linear interpolation with the scipy.interpolate library. The dataset looks roughly like this: [image: DataFrame of X and Y values for different RUNs, used to build the interpolation]

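I want to use the resulting interpolation functions to find the missing Y values in this second dataset: [image: DataFrame to which the interpolation functions are applied]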

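Only 3 runs are shown here, but the dataset I am actually working with has about 1000 runs. I would therefore appreciate any suggestions on how to do the interpolation in a loop.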

from scipy.interpolate import interp1d
InterpolatedFunction = {}
for RUNNumber in range(TotalRuns):
    # X and Y would have to be the data belonging to run RUNNumber
    InterpolatedFunction[RUNNumber] = interp1d(X, Y)
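For reference, a minimal sketch of what interp1d does for a single run, using hypothetical data that is not from the post: it returns a callable object, and calling it with new X values returns linearly interpolated Y values (kind="linear" is the default).

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical (X, Y) points for one run, for illustration only.
x_run = np.array([0.0, 1.0, 2.0, 3.0])
y_run = np.array([0.0, 10.0, 20.0, 15.0])

# interp1d builds a callable; kind="linear" is the default.
f = interp1d(x_run, y_run, kind="linear")

# A scalar query returns a 0-d array; float() unwraps it.
print(float(f(1.5)))  # -> 15.0, halfway between (1, 10) and (2, 20)

# An array query returns an array of interpolated values (5.0 and 17.5 here).
print(f([0.5, 2.5]))
```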

Answer 1

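As I understand it, you need to define a separate interpolation function for each run and then apply those functions to the second DataFrame. I defined a DataFrame df with the columns ['X', 'Y', 'RUN'] and a second DataFrame new_df with the columns ['X', 'Y_interpolation', 'RUN'].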
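To make the rest of the answer runnable, here is one way to build two DataFrames with that layout. The values are hypothetical toy data, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: df holds the known (X, Y) points of every run.
df = pd.DataFrame({
    "X":   [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "Y":   [0.0, 10.0, 20.0, 5.0, 5.0, 5.0, 2.0, 4.0, 8.0],
    "RUN": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# new_df holds the X values (and their run) whose Y is still missing.
new_df = pd.DataFrame({
    "X":   [0.5, 1.5, 0.25],
    "Y_interpolation": np.nan,  # scalar is broadcast to every row
    "RUN": [1, 2, 3],
})
```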

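from scipy.interpolate import interp1d

interpolating_functions = dict()
# loop over the run numbers that actually occur (range(1, max_runs) would skip the last run)
for run_number in df['RUN'].unique():
    run_data = df[df['RUN'] == run_number][['X', 'Y']]
    interpolating_functions[run_number] = interp1d(run_data['X'], run_data['Y'])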
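The same dictionary can also be built with groupby, which visits each distinct run exactly once, so you do not need to know how many runs there are or how they are numbered. This is a minimal sketch on hypothetical toy data with the column names used in this answer.

```python
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical toy data with the columns X, Y, RUN.
df = pd.DataFrame({
    "X":   [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "Y":   [0.0, 10.0, 20.0, 5.0, 7.0, 9.0],
    "RUN": [1, 1, 1, 2, 2, 2],
})

# One interp1d object per run, keyed by the run number.
interpolating_functions = {
    run_number: interp1d(group["X"], group["Y"])
    for run_number, group in df.groupby("RUN")
}

print(float(interpolating_functions[2](0.5)))  # -> 6.0
```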

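Now that we have an interpolation function for each run, we can use them to fill the 'Y_interpolation' column of the new DataFrame. This can be done with apply, which takes a function and (with axis=1) applies it to every row of the DataFrame. So let's define an interpolate function that takes one row of new_df and uses its X value and run number to compute the interpolated Y value.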

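def interpolate(row):
    # look up the interpolation function built for this row's run
    int_func = interpolating_functions[row['RUN']]
    # calling an interp1d object on a scalar returns a 0-d array;
    # float() turns it into a plain number (no need for the private _call_linear)
    return float(int_func(row['X']))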
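One caveat with this row-wise approach: by default interp1d raises a ValueError when a query X lies outside the range of X values of that run. A minimal sketch of the options interp1d offers for this (bounds_error and fill_value), on hypothetical data:

```python
import numpy as np
from scipy.interpolate import interp1d

x_run = [0.0, 1.0, 2.0]
y_run = [0.0, 10.0, 20.0]

# Default behaviour: a query outside [0, 2] raises ValueError.
strict = interp1d(x_run, y_run)
raised = False
try:
    strict(3.0)
except ValueError:
    raised = True
    print("3.0 is outside the X range of this run")

# Return NaN outside the range instead of raising.
lenient = interp1d(x_run, y_run, bounds_error=False, fill_value=np.nan)

# Or extrapolate linearly from the first/last segment.
extrap = interp1d(x_run, y_run, fill_value="extrapolate")
print(float(extrap(3.0)))  # -> 30.0
```

Which option is right depends on whether out-of-range X values in new_df should be flagged (NaN), extrapolated, or treated as an error.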

Now we just call apply with the interpolate function we defined.

new_df['Y_interpolation'] = new_df.apply(interpolate,axis=1)
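With around 1000 runs, calling a Python function once per row can become slow. A minimal sketch of an alternative, assuming the same column names: group both DataFrames by RUN and interpolate all query points of a run with a single numpy.interp call. Note that this swaps in numpy.interp for interp1d, and numpy.interp does not raise for out-of-range X; it returns the first or last Y value instead. The helper name interpolate_all and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the layout used above.
df = pd.DataFrame({
    "X":   [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "Y":   [0.0, 10.0, 20.0, 5.0, 7.0, 9.0],
    "RUN": [1, 1, 1, 2, 2, 2],
})
new_df = pd.DataFrame({"X": [0.5, 1.5, 0.5], "RUN": [1, 1, 2]})


def interpolate_all(known, queries):
    """Interpolate queries['X'] per RUN, one np.interp call per run."""
    result = pd.Series(np.nan, index=queries.index)
    known_by_run = {run: g for run, g in known.groupby("RUN")}
    for run_number, q in queries.groupby("RUN"):
        # np.interp expects the known X values in increasing order
        k = known_by_run[run_number].sort_values("X")
        result.loc[q.index] = np.interp(
            q["X"].to_numpy(), k["X"].to_numpy(), k["Y"].to_numpy()
        )
    return result


new_df["Y_interpolation"] = interpolate_all(df, new_df)
print(new_df)
```

The loop now runs once per run rather than once per row, and the per-point work happens inside numpy.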

I am using pandas version 0.20.3, and this gives me a new_df that looks like this:

Tags: scipy, linear-interpolation, python-2.7, pandas