Pandas DataFrame interpolation with different increment by adding rows

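So, basically, I have a DataFrame that looks like this:

The task is to make 'Depth' increase in steps of 0.1 (by adding new rows) and to interpolate 'Value' accordingly.

It should look like this (the bottom is cropped because of its size):

Here is the draft code I wrote: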

import pandas as pd
df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

df['Depth']= ... #make it here with increment 0.1
df['Value'] = df['Value'].interpolate(method='linear')
df['Name'] = ... #copy it for each empty row

df.to_csv('Interpolated values.csv')

The solution given below solves the problem.

import pandas as pd
df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value':[100, 200, 300, 200, 300, 400]})

counter = 0.0
def add(val):
    global counter
    if counter <=0.9:
        val = val+counter
        counter+=0.1
        return val
    else:
        counter=0.1
        return val

# Duplicate each row 10 times and sort by index so the copies stay adjacent
df = pd.concat([df]*10).sort_index()
# Apply the add function to Depth
df['Depth'] = df['Depth'].apply(add)
# Reset the index
df = df.reset_index(drop=True)
# Increment Value by 10 from the previous row; the running total does not
# restart at a new Name, so the DB rows continue from the last AS value
for idx in range(1, len(df)):
    df.loc[idx, 'Value'] = df.loc[idx-1, 'Value'] + 10

Output:

   Name  Depth  Value
0    AS   15.0    100
1    AS   15.1    110
2    AS   15.2    120
3    AS   15.3    130
4    AS   15.4    140
5    AS   15.5    150
6    AS   15.6    160
7    AS   15.7    170
8    AS   15.8    180
9    AS   15.9    190
10   AS   16.0    200
11   AS   16.1    210
12   AS   16.2    220
13   AS   16.3    230
14   AS   16.4    240
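
The same duplicate-then-offset idea can be sketched without a global counter or a row-by-row loop. This is only an illustration under stated assumptions: the names `copies`, `depth_step`, `value_step`, `offset` and `out` are made up here, and the Value increment of 10 is assumed (each gap of 100 split into 10 parts). Like the code above, it also produces rows past each group's last depth (17.1 to 17.9, for example), but Value restarts from each original row instead of accumulating across Name groups.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

copies = 10        # assumed number of sub-steps per original row
depth_step = 0.1   # increment asked for in the question
value_step = 10    # assumed: every Value gap of 100 is split into 10 parts

# Repeat every row `copies` times; the copies keep their original index label.
out = df.loc[df.index.repeat(copies)].copy()

# Position 0..copies-1 of each copy within its original row.
offset = out.groupby(level=0).cumcount().to_numpy()

# Add the per-copy offsets; round Depth to hide floating-point noise.
out['Depth'] = (out['Depth'] + offset * depth_step).round(1)
out['Value'] = out['Value'] + offset * value_step
out = out.reset_index(drop=True)

print(out.head(12))
```

Because the offsets are computed per original row, the first DB row keeps its own starting Value of 200.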

Part 1:

I chose not to iterate, and instead to assign new values to the whole column.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'], 
                   'Depth': [15, 16, 17, 10, 11, 12], 
                   'Value':[100, 200, 300, 200, 300, 400]
                  })

Output:

  Name  Depth  Value
0   AS     15    100
1   AS     16    200
2   AS     17    300
3   DB     10    200
4   DB     11    300
5   DB     12    400

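  1. Use len to get the length of the column.

  2. df[column][0] gives the initial value. If you already have a specific initial value, skip this step and assign that value instead.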

ini_1 = df['Depth'][0] # initial value
ini_2 = df['Value'][0] # initial value
length = len(df)
step_1 = 0.1
step_2 = 10

df['Depth'] = np.arange(ini_1, ini_1+length*step_1, step_1)
df['Value'] = np.arange(ini_2, ini_2+length*step_2, step_2)

Output:

  Name  Depth  Value
0   AS   15.0    100
1   AS   15.1    110
2   AS   15.2    120
3   DB   15.3    130
4   DB   15.4    140
5   DB   15.5    150

We do not know the rule that relates Name and Depth, so this ignores the groups, but it shows a way to avoid iterating over every row.
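
The NumPy documentation notes that with a non-integer step such as 0.1 it is often better to use np.linspace than np.arange, because linspace takes the number of points directly. A minimal sketch of the same whole-column assignment written that way follows; the variable names `n`, `step_depth`, `step_value`, `start_depth` and `start_value` are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

n = len(df)
step_depth = 0.1
step_value = 10

start_depth = df['Depth'].iloc[0]   # initial Depth
start_value = df['Value'].iloc[0]   # initial Value

# linspace is given the point count, so the result always has len(df) entries.
df['Depth'] = np.linspace(start_depth, start_depth + (n - 1) * step_depth, n)

# Integer steps are exact, so a scaled integer range is enough for Value.
df['Value'] = start_value + step_value * np.arange(n)

print(df)
```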


Part 2:

Assume each Name-Depth group is expanded to 10 items, with increments of 0.1 on Depth and 10 on Value.

The steps are as follows:

  1. Load the DataFrame:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'], 
                   'Depth': [15, 16, 17, 10, 11, 12], 
                   'Value':[100, 200, 300, 200, 300, 400]
                  })
  2. Expand df tenfold:
dfn = pd.concat([df]*10, ignore_index=False).sort_index()
# Depth will hold fractional values, so convert it to float up front
dfn['Depth'] = dfn['Depth'].astype(float)
  3. These are arithmetic progressions:
    for Depth, a = 0, d = 0.1, length = 10
    for Value, a = 0, d = 10, length = 10
    Add them as vectors (1-D arrays) to each Name-Depth group:
a = 0
d_depth = 0.1
d_value = 10
length = 10
arithmetic_1 = [round(a + d_depth * (n - 1), 2) for n in range(1, length + 1)] # arithmetic progression for Depth
arithmetic_2 = [round(a + d_value * (n - 1), 2) for n in range(1, length + 1)] # arithmetic progression for Value
  4. Main part:
# Each index label i marks the 10 copies of one original row
for i in set(dfn.index):
    dfn.loc[i, 'Depth'] = dfn.loc[i, 'Depth'].to_numpy() + arithmetic_1
    dfn.loc[i, 'Value'] = dfn.loc[i, 'Value'].to_numpy() + arithmetic_2

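Summary: you now have the DataFrame dfn as the result, based on the assumption above. This approach tries to reduce the time spent in loops by working on vectors, which matters if you have a huge dataset.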

Name Depth  Value
0   AS  15.0    100
0   AS  15.1    110
0   AS  15.2    120
0   AS  15.3    130
0   AS  15.4    140
0   AS  15.5    150
0   AS  15.6    160
0   AS  15.7    170
0   AS  15.8    180
0   AS  15.9    190
1   AS  16.0    200
1   AS  16.1    210
1   AS  16.2    220
1   AS  16.3    230
1   AS  16.4    240
1   AS  16.5    250
1   AS  16.6    260
1   AS  16.7    270
1   AS  16.8    280
1   AS  16.9    290
2   AS  17.0    300
2   AS  17.1    310
       :
       :
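
The per-group vector sum can also be written with NumPy broadcasting, which removes the remaining loop over index labels. This is a sketch under the same assumption (10 items per original row, increments of 0.1 and 10); `depth_offsets`, `value_offsets`, `depth_grid` and `value_grid` are names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

length = 10
depth_offsets = np.arange(length) * 0.1   # 0.0, 0.1, ..., 0.9
value_offsets = np.arange(length) * 10    # 0, 10, ..., 90

# A column vector plus a row vector broadcasts to a (len(df), length) grid:
# row i holds the 10 expanded values that belong to original row i.
depth_grid = df['Depth'].to_numpy()[:, None] + depth_offsets
value_grid = df['Value'].to_numpy()[:, None] + value_offsets

dfn = pd.DataFrame({
    'Name': np.repeat(df['Name'].to_numpy(), length),
    'Depth': depth_grid.ravel().round(1),   # row-major flattening keeps rows together
    'Value': value_grid.ravel(),
})

print(dfn.head(12))
```

Unlike the loop version, this builds a fresh DataFrame with a plain 0..59 index rather than repeated labels.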

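Here is a solution that lets you interpolate the values with any step size (assuming the step divides the gap between integer depths evenly) and is more flexible: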

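import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

my_df_list = []
step = 0.1

for label, group in df.sort_values('Depth').groupby('Name'):

    # Map each known Depth to its Value for lookup during interpolation
    lookup_dict = {x[0]: x[1] for x in group[['Depth', 'Value']].values}

    # Use np.linspace because it hits the start and end values exactly
    depth_range = group['Depth'].max() - group['Depth'].min()
    new_index = np.linspace(
        start=group['Depth'].min(),
        stop=group['Depth'].max(),
        num=int(round(depth_range / step)) + 1
    )
    # Known depths keep their Value; every other grid point starts as NaN
    # and is filled in by linear interpolation
    new_values = pd.Series(
        [lookup_dict.get(round(x, 1), np.nan) for x in new_index],
        dtype=float
    ).interpolate()

    # Create a tmp df with your values
    df_tmp = pd.DataFrame.from_dict({
        'Name': [label] * len(new_index),
        'Depth': new_index,
        'Value': new_values
    })
    my_df_list.append(df_tmp)

# Finally, combine all dfs
df_final = pd.concat(my_df_list, ignore_index=True)

Output:
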
    Name    Depth   Value
0   AS      15.0    100.0
1   AS      15.1    110.0
...
19  AS      16.9    290.0
20  AS      17.0    300.0
21  DB      10.0    200.0
22  DB      10.1    210.0
...
39  DB      11.8    380.0
40  DB      11.9    390.0
41  DB      12.0    400.0
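
For steps that do not round cleanly to one decimal, the dictionary lookup can be replaced by np.interp, which interpolates directly against the known (Depth, Value) pairs. This is a minimal sketch, not the method of the answer above: the helper name `interpolate_groups` and the step of 0.25 are chosen here only for illustration, and the grid is assumed to span each group's minimum to maximum Depth. If the step does not divide that span evenly, linspace adjusts the spacing slightly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})


def interpolate_groups(frame, step):
    """Resample Depth on a regular grid per Name and linearly interpolate Value."""
    parts = []
    for name, group in frame.groupby('Name', sort=False):
        group = group.sort_values('Depth')      # np.interp needs increasing x values
        lo, hi = group['Depth'].min(), group['Depth'].max()
        n = int(round((hi - lo) / step)) + 1    # grid points, both endpoints included
        depth = np.linspace(lo, hi, n)
        value = np.interp(depth,
                          group['Depth'].to_numpy(),
                          group['Value'].to_numpy())
        parts.append(pd.DataFrame({'Name': name, 'Depth': depth, 'Value': value}))
    return pd.concat(parts, ignore_index=True)


result = interpolate_groups(df, step=0.25)
print(result)
```

Calling the same helper with step=0.1 gives 21 rows per group, matching the layout of the table above.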