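Pandas DataFrame interpolation with different increment by adding rows
So, basically, I have a DataFrame that looks like this:
The task is to make 'Depth' increase in steps of 0.1 (by adding new rows) and to interpolate 'Value' accordingly.
It should look like this:
(bottom cropped because of its size)
Here is the draft code I wrote: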
import pandas as pd
df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})
df['Depth'] = ...  # make it here with increment 0.1
df['Value'] = df['Value'].interpolate(method='linear')
df['Name'] = ...  # copy it for each empty row
df.to_csv('Interpolated values.csv')
The solution given below will solve the problem.
import pandas as pd
df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})
counter = 0.0
def add(val):
    # Add a running offset of 0.0, 0.1, ..., 0.9 to each depth,
    # then start over for the next block of ten rows
    global counter
    if counter <= 0.9:
        val = val + counter
        counter += 0.1
        return val
    else:
        counter = 0.1
        return val
# Duplicate each row 10 times and sort using the index
df = pd.concat([df]*10).sort_index()
# Apply the add function to the depth
df['Depth'] = df['Depth'].apply(add)
# Reset the index
df = df.reset_index(drop=True)
# Increment the value by 10 based on the previous value
for idx in range(1, len(df)):
    df.loc[idx, 'Value'] = df.loc[idx-1, 'Value'] + 10
Output:
Name Depth Value
0 AS 15.0 100
1 AS 15.1 110
2 AS 15.2 120
3 AS 15.3 130
4 AS 15.4 140
5 AS 15.5 150
6 AS 15.6 160
7 AS 15.7 170
8 AS 15.8 180
9 AS 15.9 190
10 AS 16.0 200
11 AS 16.1 210
12 AS 16.2 220
13 AS 16.3 230
14 AS 16.4 240
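One thing to watch for: the loop above keeps adding 10 from one row to the next, so the DB rows carry on from the AS values instead of starting again at 200. If that is not what you want, here is a minimal sketch that handles each Name separately and leans on `interpolate`, as the draft in the question tried to do. The helper name `densify` and the rounding to 6 decimals are my own assumptions, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

def densify(group, step=0.1):
    """Put one Name group on a regular Depth grid (hypothetical helper)."""
    start, stop = group['Depth'].min(), group['Depth'].max()
    n = int(round((stop - start) / step)) + 1
    # Build the grid by multiplication and round it, so the original
    # depths (15.0, 16.0, ...) match the new grid labels exactly.
    grid = np.round(start + step * np.arange(n), 6)
    known = pd.Series(group['Value'].to_numpy(dtype=float),
                      index=group['Depth'].to_numpy(dtype=float)).sort_index()
    # reindex leaves NaN at the new depths; method='index' fills them
    # linearly using the Depth values themselves as the x axis.
    filled = known.reindex(grid).interpolate(method='index')
    return pd.DataFrame({'Name': group['Name'].iloc[0],
                         'Depth': grid,
                         'Value': filled.to_numpy()})

result = pd.concat(
    [densify(g) for _, g in df.groupby('Name', sort=False)],
    ignore_index=True,
)
```

With the sample data this gives 21 rows per name (15.0 to 17.0 and 10.0 to 12.0), and nothing is extrapolated past the last measured depth.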
Part 1:
I chose not to iterate, and instead assign new values to the whole column at once.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]
                   })
Output:
Name Depth Value
0 AS 15 100
1 AS 16 200
2 AS 17 300
3 DB 10 200
4 DB 11 300
5 DB 12 400
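Use len to get the length of the column, and df[column][0] to get the initial value. If you already have a specific initial value, skip this step and assign that value directly.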
ini_1 = df['Depth'][0] # initial value
ini_2 = df['Value'][0] # initial value
length = len(df)
step_1 = 0.1
step_2 = 10
df['Depth'] = np.arange(ini_1, ini_1+length*step_1, step_1)
df['Value'] = np.arange(ini_2, ini_2+length*step_2, step_2)
Output:
Name Depth Value
0 AS 15.0 100
1 AS 15.1 110
2 AS 15.2 120
3 DB 15.3 130
4 DB 15.4 140
5 DB 15.5 150
We don't know the rule by which Name and Depth vary together, but this is another way to avoid iterating over every row.
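A side note on `np.arange` with a step of 0.1: the NumPy documentation warns that with a non-integer step the length of the result can be off by one because of floating-point rounding. A small sketch that fixes the number of rows first and builds the values by multiplication instead (the variable names here are only for illustration):

```python
import numpy as np

start_depth, start_value = 15, 100   # initial values from the example
n_rows = 6                           # number of rows to fill

# np.arange over integers always has exactly n_rows elements,
# so the scaled result has a predictable length.
depth = start_depth + 0.1 * np.arange(n_rows)
value = start_value + 10 * np.arange(n_rows)
```

The arrays can then be assigned to df['Depth'] and df['Value'] just like in the code above.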
Part 2:
Assume that each Name-Depth group expands to 10 items, with increments of 0.1 on Depth and 10 on Value.
The steps are as follows:
- Load the DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
'Depth': [15, 16, 17, 10, 11, 12],
'Value':[100, 200, 300, 200, 300, 400]
})
- Expand df tenfold:
dfn = pd.concat([df]*10,ignore_index=False).sort_index()
- These are arithmetic progressions:
For Depth: a = 0, d = 0.1, length = 10
For Value: a = 0, d = 10, length = 10
Add them as vectors (1-D arrays) within each Name-Depth group:
a = 0
d_depth = 0.1
d_value = 10
length = 10
arithmetric_1 = [round(a + d_depth * (n - 1),2) for n in range(1, length + 1)] # arithmetic progression series for Depth
arithmetric_2 = [round(a + d_value * (n - 1),2) for n in range(1, length + 1)] # arithmetic progression series for Value
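- Main part
# Depth starts out as an integer column; cast it so it can hold 15.1, 15.2, ...
dfn['Depth'] = dfn['Depth'].astype(float)
for i in set(dfn.index):
    # Rows sharing index label i are the 10 copies of one original row
    dfn.loc[i, 'Depth'] = dfn.loc[i, 'Depth'].array + arithmetric_1
    dfn.loc[i, 'Value'] = dfn.loc[i, 'Value'].array + arithmetric_2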
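Summary:
Now you have the DataFrame dfn as the result, based on the assumption above. This approach tries to cut down the time spent looping and handles the problem with vector operations, which helps if you have a huge dataset.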
Name Depth Value
0 AS 15.0 100
0 AS 15.1 110
0 AS 15.2 120
0 AS 15.3 130
0 AS 15.4 140
0 AS 15.5 150
0 AS 15.6 160
0 AS 15.7 170
0 AS 15.8 180
0 AS 15.9 190
1 AS 16.0 200
1 AS 16.1 210
1 AS 16.2 220
1 AS 16.3 230
1 AS 16.4 240
1 AS 16.5 250
1 AS 16.6 260
1 AS 16.7 270
1 AS 16.8 280
1 AS 16.9 290
2 AS 17.0 300
2 AS 17.1 310
:
:
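If you want to drop the remaining loop over index labels as well, the same assumption (10 copies per row, steps of 0.1 and 10) can be sketched with `Index.repeat` and `np.tile`. This is only one way to write it, and `reps` is a name I made up for the number of copies:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

reps = 10  # copies per original row (assumed, as in Part 2)

# Repeat every row `reps` times, keeping the copies next to each other.
dfn = df.loc[df.index.repeat(reps)].copy()

# One block of offsets per original row: 0.0 .. 0.9 and 0 .. 90.
depth_offsets = np.tile(0.1 * np.arange(reps), len(df))
value_offsets = np.tile(10 * np.arange(reps), len(df))

dfn['Depth'] = (dfn['Depth'].to_numpy() + depth_offsets).round(1)
dfn['Value'] = dfn['Value'].to_numpy() + value_offsets
```

Like the loop version, this still adds rows after the last depth of each name (17.1 to 17.9 for AS), so trim those if you only want to interpolate between measured depths.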
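Here is a solution that lets you interpolate the values with any step size (assuming the steps fit exactly between the integers), and the interpolation is more flexible: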
import numpy as np
import pandas as pd
my_df_list = []
step = 0.1
for label, group in df.sort_values('Depth').groupby('Name'):
    # Create a lookup dictionary for interpolation lookup
    lookup_dict = {x[0]: x[1] for x in group[['Depth', 'Value']].values}
    # Use np.linspace because of the strictness of start and end values
    new_index = np.linspace(
        start=group['Depth'].min(),
        stop=group['Depth'].max(),
        num=int(1/step) * np.ptp(group['Depth'].to_numpy()) + 1
    )
    # Known depths get their value, new depths get None and are filled in
    new_values = pd.Series(
        lookup_dict.get(round(x, 1)) for x in new_index
    ).interpolate()
    # Create a tmp df with your values
    df_tmp = pd.DataFrame.from_dict({
        'Name': [label] * len(new_index),
        'Depth': new_index,
        'Value': new_values
    })
    my_df_list.append(df_tmp)
# Finally, combine all dfs
df_final = pd.concat(my_df_list, ignore_index=True)
Name Depth Value
0 AS 15.0 100.0
1 AS 15.1 110.0
...
19 AS 16.9 290.0
20 AS 17.0 300.0
21 DB 10.0 200.0
22 DB 10.1 210.0
...
39 DB 11.8 380.0
40 DB 11.9 390.0
41 DB 12.0 400.0
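The lookup above rounds to one decimal and then interpolates by position, so it fits a step of 0.1 and evenly spaced depths. For other steps, or depths that are not evenly spaced, a rough alternative is to let `np.interp` do the work per group. The function name `interpolate_group` and the step of 0.25 are just assumptions for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['AS', 'AS', 'AS', 'DB', 'DB', 'DB'],
                   'Depth': [15, 16, 17, 10, 11, 12],
                   'Value': [100, 200, 300, 200, 300, 400]})

def interpolate_group(group, step):
    """Linear interpolation of Value over Depth for one Name (hypothetical helper)."""
    group = group.sort_values('Depth')        # np.interp needs increasing x
    depth = group['Depth'].to_numpy(dtype=float)
    value = group['Value'].to_numpy(dtype=float)
    n = int(round((depth[-1] - depth[0]) / step)) + 1
    new_depth = np.linspace(depth[0], depth[-1], n)
    return pd.DataFrame({'Name': group['Name'].iloc[0],
                         'Depth': new_depth,
                         'Value': np.interp(new_depth, depth, value)})

step = 0.25
pieces = [interpolate_group(g, step) for _, g in df.groupby('Name')]
df_interp = pd.concat(pieces, ignore_index=True)
```

`np.interp` uses the actual Depth values as x coordinates, so gaps of different sizes between measurements are respected.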