How to insert a new column into a dataframe and access rows with different indices?
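I have a dataframe with one column, "Numbers", and I want to add a second column, "Result". Each value should be the sum of the two previous values in the "Numbers" column, or NaN when there are fewer than two previous values.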
import pandas as pd
import numpy as np

data = {
    "Numbers": [100, 200, 400, 0]
}
df = pd.DataFrame(data, index=["whatever1", "whatever2", "whatever3", "whatever4"])

def add_prev_two_elems_to_DF(df):
    numbers = "Numbers"  # alias
    result = "Result"    # alias
    df[result] = np.nan  # empty column
    result_index = list(df.columns).index(result)
    for i in range(len(df)):
        #row = df.iloc[i]
        if i < 2: df.iloc[i, result_index] = np.nan
        else: df.iloc[i, result_index] = df.iloc[i-1][numbers] + df.iloc[i-2][numbers]

add_prev_two_elems_to_DF(df)
display(df)  # display() comes from Jupyter/IPython; use print(df) in a plain script
The output is:
           Numbers  Result
whatever1      100     NaN
whatever2      200     NaN
whatever3      400   300.0
whatever4        0   600.0
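But this looks complicated. Can it be done more simply and faster? I'm not looking for a sum()-specific solution. I want a general solution that works for any kind of function that fills a column using values from other rows.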
Edit 1: I forgot to import numpy.
Edit 2: I changed one line:
if i < 2: df.iloc[i,result_index] = np.nan
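If the row-wise function genuinely cannot be vectorized, a common speed-up is to keep the loop but run it over a plain NumPy array and assign the finished column once, instead of calling df.iloc for every row. This is a minimal sketch that assumes the new value depends only on the k previous values. The helper name fill_from_previous and its parameters f and k are hypothetical:

```python
import numpy as np
import pandas as pd

def fill_from_previous(values, f, k=2):
    """Hypothetical helper: out[i] = f(values[i-k:i]) for i >= k, NaN otherwise."""
    arr = np.asarray(values, dtype=float)
    out = np.full(len(arr), np.nan)  # rows without k predecessors stay NaN
    for i in range(k, len(arr)):
        out[i] = f(arr[i - k:i])  # f receives the k previous values, oldest first
    return out

df = pd.DataFrame({"Numbers": [100, 200, 400, 0]},
                  index=["whatever1", "whatever2", "whatever3", "whatever4"])
df["Result"] = fill_from_previous(df["Numbers"], lambda w: w[0] + w[1])
print(df)
```

Any other function of the window can be passed the same way, for example lambda w: w.max() - w.min().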
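It looks like you can use rolling.sum together with shift. Because rolling.sum sums up to and including the current row, we have to shift the result down by one row so that each row's value matches the sum of the previous 2 rows: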
df['Result'] = df['Numbers'].rolling(2).sum().shift()
Output:
           Numbers  Result
whatever1      100     NaN
whatever2      200     NaN
whatever3      400   300.0
whatever4        0   600.0
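The shift idea generalizes beyond sums. s.shift(1) aligns each row with the value one row above, and s.shift(2) aligns it with the value two rows above. Any vectorized expression of those aligned columns therefore gives the result, and the first rows become NaN automatically. This is a minimal sketch. The "Product" column is only an illustrative choice of function and is not part of the question:

```python
import pandas as pd

df = pd.DataFrame({"Numbers": [100, 200, 400, 0]},
                  index=["whatever1", "whatever2", "whatever3", "whatever4"])
s = df["Numbers"]
prev1 = s.shift(1)  # value from the row above (NaN in the first row)
prev2 = s.shift(2)  # value from two rows above (NaN in the first two rows)

df["Result"] = prev1 + prev2    # same values as s.rolling(2).sum().shift()
df["Product"] = prev1 * prev2   # any other element-wise function works the same way
print(df)
```

Because this stays fully vectorized, it is usually much faster than any per-row loop. The limitation is that the function must be expressible with element-wise operations on the shifted columns.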
This is the shortest code I could come up with. It outputs exactly the same table.
import pandas as pd
#import swifter  # optional: swifter may speed up apply()
data = {
    "Numbers": [100, 200, 400, 0]
}
df = pd.DataFrame(data, index=["whatever1", "whatever2", "whatever3", "whatever4"])
def func(a: pd.Series) -> float:  # a is the 3-row window ending at the current row; its length is not checked
    return a.iloc[0] + a.iloc[1]  # use the first two (previous) rows; the 3rd is the current row and is ignored
df["Result"] = df["Numbers"].rolling(3).apply(func)  # the first two rows are NaN because their window has fewer than 3 rows
#df["Result"] = df["Numbers"].swifter.rolling(3).apply(func)
display(df)  # Jupyter/IPython; use print(df) in a plain script
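The same rolling pattern can be wrapped so that any function of the k previous rows can be plugged in. This is a minimal sketch that assumes the function accepts a NumPy array, which is what rolling(...).apply passes when raw=True (usually faster than passing a Series). The helper name apply_to_previous and the column name "Max3" are hypothetical:

```python
import numpy as np
import pandas as pd

def apply_to_previous(s: pd.Series, f, k: int = 2) -> pd.Series:
    """Hypothetical helper: result[i] = f(the k values before row i), NaN for the first k rows."""
    # A window of k + 1 rows ends at the current row; w[:-1] drops the current row.
    return s.rolling(k + 1).apply(lambda w: f(w[:-1]), raw=True)

df = pd.DataFrame({"Numbers": [100, 200, 400, 0]},
                  index=["whatever1", "whatever2", "whatever3", "whatever4"])
df["Result"] = apply_to_previous(df["Numbers"], lambda w: w[0] + w[1])
df["Max3"] = apply_to_previous(df["Numbers"], np.max, k=3)  # max of the 3 previous rows
print(df)
```

Unlike the shift version, rolling.apply still calls a Python function once per row. That makes it the more general option, but it is slower on large frames.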