How to insert a new column into a dataframe and access rows with different indices?

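I have a dataframe with one column, "Numbers", and I want to add a second column, "Result". Each value should be the sum of the previous two values in the "Numbers" column, or NaN where a row does not have two previous values.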

import pandas as pd
import numpy as np

data = {
    "Numbers": [100,200,400,0]
}

df = pd.DataFrame(data,index = ["whatever1", "whatever2", "whatever3", "whatever4"])

def add_prev_two_elems_to_DF(df):
    numbers = "Numbers" # alias
    result = "Result"   # alias
    df[result] = np.nan # empty column
    result_index = list(df.columns).index(result)
    for i in range(len(df)):
        #row = df.iloc[i]
        if i < 2: df.iloc[i,result_index] = np.nan
        else: df.iloc[i,result_index] = df.iloc[i-1][numbers] + df.iloc[i-2][numbers]

add_prev_two_elems_to_DF(df)
print(df)  # in Jupyter, display(df) also works

The output is:

            Numbers Result
whatever1   100     NaN
whatever2   200     NaN
whatever3   400     300.0
whatever4   0       600.0

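But this looks complicated. Can it be done more simply and faster? I'm not looking for a solution that relies on sum(). I want a general solution that works for any kind of function that fills a column using values from other rows.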

Edit 1: I forgot to import numpy.

Edit 2: I changed one line:

if i < 2: df.iloc[i,result_index] = np.nan

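It looks like you can use rolling.sum together with shift. Since rolling.sum sums up to and including the current row, we have to shift the result down by one row so that each row's value matches the sum of the previous 2 rows: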

df['Result'] = df['Numbers'].rolling(2).sum().shift()

Output:

           Numbers  Result
whatever1      100     NaN
whatever2      200     NaN
whatever3      400   300.0
whatever4        0   600.0
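To make the shift step concrete, here is a small sketch that prints the intermediate rolling result before and after shifting, using the same data as the question. It also shows an equivalent formulation built only from two explicit shift calls; that second form is my own illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {"Numbers": [100, 200, 400, 0]},
    index=["whatever1", "whatever2", "whatever3", "whatever4"],
)

# Step 1: rolling(2).sum() adds each row to the row above it (current row included).
window_sum = df["Numbers"].rolling(2).sum()
# -> NaN, 300.0, 600.0, 400.0

# Step 2: shift() moves every value down one row, so each row now holds
# the sum of the two rows *before* it.
df["Result"] = window_sum.shift()
# -> NaN, NaN, 300.0, 600.0

# Equivalent form using two explicit shifts (previous row + row before that).
alt = df["Numbers"].shift(1) + df["Numbers"].shift(2)
print(df.assign(Alt=alt))
```

The two-shift form is fully vectorized and extends to any elementwise combination of earlier rows (for example, np.maximum(s.shift(1), s.shift(2))), which is usually faster than calling a Python function once per window.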

This is the shortest code I could come up with. It outputs exactly the same table.

import numpy as np
import pandas as pd
#import swifter # swifter makes apply() faster

data = {
    "Numbers": [100,200,400,0]
}

df = pd.DataFrame(data,index = ["whatever1", "whatever2", "whatever3", "whatever4"])

def func(a) -> float: # we expect 3 elements, but we don't check that
    a = np.asarray(a)  # positional access works whether a is an ndarray or a Series
    return a[0] + a[1] # we use the first two elements; the 3rd (current row) is unnecessary

# raw=True passes each window as a NumPy array, which is faster than building a Series
df["Result"] = df["Numbers"].rolling(3).apply(func, raw=True)
#df["Result"] = df["Numbers"].swifter.rolling(3).apply(func)
print(df)  # in Jupyter, display(df) also works
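The same idea can be wrapped into a reusable helper for "apply any function to the previous k rows". This is a minimal sketch; the name apply_to_previous_rows is hypothetical. Instead of using a window of k+1 and ignoring the last element (as above), it uses a window of k and shifts the result down one row, as the rolling/shift answer does.

```python
import numpy as np
import pandas as pd

def apply_to_previous_rows(s: pd.Series, func, k: int) -> pd.Series:
    """Hypothetical helper: result[i] = func([s[i-k], ..., s[i-1]]).

    rolling(k) builds windows that END at the current row, so the rolling
    result is shifted down one row to make each window cover only earlier rows.
    Rows without k predecessors get NaN.
    """
    # raw=True hands func a plain NumPy array of length k
    return s.rolling(k).apply(func, raw=True).shift(1)

df = pd.DataFrame(
    {"Numbers": [100, 200, 400, 0]},
    index=["whatever1", "whatever2", "whatever3", "whatever4"],
)

# Sum of the previous two values (same numbers as the tables above).
df["Result"] = apply_to_previous_rows(df["Numbers"], np.sum, 2)
# Any other function of the previous values works too, e.g. their difference.
df["Diff"] = apply_to_previous_rows(df["Numbers"], lambda w: w[1] - w[0], 2)
print(df)
```

Because func receives the window positionally, it does not care about the string index of the dataframe.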