Build new column from a function in a pandas DataFrame using values from the DataFrame

Question

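I'm new to pandas DataFrames and I'm having some trouble, because I don't know how to access a specific cell in order to use it in a calculation that fills a new cell.

I would like to use apply to call an external function with the value of the cell in the previous row (row - 1).

I managed to do it, but only by writing everything out to a plain list, and I'm pretty sure there is a better way:

I build my DataFrame from a CSV file using the following index: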

import pandas as pd
from pandas.tseries.offsets import BDay

DateIndex = pd.date_range(start="2005-1-1", end="2017-1-1", freq=BDay())
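A quick sketch, assuming the `BDay` offset from `pandas.tseries.offsets`, showing that a business-day range starting on 2005-01-01 (a Saturday) begins on Monday 2005-01-03, which matches the first row of the excerpt below. The end date is shortened here only to keep the index small.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Business-day index over the first two weeks of 2005 (shortened end date for illustration)
idx = pd.date_range(start="2005-1-1", end="2005-1-14", freq=BDay())

print(idx[0])    # the weekend of 1-2 January is skipped, so the range starts on 2005-01-03
print(len(idx))  # the weekdays from 3 to 14 January
```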

Judging by the following excerpt, my DataFrame looks fine:

2005-01-03    0.005742
2005-01-04    0.003765
2005-01-05   -0.005536
2005-01-06    0.001500
2005-01-07    0.007471
2005-01-10    0.002108
2005-01-11   -0.003195
2005-01-12   -0.003076
2005-01-13    0.005416
2005-01-14    0.003090

So I want to add 100 to the first entry, and for every other entry, add one and then multiply by the previous result.
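Written as a recurrence, with x_i the value in row i and v_i the new value (symbols introduced here only for illustration), the rule reads as follows; unrolling it gives a running product, which is why a cumulative product can stand in for the loop:

```latex
v_0 = x_0 + 100, \qquad v_i = v_{i-1}\,(1 + x_i) \quad (i \ge 1)
\;\Longrightarrow\;
v_i = (x_0 + 100)\prod_{k=1}^{i} (1 + x_k)
```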

I was able to do it with a list like this:

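# df is assumed to be the single column of returns shown above (a Series)
for i in range(len(df.index)):
    if i == 0:
        listV = [df.iloc[i] + 100]                     # first entry: add 100
    else:
        listV.append(listV[i - 1] * (1 + df.iloc[i]))  # previous result times (1 + current value)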
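A minimal sketch of the loop above writing straight into the frame, assuming the returns sit in a single column; the column names `col` and `new_col` are hypothetical, and only the first three rows of the excerpt are used. Because the loop yields exactly one value per row, in index order, the finished list can be assigned as a new column as-is.

```python
import pandas as pd

# Hypothetical one-column frame holding the first three returns from the excerpt
df = pd.DataFrame(
    {"col": [0.005742, 0.003765, -0.005536]},
    index=pd.to_datetime(["2005-01-03", "2005-01-04", "2005-01-05"]),
)

values = []
for i in range(len(df.index)):
    x = df["col"].iloc[i]
    if i == 0:
        values.append(x + 100)                  # first entry: add 100
    else:
        values.append(values[i - 1] * (1 + x))  # previous result times (1 + current value)

# One value per row, in index order, so the list can become a column directly
df["new_col"] = values
print(df)
```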

Is there a way to do this and put the result directly into a new column of the DataFrame?

Many thanks, regards, Julien

Answer 1

Here is a better way to achieve the same result:

col_copy = df.col.copy()   # make a copy so the original column is left untouched
col_copy.iloc[0] += 100    # add 100 to the first row
col_copy.iloc[1:] += 1     # add 1 to the remaining rows

df.assign(new_col=col_copy.cumprod())  # cumulative product as a new column (assign returns a new DataFrame)


Data:

Consider a DataFrame with a single column 'col', prepared as follows:

from io import StringIO
import pandas as pd

txt = StringIO(
"""
2005-01-03    0.005742
2005-01-04    0.003765
2005-01-05   -0.005536
2005-01-06    0.001500
2005-01-07    0.007471
2005-01-10    0.002108
2005-01-11   -0.003195
2005-01-12   -0.003076
2005-01-13    0.005416
2005-01-14    0.003090
""")

df = pd.read_csv(txt, sep=r"\s+", parse_dates=True, header=None,
                 index_col=['date'], names=['date', 'col'])
df.index.name = None
df
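A self-contained check, assuming the same ten returns in a column named `col`, that the `cumprod` approach reproduces the question's loop. Since `DataFrame.assign` returns a new DataFrame rather than modifying `df` in place, the result is reassigned here.

```python
import numpy as np
import pandas as pd

returns = [0.005742, 0.003765, -0.005536, 0.001500, 0.007471,
           0.002108, -0.003195, -0.003076, 0.005416, 0.003090]
dates = pd.to_datetime(["2005-01-03", "2005-01-04", "2005-01-05", "2005-01-06",
                        "2005-01-07", "2005-01-10", "2005-01-11", "2005-01-12",
                        "2005-01-13", "2005-01-14"])
df = pd.DataFrame({"col": returns}, index=dates)

# Answer 1: +100 on the first row, +1 on the rest, then a running product
col_copy = df["col"].copy()
col_copy.iloc[0] += 100
col_copy.iloc[1:] += 1
df = df.assign(new_col=col_copy.cumprod())  # assign returns a new frame, so keep it

# The question's loop, for comparison
loop = [df["col"].iloc[0] + 100]
for x in df["col"].iloc[1:]:
    loop.append(loop[-1] * (1 + x))

print(np.allclose(df["new_col"].to_numpy(), loop))
```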

Answer 2

Setup

import pandas as pd

df = pd.DataFrame(dict(
        col=[ 0.005742,  0.003765, -0.005536,  0.0015  ,  0.007471,
              0.002108, -0.003195, -0.003076,  0.005416,  0.00309 ]
    ), pd.to_datetime([
            '2005-01-03', '2005-01-04', '2005-01-05', '2005-01-06', '2005-01-07', 
            '2005-01-10', '2005-01-11', '2005-01-12', '2005-01-13', '2005-01-14'])
    )

print(df)

                 col
2005-01-03  0.005742
2005-01-04  0.003765
2005-01-05 -0.005536
2005-01-06  0.001500
2005-01-07  0.007471
2005-01-10  0.002108
2005-01-11 -0.003195
2005-01-12 -0.003076
2005-01-13  0.005416
2005-01-14  0.003090

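Comment
This looks like a series of returns. By adding 100 to the first observation, you dilute the first return so that it counts as about 0.57 basis points instead of 0.57 percent.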
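A small arithmetic check of that point, using the first return 0.005742 from the data: relative to a base of 100, adding the return to 100 moves the value by only about 0.57 basis points, while compounding 100 by the return moves it by about 57 basis points (0.57 %).

```python
r0 = 0.005742  # first return in the data, i.e. 0.5742 %

added = 100 + r0             # what the question's loop stores for the first row
compounded = 100 * (1 + r0)  # 100 grown by the first return

# Growth relative to the base of 100, in basis points (1 bp = 0.0001)
added_bp = (added / 100 - 1) / 0.0001
compounded_bp = (compounded / 100 - 1) / 0.0001

print(round(added_bp, 4), round(compounded_bp, 2))  # 0.5742 57.42
```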

I believe what you want is to add one to each value, take the cumulative product, and then multiply by 100.

This shows the cumulative growth of 100, which I believe is what you are after.

df.add(1).cumprod().mul(100)

                   col
2005-01-03  100.574200
2005-01-04  100.952862
2005-01-05  100.393987
2005-01-06  100.544578
2005-01-07  101.295746
2005-01-10  101.509278
2005-01-11  101.184956
2005-01-12  100.873711
2005-01-13  101.420043
2005-01-14  101.733431
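To store that growth series in a new column, as the question asks, the same chain can be applied to the single column and assigned back. A minimal sketch, assuming the returns live in `col`; `growth` is a hypothetical column name and only the first three rows of the data are used.

```python
import pandas as pd

df = pd.DataFrame(
    {"col": [0.005742, 0.003765, -0.005536]},
    index=pd.to_datetime(["2005-01-03", "2005-01-04", "2005-01-05"]),
)

# Growth of 100: add 1 to each return, take the running product, scale by 100
df["growth"] = df["col"].add(1).cumprod().mul(100)
print(df)
```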

df.add(1).cumprod().mul(100).plot()
