Pandas: How to calculate a new column based on the index or group ID?

Question

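This is probably a very simple question, but I can't find a solution: I want to add a new column "col_new" whose calculation depends on a grouping variable such as a group ID or a date. In other words, the calculation should change depending on the group ID.
Example: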

   Year  col1  col2
0  2019    10     1
1  2019     4     2
2  2019    25     1
3  2018     3     1
4  2017    56     2
5  2017     3     2

- For Year = 2017: col_new = col1-col2
- For Year = 2018: col_new = col1+col2
- For Year = 2019: col_new = col1*col2
I would also like to wrap this in a for loop.

year = [2017, 2018, 2019]
for x in year:
    df["new_col"] = ...  # placeholder: the calculation for year x goes here

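I tried an if function: it always needs an else, so each iteration overwrites all the values from the previous iteration.
I used .loc and it works, but it becomes hard to manage with long, complex conditions.
I tried setting Year as the index. That is easy to do, but then I got stuck.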
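The overwriting in the first attempt happens because the else branch of a vectorized if (for example np.where) supplies a value for every row that does not match. A minimal sketch of one way around it, assuming np.where is the kind of "if function" meant and using a hypothetical dict `formulas` that follows the first list above: pass the column itself as the else branch, so rows of other years keep what earlier iterations wrote.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]})

# Hypothetical per-year formulas, following the first list in the question
formulas = {
    2017: lambda d: d["col1"] - d["col2"],
    2018: lambda d: d["col1"] + d["col2"],
    2019: lambda d: d["col1"] * d["col2"],
}

# Start with an empty column so the first iteration has something to fall back to
df["new_col"] = np.nan
for x in [2017, 2018, 2019]:
    # The "else" branch is the current column, so rows of other years stay unchanged
    df["new_col"] = np.where(df["Year"] == x, formulas[x](df), df["new_col"])

print(df)
```

The result column is float because it starts out as NaN.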

import pandas as pd
import numpy as np

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}
df = pd.DataFrame(data=d) #the example dataframe
df = df.set_index("Year")
print(df)

      col1  col2
Year            
2019    10     1
2019     4     2
2019    25     1
2018     3     1
2017    56     2
2017     3     2

Now I need something like this:
- if the year is 2017: col1+col2
- if the year is 2018: col1-col2
- if the year is 2019: col1*col2
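With Year set as the index, the loop can select rows through a boolean mask on df.index. A minimal sketch, assuming the formulas from the list just above and a hypothetical dict `ops` built from the standard operator module:

```python
import operator

import pandas as pd

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}
df = pd.DataFrame(data=d).set_index("Year")

# Hypothetical mapping that follows the list above (2017 add, 2018 subtract, 2019 multiply)
ops = {2017: operator.add, 2018: operator.sub, 2019: operator.mul}

for year, op in ops.items():
    # Boolean mask on the index; works even though "Year" contains duplicate labels
    mask = df.index == year
    rows = df.loc[mask]
    # .to_numpy() assigns by position, avoiding index alignment on duplicate labels
    df.loc[mask, "new_col"] = op(rows["col1"], rows["col2"]).to_numpy()

print(df)
```

Each iteration writes only the rows of its own year, so the iterations do not overwrite each other; rows not yet processed hold NaN in the meantime.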

Answer 1

A `dict` of operators

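from operator import sub, add, mul

# Map each year to the operator applied to (col1, col2)
op = {2019: mul, 2018: add, 2017: sub}

# df.assign returns a new DataFrame; use df = df.assign(...) to keep the column
df.assign(new_col=[op[t.Year](t.col1, t.col2) for t in df.itertuples()])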

   Year  col1  col2  new_col
0  2019    10     1       10
1  2019     4     2        8
2  2019    25     1       25
3  2018     3     1        4
4  2017    56     2       54
5  2017     3     2        1

If Year is in the index:

df.assign(new_col=[op[t.Index](t.col1, t.col2) for t in df.itertuples()])

      col1  col2  new_col
Year                     
2019    10     1       10
2019     4     2        8
2019    25     1       25
2018     3     1        4
2017    56     2       54
2017     3     2        1
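The same dict-of-operators idea can be made tolerant of years that have no entry, which would otherwise raise a KeyError. A sketch, assuming NaN is an acceptable result for such rows; the helper `apply_by_year` and the extra 2016 row are hypothetical:

```python
import math
from operator import add, mul, sub

import pandas as pd

# Example data with a hypothetical extra year (2016) that has no operator
df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017, 2016],
                   'col1': [10, 4, 25, 3, 56, 3, 7],
                   'col2': [1, 2, 1, 1, 2, 2, 5]})

op = {2019: mul, 2018: add, 2017: sub}


def apply_by_year(frame, ops):
    """Return ops[year](col1, col2) for each row; years without an operator give NaN."""
    def missing(a, b):
        return math.nan
    return [ops.get(t.Year, missing)(t.col1, t.col2) for t in frame.itertuples()]


df = df.assign(new_col=apply_by_year(df, op))
print(df)
```

Using dict.get with a fallback keeps the lookup in one place, so adding a year only means adding one dict entry.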

Answer 2

You can use numpy.select:

# Conditions and choices are matched by position; rows matching no condition get np.select's default (0)
cond = [df.index == 2017, df.index == 2018, df.index == 2019]
choice = [df.col1+df.col2, df.col1-df.col2, df.col1*df.col2]
df['new'] = np.select(cond, choice)

      col1  col2  new
Year
2019    10     1   10
2019     4     2    8
2019    25     1   25
2018     3     1    2
2017    56     2   58
2017     3     2    5
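To keep the conditions and choices passed to np.select from drifting out of order, both lists can be generated from one dict. A sketch with Year as a regular column and a hypothetical dict `formulas` (following the second list in the question); default=np.nan flags rows that match no year instead of silently filling them with 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]})

# Hypothetical dict: year -> vectorized result for that year's formula
formulas = {
    2017: df["col1"] + df["col2"],
    2018: df["col1"] - df["col2"],
    2019: df["col1"] * df["col2"],
}

# Both lists iterate the same dict, so condition i always pairs with choice i
conditions = [df["Year"] == year for year in formulas]
choices = list(formulas.values())
df["new"] = np.select(conditions, choices, default=np.nan)

print(df)
```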

Answer 3

You can use the pandas apply function. Note that I commented out the line where you set Year as the index.

import pandas as pd
import numpy as np

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}

df = pd.DataFrame(data=d) #the example dataframe
#df = df.set_index("Year")
#print(df)


def check(row):
    # Access values by column label; positional row[0] on a Series is deprecated in recent pandas
    if row["Year"] == 2017:
        return row["col1"] - row["col2"]
    elif row["Year"] == 2018:
        return row["col1"] + row["col2"]
    elif row["Year"] == 2019:
        return row["col1"] * row["col2"]


# check must be defined before it is passed to apply
df['new_col'] = df.apply(check, axis=1)
print(df)

Result:

   Year  col1  col2  new_col
0  2019    10     1       10
1  2019     4     2        8
2  2019    25     1       25
3  2018     3     1        4
4  2017    56     2       54
5  2017     3     2        1
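Since the title asks about group IDs, another option is to split the frame with groupby and pick the formula by the group key. A sketch, assuming the default unique RangeIndex (so the concatenated results realign with the original rows on assignment) and a hypothetical dict `formulas` following the first list in the question:

```python
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]})

# Hypothetical per-group formulas keyed by the group ID (here: Year)
formulas = {
    2017: lambda g: g["col1"] - g["col2"],
    2018: lambda g: g["col1"] + g["col2"],
    2019: lambda g: g["col1"] * g["col2"],
}

# Each group is a sub-DataFrame; the formula is chosen by its key
parts = [formulas[year](group) for year, group in df.groupby("Year")]

# The parts keep their original row labels, so assignment puts values back in place
df["new_col"] = pd.concat(parts)
print(df)
```

Unlike the row-wise apply, each formula here runs once per group on whole columns.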
