Getting the first value of each group in Pandas

Question

I have a DataFrame and want to generate a report from it.

Sample data

import numpy as np
import pandas as pd

data = {
 'Date': {0: '2021-01-04 10:45:00',
  1: '2021-01-04 10:45:00',
  2: '2021-01-05 11:15:00',
  3: '2021-01-05 11:15:00',
  4: '2021-01-06 12:15:00',
  5: '2021-01-06 12:15:00'},
 'Action': {0: 'A', 1: 'B', 2: 'P', 3: 'Q', 4: 'X', 5: 'Y'},
 'Profit': {0: np.nan, 1: -2637.93, 2: np.nan, 3: 11008.4, 4: np.nan, 5: -2977.49},
 }
df = pd.DataFrame(data)

My usual approach is to write one function that handles all of these calculations.

def magic(x):
    result = {
        'Action' : x['Action'], #Need help here
        'Profit': x['Profit'].sum()
    }
    return pd.Series(result)

df = df.groupby(['Date']).apply(magic)

What I want to happen:

The Action column should contain the first value of each group.

Desired output:

Date                | Action     | Profit
2021-01-04 10:45:00 |A           |-2637.93
2021-01-05 11:15:00 |P           |11008.40
2021-01-06 12:15:00 |X           |-2977.49

Actual output:

Date                | Action                                | Profit
2021-01-04 10:45:00 |0 A 1 B Name: Action, dtype: object    |-2637.93
2021-01-05 11:15:00 |2 P 3 Q Name: Action, dtype: object    |11008.40
2021-01-06 12:15:00 |4 X 5 Y Name: Action, dtype: object    |-2977.49

My real function will have many more columns, so ideally everything would be done inside the magic function.

Answer 1

I think you want to aggregate Action with first:

out = df.groupby('Date').agg({'Action': 'first', 'Profit': 'sum'})

Output:

                    Action    Profit
Date                                
2021-01-04 10:45:00      A  -2637.93
2021-01-05 11:15:00      P  11008.40
2021-01-06 12:15:00      X  -2977.49
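The same result can also be written with pandas named aggregation (available since pandas 0.25), which names each output column explicitly as output_column=(input_column, aggregation). This is a sketch that rebuilds the question's sample data as plain lists. Note that GroupBy.first returns the first non-null value of each group, which matches the literal first row here only because Action has no missing values.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, written as plain lists
df = pd.DataFrame({
    'Date': ['2021-01-04 10:45:00', '2021-01-04 10:45:00',
             '2021-01-05 11:15:00', '2021-01-05 11:15:00',
             '2021-01-06 12:15:00', '2021-01-06 12:15:00'],
    'Action': ['A', 'B', 'P', 'Q', 'X', 'Y'],
    'Profit': [np.nan, -2637.93, np.nan, 11008.4, np.nan, -2977.49],
})

# Named aggregation: output_column=(input_column, aggregation function)
out = df.groupby('Date').agg(
    Action=('Action', 'first'),  # first non-null Action per Date
    Profit=('Profit', 'sum'),    # sum skips NaN by default
)
print(out)
```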

Answer 2

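If the magic function computes new columns from several input columns, you can't use agg, because agg works on one column at a time.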

In that case, select the first value of each group by position with iat:

def magic(x):
    result = {
        'Action': x['Action'].iat[0],  # first value of the group, as a scalar
        'Profit': x['Profit'].sum()
    }
    return pd.Series(result)
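A minimal end-to-end sketch with the question's sample data, to check that the fixed magic function produces one scalar per group. The BestAction column is a hypothetical example of a column computed from two input columns (the Action on the row with the largest Profit), which is the case where agg does not fit. Selecting the value columns before apply keeps the grouping column Date out of x; recent pandas versions (2.2 and later) warn when apply operates on the grouping columns. Unlike GroupBy.first, iat[0] returns the literal first row even if that value is NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-04 10:45:00', '2021-01-04 10:45:00',
             '2021-01-05 11:15:00', '2021-01-05 11:15:00',
             '2021-01-06 12:15:00', '2021-01-06 12:15:00'],
    'Action': ['A', 'B', 'P', 'Q', 'X', 'Y'],
    'Profit': [np.nan, -2637.93, np.nan, 11008.4, np.nan, -2977.49],
})


def magic(x):
    result = {
        # Positional first row of the group, returned as a scalar
        'Action': x['Action'].iat[0],
        # sum() skips NaN by default
        'Profit': x['Profit'].sum(),
        # Hypothetical multi-column example: idxmax() skips NaN and returns
        # the index label of the largest Profit, used to look up its Action
        'BestAction': x.loc[x['Profit'].idxmax(), 'Action'],
    }
    return pd.Series(result)


# Only the value columns are passed to magic, so x never contains Date
report = df.groupby('Date')[['Action', 'Profit']].apply(magic)
print(report)
```

Because magic returns a Series with the same labels for every group, apply assembles the results into a DataFrame indexed by Date with one column per dictionary key.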
