Efficient Subtotaling of columns in a pandas dataframe

Question

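I have been toying with an idea for a program that would automate our month-end reports at work. Currently, it creates all of the reports in Excel format, and then we manually use Excel's Subtotal feature to subtotal the columns and format the data as a table.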

My idea is to subtotal each column by customer, like so:

Patient	Date	Rx#	Description	Qty	Price
EXAMPLE, JOHN	2/1/2021	357649	Aspirin	30	6.99
EXAMPLE, JOHN	2/1/2021	357650	Drug	30	13.99
EXAMPLE, JOHN	2/1/2021	357651	Tylenol	30	7.99
EXAMPLE, JOHN Subtotal					28.97
EXAMPLE, SUSAN	2/12/2021	357652	Expensive Drug	30	51.99
EXAMPLE, SUSAN	2/12/2021	357653	Drug	30	13.99
EXAMPLE, SUSAN	2/12/2021	357654	Tylenol	30	7.99
EXAMPLE, SUSAN Subtotal					73.97

The existing dataframe looks like this:

Patient	Date	Rx#	Description	Qty	Price
EXAMPLE, JOHN	2/1/2021	357649	Aspirin	30	6.99
EXAMPLE, JOHN	2/1/2021	357650	Drug	30	13.99
EXAMPLE, JOHN	2/1/2021	357651	Tylenol	30	7.99
EXAMPLE, SUSAN	2/12/2021	357652	Expensive Drug	30	51.99
EXAMPLE, SUSAN	2/12/2021	357653	Drug	30	13.99
EXAMPLE, SUSAN	2/12/2021	357654	Tylenol	30	7.99

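Is this possible with groupby()? It seems to have an option to group by rows instead of by columns. The bigger issue I see is inserting into the existing dataframe, since pandas seems better suited to manipulating and performing operations on large datasets than to inserting or adding information.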

Answer 1

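import pandas as pd

# df is assumed to be the existing dataframe shown in the question.
# Calculate the Price sum per patient (one row per patient)
df_subtotal = df.groupby('Patient', as_index=False)[['Price']].agg('sum')
# Label each subtotal row, e.g. "EXAMPLE, JOHN Subtotal"
df_subtotal['Patient'] = df_subtotal['Patient'] + ' Subtotal'
# Append the subtotal rows; columns they lack (Date, Rx#, ...) become NaN
df_new = pd.concat([df, df_subtotal], axis=0, ignore_index=True)
# Sort by name so that "<name> Subtotal" lands after that patient's rows
df_new = df_new.sort_values(['Patient', 'Date'], ignore_index=True)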
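
The approach above relies on "<name> Subtotal" sorting directly after "<name>", which can break if another patient's name begins with the same text (for example a hypothetical "EXAMPLE, JOHN A"). Below is a minimal, self-contained sketch of one way around that, assuming the sample data from the question. It sorts on the unmodified name plus a flag column and only renames the subtotal rows afterwards. The helper `add_subtotals` and the temporary columns `_is_subtotal` and `_row` are names made up for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Patient": ["EXAMPLE, JOHN"] * 3 + ["EXAMPLE, SUSAN"] * 3,
    "Date": ["2/1/2021"] * 3 + ["2/12/2021"] * 3,
    "Rx#": [357649, 357650, 357651, 357652, 357653, 357654],
    "Description": ["Aspirin", "Drug", "Tylenol",
                    "Expensive Drug", "Drug", "Tylenol"],
    "Qty": [30] * 6,
    "Price": [6.99, 13.99, 7.99, 51.99, 13.99, 7.99],
})


def add_subtotals(frame, group_col="Patient", value_col="Price"):
    """Hypothetical helper: return frame with a subtotal row after each group."""
    # One row per group holding the summed value column.
    totals = frame.groupby(group_col, as_index=False)[[value_col]].sum()
    # Temporary columns: a 0/1 flag for detail vs. subtotal rows, and the
    # original row position so detail rows keep their order within a group.
    detail = frame.assign(_is_subtotal=0, _row=range(len(frame)))
    totals = totals.assign(_is_subtotal=1, _row=0)
    combined = pd.concat([detail, totals], ignore_index=True)
    # Sort on the unmodified group name first, then detail rows before subtotal.
    combined = combined.sort_values(
        [group_col, "_is_subtotal", "_row"], ignore_index=True
    )
    # Rename the subtotal rows only after sorting.
    is_sub = combined["_is_subtotal"].eq(1)
    combined.loc[is_sub, group_col] = combined.loc[is_sub, group_col] + " Subtotal"
    return combined.drop(columns=["_is_subtotal", "_row"])


report = add_subtotals(df)
print(report.to_string(index=False))
```

Because the subtotal rows have no Date, Rx#, Description, or Qty, those cells are NaN, and the integer columns (Rx#, Qty) are upcast to float as a result. If that matters for the final Excel output, those columns could be cast or blanked just before export.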

Tags: python, conceptual, dataframe, python-3.x, pandas