How to sum within a group of values and then take the difference from another group?

Question

Suppose I have this simplified dataframe with three variables:

ID    sample  test_result
P1    Normal           9
P1    Normal           18
P2    Normal           7
P2    Normal           16
P3    Normal           2
P3    Normal           11
P1     Tumor           6
P1     Tumor           15
P2     Tumor           5
P2     Tumor           15
P3     Tumor           3
P3     Tumor           12

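I would like to know how to sum the test_result values for each ID within each sample type (i.e. Normal, Tumor). Then I want to compute, for each ID, the difference between the Normal sum and the Tumor sum.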

I tried using groupby on the sample column and then the diff() method on the test_result column, but that did not work. I think I need to apply .sum() first, but I am not sure how.

This is what I tried:

df.groupby('sample')['test_result'].diff()

My expected output is:

ID   test_result
P1             6 # (the sum of P1 Normal = 27) - (the sum of P1 Tumor = 21)  
P2             3
P3            -2

Any idea how to solve this?

Answer 1

Use groupby with sum, then reshape with unstack:

df = df.groupby(['ID','sample'])['test_result'].sum().unstack()

Or use pivot_table:

df = df.pivot_table(index='ID', columns='sample', values='test_result', aggfunc='sum')

Then subtract the columns:

df['new'] = df['Normal'] - df['Tumor']
print(df)
sample  Normal  Tumor  new
ID                        
P1          27     21    6
P2          23     20    3
P3          13     15   -2
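
For reference, the steps above can be put together as a self-contained sketch. It rebuilds the sample data from the question by hand; the variable names `sums` and `diff` are only illustrative and are not part of the original answer. Renaming the result to `test_result` is an assumption made to match the layout of the expected output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["P1", "P1", "P2", "P2", "P3", "P3",
           "P1", "P1", "P2", "P2", "P3", "P3"],
    "sample": ["Normal"] * 6 + ["Tumor"] * 6,
    "test_result": [9, 18, 7, 16, 2, 11, 6, 15, 5, 15, 3, 12],
})

# Sum test_result per (ID, sample), then move the sample level into columns,
# giving one row per ID with a "Normal" and a "Tumor" column.
sums = df.groupby(["ID", "sample"])["test_result"].sum().unstack()

# Normal sum minus Tumor sum for each ID (illustrative name, see note above).
diff = (sums["Normal"] - sums["Tumor"]).rename("test_result")
print(diff)
```

With the data above this should give 6 for P1, 3 for P2 and -2 for P3, which matches the expected output in the question. Keeping the sums in a separate variable, rather than overwriting `df`, leaves the original long-format data available for further work.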

Tags: pivot, group-by, python-3.x, pandas, difference