Subtract dataframe rows by a column value

I have a table of analysis results:

         Name  Analysis  Result                 Date    Type
0      Doe J.  Albumine    10.6   23.02.2021 8:07:22  before
1      Doe J.  Albumine     6.5   25.03.2021 8:08:09   after
2     Pine C.  Albumine    13.3   25.03.2021 9:17:54  before
3     Pine C.  Albumine    11.0   22.02.2021 9:25:54   after
4  Jackson D.  Albumine    14.2  23.02.2021 10:51:38  before
5  Jackson D.  Albumine    12.2           23.03.2021   after
6  Schafer L.  Albumine     8.4  25.02.2021 10:39:39  before
7  Schafer L.  Albumine     9.3  25.03.2021 12:06:15   after

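My goal is to compute, for each patient, the difference between the two analyses based on the 'Type' column (all names are fictional) and get a table like this: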

     Name  Before  After  Difference
0  Doe J.    10.6    6.5         4.1

I tried groupby but without success. Any help is much appreciated.

Use DataFrame.pivot to put 'before' and 'after' in separate columns, then subtract:

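# keyword arguments are required by pandas 2.0+; positional ones raise a TypeError there
df = df.pivot(index='Name', columns='Type', values='Result').reset_index().rename_axis(columns=None)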
df['diff'] = df['before'].sub(df['after'])

print (df)
         Name  after  before  diff
0      Doe J.    6.5    10.6   4.1
1  Jackson D.   12.2    14.2   2.0
2     Pine C.   11.0    13.3   2.3
3  Schafer L.    9.3     8.4  -0.9
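
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the question's data (the Date column is left out because it is not needed), and the renaming step at the end is an optional addition to match the column layout the question asked for. The names `wide` and `result` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question (Date omitted, it is not used here).
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Pine C.", "Pine C.",
             "Jackson D.", "Jackson D.", "Schafer L.", "Schafer L."],
    "Analysis": ["Albumine"] * 8,
    "Result": [10.6, 6.5, 13.3, 11.0, 14.2, 12.2, 8.4, 9.3],
    "Type": ["before", "after"] * 4,
})

# One row per patient, one column per distinct Type value.
wide = (
    df.pivot(index="Name", columns="Type", values="Result")
      .reset_index()
      .rename_axis(columns=None)
)

# Rounding hides floating-point noise in the subtraction.
wide["diff"] = wide["before"].sub(wide["after"]).round(2)

# Optional: rename and reorder to the layout requested in the question.
result = wide.rename(
    columns={"before": "Before", "after": "After", "diff": "Difference"}
)[["Name", "Before", "After", "Difference"]]
print(result)
```

The rounding to two decimals is an assumption about the precision you care about; drop it if you need the raw difference.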

If you get this error:

ValueError: Index contains duplicate entries, cannot reshape

it means there are duplicates, i.e. the same Name, Type pair has 2 or more values, for example:

print (df)
         Name  Analysis  Result                 Date    Type
0      Doe J.  Albumine    10.6   23.02.2021 8:07:22  before <- duplicate Doe J., before
0      Doe J.  Albumine    10.6   23.02.2021 8:07:22  before <- duplicate Doe J., before
1      Doe J.  Albumine     6.5   25.03.2021 8:08:09   after
2     Pine C.  Albumine    13.3   25.03.2021 9:17:54  before
3     Pine C.  Albumine    11.0   22.02.2021 9:25:54   after
4  Jackson D.  Albumine    14.2  23.02.2021 10:51:38  before
5  Jackson D.  Albumine    12.2           23.03.2021   after
6  Schafer L.  Albumine     8.4  25.02.2021 10:39:39  before
7  Schafer L.  Albumine     9.3  25.03.2021 12:06:15   after
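
To see which rows cause the error before reshaping, something like the following sketch may help. It uses a shortened, hand-built version of the data above; `dupes`, `deduped` and `wide` are illustrative names. Dropping duplicates is only appropriate if the repeated rows really are redundant copies, which is an assumption here.

```python
import pandas as pd

# Shortened version of the data above, with "Doe J." / "before" repeated.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

# keep=False flags every row that belongs to a repeated (Name, Type) pair.
dupes = df[df.duplicated(subset=["Name", "Type"], keep=False)]
print(dupes)

# If the repeats are plain copies, dropping them lets pivot work again.
deduped = df.drop_duplicates(subset=["Name", "Type"], keep="first")
wide = deduped.pivot(index="Name", columns="Type", values="Result")
print(wide)
```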

A possible solution is DataFrame.pivot_table with an aggregate function such as mean or sum. If you need the first matching value, use aggfunc='first':

df = df.pivot_table(index='Name',columns='Type',values='Result', aggfunc='sum').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])

print (df)
         Name  after  before  diff
0      Doe J.    6.5    21.2  14.7 <- 21.2 because sum
1  Jackson D.   12.2    14.2   2.0
2     Pine C.   11.0    13.3   2.3
3  Schafer L.    9.3     8.4  -0.9
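
For comparison, here is a small sketch of the aggfunc='first' variant mentioned above, again on a shortened hand-built frame with one repeated "Doe J." / "before" row. The second value (99.9) is made up purely to show which row wins; `first` is an illustrative name.

```python
import pandas as pd

# "Doe J." has two different "before" results; 99.9 is a made-up value.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 99.9, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

# aggfunc="first" keeps the first Result seen for each (Name, Type) pair.
first = (
    df.pivot_table(index="Name", columns="Type", values="Result", aggfunc="first")
      .reset_index()
      .rename_axis(columns=None)
)
first["diff"] = first["before"].sub(first["after"]).round(2)
print(first)
```

Which aggregate is right depends on what the duplicates mean: 'first' relies on row order, 'mean' averages repeated measurements, and 'sum' is rarely what you want for lab results.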