Subtract dataframe rows by a column value
I have a table of analysis results:
Name Analysis Result Date Type
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before
1 Doe J. Albumine 6.5 25.03.2021 8:08:09 after
2 Pine C. Albumine 13.3 25.03.2021 9:17:54 before
3 Pine C. Albumine 11.0 22.02.2021 9:25:54 after
4 Jackson D. Albumine 14.2 23.02.2021 10:51:38 before
5 Jackson D. Albumine 12.2 23.03.2021 after
6 Schafer L. Albumine 8.4 25.02.2021 10:39:39 before
7 Schafer L. Albumine 9.3 25.03.2021 12:06:15 after
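My goal is to compute, for each patient, the difference between the two analyses based on the 'Type' column (all names are fictional) and get the following table: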
Name Before After Difference
0 Doe J. 10.6 6.5 4.1
I tried groupby without success. Any help is much appreciated.
Use DataFrame.pivot together with subtraction:
df = df.pivot(index='Name', columns='Type', values='Result').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])
print(df)
Name after before diff
0 Doe J. 6.5 10.6 4.1
1 Jackson D. 12.2 14.2 2.0
2 Pine C. 11.0 13.3 2.3
3 Schafer L. 9.3 8.4 -0.9
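For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes only the Name, Result and Type columns matter, so the DataFrame below is rebuilt by hand from the question's table (with "Jackson D." spelled consistently); the variable name `wide` and the final rounding are my own additions, not part of the answer above.

```python
import pandas as pd

# Sample data retyped from the question (only the columns needed here).
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Pine C.", "Pine C.",
             "Jackson D.", "Jackson D.", "Schafer L.", "Schafer L."],
    "Result": [10.6, 6.5, 13.3, 11.0, 14.2, 12.2, 8.4, 9.3],
    "Type": ["before", "after"] * 4,
})

# Reshape to one row per Name and one column per Type.
# Keyword arguments are used because recent pandas versions
# no longer accept positional arguments in pivot.
wide = (
    df.pivot(index="Name", columns="Type", values="Result")
      .reset_index()
      .rename_axis(columns=None)
)

# before - after; rounding is optional and only tidies floating-point noise.
wide["diff"] = wide["before"].sub(wide["after"]).round(1)
print(wide)
```

The rows come back sorted by Name, which is why the order differs from the input table.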
If you get this error:
ValueError: Index contains duplicate entries, cannot reshape
it means there are duplicates: the same Name, Type pair has 2 or more values, for example:
print(df)
Name Analysis Result Date Type
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before <- duplicate Doe J., before
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before <- duplicate Doe J., before
1 Doe J. Albumine 6.5 25.03.2021 8:08:09 after
2 Pine C. Albumine 13.3 25.03.2021 9:17:54 before
3 Pine C. Albumine 11.0 22.02.2021 9:25:54 after
4 Jackson D. Albumine 14.2 23.02.2021 10:51:38 before
5 Jackson D. Albumine 12.2 23.03.2021 after
6 Schafer L. Albumine 8.4 25.02.2021 10:39:39 before
7 Schafer L. Albumine 9.3 25.03.2021 12:06:15 after
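Before choosing how to aggregate, it can help to look at the offending rows. The sketch below is one way to list them with DataFrame.duplicated; the small DataFrame is hypothetical data shaped like the example above, and `dupes` is just an illustrative name.

```python
import pandas as pd

# Hypothetical data: "Doe J." has two "before" rows.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

# keep=False flags every row of a repeated (Name, Type) pair,
# not only the second and later occurrences.
dupes = df[df.duplicated(subset=["Name", "Type"], keep=False)]
print(dupes)
```

If the flagged rows are exact copies, dropping them with drop_duplicates before calling pivot may be enough.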
A possible solution is DataFrame.pivot_table with an aggregate function such as mean or sum. If you need the first matching value, use aggfunc='first':
df = df.pivot_table(index='Name', columns='Type', values='Result', aggfunc='sum').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])
print(df)
Name after before diff
0 Doe J. 6.5 21.2 14.7 <- 21.2 because sum
1 Jackson D. 12.2 14.2 2.0
2 Pine C. 11.0 13.3 2.3
3 Schafer L. 9.3 8.4 -0.9
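Since sum inflates the duplicated value, here is a sketch of the aggfunc='first' variant mentioned above. The data is a hypothetical reduced copy of the duplicated example, and `wide` plus the rounding step are my own additions for illustration.

```python
import pandas as pd

# Hypothetical data with a repeated ("Doe J.", "before") row.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

# aggfunc="first" keeps the first value of each (Name, Type) pair
# instead of adding the repeated values together.
wide = (
    df.pivot_table(index="Name", columns="Type", values="Result", aggfunc="first")
      .reset_index()
      .rename_axis(columns=None)
)

# before - after; rounding is optional and only tidies floating-point noise.
wide["diff"] = wide["before"].sub(wide["after"]).round(1)
print(wide)
```

Which aggregate is right depends on what the duplicates mean: 'first' suits accidental repeats, while mean may be more appropriate for genuine repeated measurements.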