Split a column into several columns based on several conditions and group by
I have a sample dataframe, shown below.
import pandas as pd
data = {'ID': ['A','A','A','A','A','A','A','A','A','C','C','C','C','C','C','C','C'],
        'Week': ['Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2','Week3',
                 'Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2'],
        'Risk': ['High','','','','','','','','','High','','','','','','',''],
        'Testing': ['','Pos','','Neg','','','','','Pos','','','','Neg','','','','Pos'],
        'Week1_adher': ['','','','','','','','','','','','','','','','',''],
        'Week2_adher': ['','','','','','','','','','','','','','','','',''],
        'Week3_adher': ['','','','','','','','','','','','','','','','','']}
df1 = pd.DataFrame(data)
df1
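Now I want to calculate each participant's adherence for every week. It is calculated as follows:
If a participant has 2 or more entries (positive/negative) in the Testing column within a week, adherence for that week is 'Yes'; otherwise it is 'No'.
For example, for participant A, Week1_adher is 'Yes' because there are 2 entries in the Testing column for Week1, while Week2_adher is 'No'.
I also want the weekly adherence results to appear in the first row of each participant.
The final dataframe should look as shown below.
I have been stuck on this for a long time. Any help is greatly appreciated. Thanks.
Try: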
adher = (df1.Testing.ne('')                 # True where a test result was recorded
            .groupby([df1.ID, df1.Week])    # group by ID and week
            .sum().ge(2)                    # count results per group and check >= 2
            .unstack(fill_value=False)      # one column per week; absent (ID, week) pairs become False
            .replace({True:'Yes', False:'No'})
            .add_suffix('_adher')
        )
# write the flags into the first row of each ID
mask = ~df1['ID'].duplicated()
df1.loc[mask, adher.columns] = adher.loc[df1.loc[mask,'ID']].values
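To see what each link in the chain contributes, the same idea can be unpacked into named intermediate steps. This is only a sketch: the variable names (`has_test`, `counts`, `wide_counts`) are made up for illustration, the frame is rebuilt with just the columns the calculation needs, and the Yes/No mapping uses `Series.map` instead of `replace`, which is a substitution rather than what the answer above does.

```python
import pandas as pd

# Same ID / Week / Testing values as in the question, built compactly
df = pd.DataFrame({
    'ID': ['A'] * 9 + ['C'] * 8,
    'Week': (['Week1'] * 4 + ['Week2'] * 4 + ['Week3']
             + ['Week1'] * 4 + ['Week2'] * 4),
    'Testing': ['', 'Pos', '', 'Neg', '', '', '', '', 'Pos',
                '', '', '', 'Neg', '', '', '', 'Pos'],
})

# Step 1: True where a test result was recorded
has_test = df['Testing'].ne('')

# Step 2: number of recorded results per (ID, Week); summing booleans counts the Trues
counts = has_test.groupby([df['ID'], df['Week']]).sum()

# Step 3: one row per ID, one column per week; (ID, Week) pairs with no rows count as 0
wide_counts = counts.unstack(fill_value=0)

# Step 4: apply the ">= 2 results" rule and label the columns
adher = (wide_counts.ge(2)
         .apply(lambda col: col.map({True: 'Yes', False: 'No'}))
         .add_suffix('_adher'))

print(wide_counts)
print(adher)
```

Printing `wide_counts` before thresholding makes it easy to check the rule by eye: participant A has 2 results in Week1, so only that cell passes.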
Output:
    ID   Week  Risk  Testing  Week1_adher  Week2_adher  Week3_adher
 0   A  Week1  High                   Yes           No           No
 1   A  Week1            Pos
 2   A  Week1
 3   A  Week1            Neg
 4   A  Week2
 5   A  Week2
 6   A  Week2
 7   A  Week2
 8   A  Week3            Pos
 9   C  Week1  High                    No           No           No
10   C  Week1
11   C  Week1
12   C  Week1            Neg
13   C  Week2
14   C  Week2
15   C  Week2
16   C  Week2            Pos
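If the threshold might change, or the frame does not yet have the `*_adher` columns, the approach can be wrapped in a small function. The helper name `add_adherence` and its `min_tests` parameter are hypothetical, introduced here only for illustration; the sketch assumes empty strings mark "no test", as in the question.

```python
import pandas as pd

def add_adherence(df, min_tests=2):
    """Hypothetical helper: put Yes/No adherence flags on each ID's first row."""
    out = df.copy()
    has_test = out['Testing'].ne('')
    flags = (has_test.groupby([out['ID'], out['Week']]).sum()
             .unstack(fill_value=0)          # absent (ID, Week) pairs count as 0 tests
             .ge(min_tests)
             .apply(lambda col: col.map({True: 'Yes', False: 'No'}))
             .add_suffix('_adher'))
    # Create (or blank out) the target columns so non-first rows stay empty
    for col in flags.columns:
        out[col] = ''
    first = ~out['ID'].duplicated()          # True on the first row of each ID
    # Look up flags in the same order as the first rows appear, then assign by position
    out.loc[first, flags.columns] = flags.loc[out.loc[first, 'ID']].to_numpy()
    return out

df = pd.DataFrame({
    'ID': ['A'] * 9 + ['C'] * 8,
    'Week': (['Week1'] * 4 + ['Week2'] * 4 + ['Week3']
             + ['Week1'] * 4 + ['Week2'] * 4),
    'Testing': ['', 'Pos', '', 'Neg', '', '', '', '', 'Pos',
                '', '', '', 'Neg', '', '', '', 'Pos'],
})

result = add_adherence(df)                    # rule from the question: >= 2 tests
relaxed = add_adherence(df, min_tests=1)      # looser rule, for comparison
print(result)
```

Working on a copy keeps the caller's frame unchanged, which makes it easier to compare different thresholds side by side.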