Group by two columns and split a column into two columns holding the first two non-NA values
I have a sample dataframe, shown below.
import pandas as pd
import numpy as np
NaN = np.nan
data = {'ID':['A','A','A','A','A','A','A','A','A','C','C','C','C','C','C','C','C'],
'Week': ['Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2','Week3',
'Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2'],
'Risk':['High','','','','','','','','','High','','','','','','',''],
'Testing':[NaN,'Pos',NaN,'Neg',NaN,NaN,NaN,NaN,'Pos', NaN,
NaN,NaN,'Negative',NaN,NaN,NaN,'Positive'],
'CloseContact': [NaN, 'True', NaN, NaN, 'False',NaN, NaN, 'False', 'True',
NaN, NaN, 'False', NaN, 'True','True','False', NaN ]}
df1 = pd.DataFrame(data)
df1
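Now two columns, CC1 and CC2, have to be created. For each ID and each week (this is important), CC1 should hold the first non-null value of the 'CloseContact' column and CC2 should hold the second non-null value.
The final dataframe should look like the one shown below.
Any help is greatly appreciated. Thanks.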
Try:
import pandas as pd
import numpy as np
NaN = np.nan
data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'Week': ['Week1', 'Week1', 'Week1', 'Week1', 'Week2', 'Week2', 'Week2', 'Week2', 'Week3',
'Week1', 'Week1', 'Week1', 'Week1', 'Week2', 'Week2', 'Week2', 'Week2', 'Week3'],
'Risk': ['High', '', '', '', '', '', '', '', '', 'High', '', '', '', '', '', '', '', ''],
'Testing': [NaN, 'Pos', NaN, 'Neg', NaN, NaN, NaN, NaN, 'Pos', NaN,
NaN, NaN, 'Negative', NaN, NaN, NaN, 'Positive', NaN],
'CloseContact': [NaN, NaN, NaN, NaN, 'False', NaN, NaN, 'False', 'True',
NaN, NaN, 'False', NaN, 'True', 'True', 'False', NaN, NaN]}
df1 = pd.DataFrame(data)
df = df1.groupby(['ID', 'Week'])['CloseContact'].apply(lambda x: x[x.notnull()].values[0:2]).reset_index()
df[['CC1','CC2']] = pd.DataFrame(df.CloseContact.tolist(), index= df.index)
df.drop(columns=['CloseContact'], inplace=True)
print(df)
Original DataFrame:
   ID   Week  Risk   Testing CloseContact
0   A  Week1  High       NaN          NaN
1   A  Week1             Pos          NaN
2   A  Week1             NaN          NaN
3   A  Week1             Neg          NaN
4   A  Week2             NaN        False
5   A  Week2             NaN          NaN
6   A  Week2             NaN          NaN
7   A  Week2             NaN        False
8   A  Week3             Pos         True
9   C  Week1  High       NaN          NaN
10  C  Week1             NaN          NaN
11  C  Week1             NaN        False
12  C  Week1        Negative          NaN
13  C  Week2             NaN         True
14  C  Week2             NaN         True
15  C  Week2             NaN        False
16  C  Week2        Positive          NaN
17  C  Week3             NaN          NaN
Final output:
ID Week CC1 CC2
0 A Week1 None None
1 A Week2 False False
2 A Week3 True None
3 C Week1 False None
4 C Week2 True True
5 C Week3 None None
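One caveat with building CC1 and CC2 from ragged per-group arrays: if no group happens to contain two non-null values, the intermediate frame has only one column, and the two-column assignment would most likely fail. Below is a minimal sketch of one way around that. It assumes a hypothetical helper, `first_two_padded` (not part of pandas), that pads each group's result to exactly two entries, and it uses a tiny made-up frame rather than the data above:

```python
import numpy as np
import pandas as pd


def first_two_padded(values: pd.Series) -> list:
    """Hypothetical helper: first two non-null values, padded with NaN to length 2."""
    kept = values.dropna().tolist()[:2]
    return kept + [np.nan] * (2 - len(kept))


# Small illustrative frame (assumption): no group has two non-null values.
df1 = pd.DataFrame({
    "ID": ["A", "A", "C"],
    "Week": ["Week1", "Week1", "Week1"],
    "CloseContact": ["True", np.nan, np.nan],
})

# One padded two-element list per (ID, Week) group.
lists = df1.groupby(["ID", "Week"])["CloseContact"].apply(first_two_padded)

# Every list has length 2, so both columns always exist.
out = pd.DataFrame(lists.tolist(), index=lists.index,
                   columns=["CC1", "CC2"]).reset_index()
print(out)
```

With this padding, missing slots come out as NaN rather than None, which may or may not matter for later processing.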
To get output like yours:
mi = pd.MultiIndex.from_product([df1['ID'].unique(), df1['Week'].unique()],
names=['ID', 'Week'])
out = df1.loc[df1['CloseContact'].notna()] \
.groupby(['ID', 'Week'])['CloseContact'] \
.apply(lambda x: x.head(2).tolist()) \
.apply(pd.Series).rename(columns={0: 'CC1', 1: 'CC2'}) \
.reindex(mi).reset_index()
Output:
>>> out
ID Week CC1 CC2
0 A Week1 True NaN
1 A Week2 False False
2 A Week3 True NaN
3 C Week1 False NaN
4 C Week2 True True
5 C Week3 NaN NaN
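For comparison, the same result can probably be reached without a Python-level lambda by numbering the non-null rows inside each group and unstacking on that number. This is only a sketch: the helper column name `slot` is an assumption introduced for illustration, only the three relevant columns of the question's data are reproduced, and, unlike `MultiIndex.from_product`, it keeps only the (ID, Week) pairs that actually occur, so there is no C/Week3 row here.

```python
import numpy as np
import pandas as pd

NaN = np.nan
df1 = pd.DataFrame({
    "ID": ["A"] * 9 + ["C"] * 8,
    "Week": ["Week1"] * 4 + ["Week2"] * 4 + ["Week3"]
            + ["Week1"] * 4 + ["Week2"] * 4,
    "CloseContact": [NaN, "True", NaN, NaN, "False", NaN, NaN, "False", "True",
                     NaN, NaN, "False", NaN, "True", "True", "False", NaN],
})

# Keep only rows that have a CloseContact value and number them per group.
non_null = df1.dropna(subset=["CloseContact"]).copy()
non_null["slot"] = non_null.groupby(["ID", "Week"]).cumcount()  # 0, 1, 2, ...

# Keep the first two per group and spread them into columns 0 and 1.
wide = (non_null[non_null["slot"] < 2]
        .set_index(["ID", "Week", "slot"])["CloseContact"]
        .unstack("slot")
        .reindex(columns=[0, 1])            # guarantee both columns exist
        .rename(columns={0: "CC1", 1: "CC2"}))

# Restore groups that had no non-null value at all (none in this data).
all_groups = pd.MultiIndex.from_frame(df1[["ID", "Week"]].drop_duplicates())
out = wide.reindex(all_groups).reset_index()
out.columns.name = None
print(out)
```

`cumcount` assigns 0 to the first non-null row of each group and 1 to the second, so those numbers map directly onto CC1 and CC2.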