多列的条件命名

Conditional naming for multiple columns

我有一个数据集;

>>> all_transcripts

ID  Type    Name
1   Guest   Hugo
1   Guest   Hugo   
1   Boss    Boss
1   Boss    Boss
2   Boss    Boss
2   Guest   Calvin
2   Guest   Calvin             
3   Guest   Klein
3   Boss    Boss

现在,我想创建一个名为 nameGuest 的列,其中包含每行每个 ID 的客人姓名。因此,我想要的输出如下所示:

>>> all_transcripts

ID  Type    Name     nameGuest
1   Guest   Hugo     Hugo
1   Guest   Hugo     Hugo   
1   Boss    Boss     Hugo
1   Boss    Boss     Hugo
2   Boss    Boss     Calvin
2   Guest   Calvin   Calvin
2   Guest   Calvin   Calvin    
3   Guest   Klein    Klein
3   Boss    Boss     Klein

我该怎么做?

使用 Series.map by helper Series created by boolean indexing, DataFrame.drop_duplicates and DataFrame.set_index 获取每组 Guest 的第一个值:

s = df[df['Type'] == 'Guest'].drop_duplicates('ID').set_index('ID')['Name']
df['nameGuest'] = df['ID'].map(s)
print (df)
   ID   Type    Name nameGuest
0   1  Guest    Hugo      Hugo
1   1  Guest    Hugo      Hugo
2   1   Boss    Boss      Hugo
3   1   Boss    Boss      Hugo
4   2   Boss    Boss    Calvin
5   2  Guest  Calvin    Calvin
6   2  Guest  Calvin    Calvin
7   3  Guest   Klein     Klein
8   3   Boss    Boss     Klein

Groupby.first

您可以在聚合时使用 groupby and before that filter on Type=Guest and select the first 名称。

这将为我们提供相应 ID 的名称。所以我们可以将其映射回我们的数据框并创建新列:


names = df[df['Type'] == 'Guest'].groupby('ID')['Name'].first()

df['nameGuest'] = df['ID'].map(names)

print(df)
   ID   Type    Name nameGuest
0   1  Guest    Hugo      Hugo
1   1  Guest    Hugo      Hugo
2   1   Boss    Boss      Hugo
3   1   Boss    Boss      Hugo
4   2   Boss    Boss    Calvin
5   2  Guest  Calvin    Calvin
6   2  Guest  Calvin    Calvin
7   3  Guest   Klein     Klein
8   3   Boss    Boss     Klein

names

的输出
print(names)
ID
1      Hugo
2    Calvin
3     Klein
Name: Name, dtype: object