Conditional naming for multiple columns
I have a dataset:
>>> all_transcripts
ID Type Name
1 Guest Hugo
1 Guest Hugo
1 Boss Boss
1 Boss Boss
2 Boss Boss
2 Guest Calvin
2 Guest Calvin
3 Guest Klein
3 Boss Boss
Now I want to create a column named nameGuest that holds, on every row, the name of the guest for that row's ID. The output I want looks like this:
>>> all_transcripts
ID Type Name nameGuest
1 Guest Hugo Hugo
1 Guest Hugo Hugo
1 Boss Boss Hugo
1 Boss Boss Hugo
2 Boss Boss Calvin
2 Guest Calvin Calvin
2 Guest Calvin Calvin
3 Guest Klein Klein
3 Boss Boss Klein
How can I do this?
Use Series.map with a helper Series, built by boolean indexing, DataFrame.drop_duplicates and DataFrame.set_index, that holds the first Guest name for each ID:
s = df[df['Type'] == 'Guest'].drop_duplicates('ID').set_index('ID')['Name']
df['nameGuest'] = df['ID'].map(s)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
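The snippet above assumes a DataFrame named df already exists. As a minimal, self-contained sketch, the sample data from the question can be rebuilt by hand (the construction below is an assumption; the question only shows the printed table under the name all_transcripts) and the same steps applied one at a time:

```python
import pandas as pd

# Sample data retyped from the question; `df` stands in for all_transcripts.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

# 1. Boolean indexing: keep only the Guest rows.
guest_rows = df[df["Type"] == "Guest"]

# 2. Keep the first Guest row per ID, then index the names by ID.
guest_by_id = guest_rows.drop_duplicates("ID").set_index("ID")["Name"]

# 3. Look up each row's ID in the helper Series.
df["nameGuest"] = df["ID"].map(guest_by_id)

print(df)
```

drop_duplicates("ID") keeps the first occurrence by default, so if an ID had several different guest names, only the first one encountered would be used.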
GroupBy.first
You can filter on Type == 'Guest' first, then group by ID and take the first name in each group.
This gives us the guest name for each ID, so we can map it back onto our dataframe and create the new column:
names = df[df['Type'] == 'Guest'].groupby('ID')['Name'].first()
df['nameGuest'] = df['ID'].map(names)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
Output of names:
print(names)
ID
1 Hugo
2 Calvin
3 Klein
Name: Name, dtype: object
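One case the sample data does not cover is an ID with no Guest row at all. The sketch below adds a hypothetical ID 4 that has only a Boss row (not part of the original question) to illustrate what the map step does then: the ID is missing from the lookup Series, so the new column gets a missing value for that row.

```python
import pandas as pd

# Sample data from the question plus a hypothetical ID 4 with no Guest row.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss", "Boss"],
})

# First Guest name per ID; ID 4 does not appear in this Series.
names = df[df["Type"] == "Guest"].groupby("ID")["Name"].first()

# IDs that are absent from `names` map to a missing value.
df["nameGuest"] = df["ID"].map(names)

print(df)
```

If a placeholder is preferable to a missing value, chaining .fillna(...) with a value of your choice after .map(names) is one option.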