conditional groupby and update column - python, pandas, groupby
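I have a df and I want to add a column that shows the student who took 1st place (place == 1) within each ('subject', 'class') group, and that updates whenever a new 1st place appears.
Code: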
import pandas as pd

data = {
"subject": ['eng','math','math','math','math','math','math','math','math','math','math','math','math','eng','eng'],
"class": ['Class_4','Class_3','Class_3','Class_3','Class_3','Class_3','Class_3','Class_3','Class_3',
'Class_3','Class_3','Class_3','Class_3','Class_4','Class_4'],
"student": ['henry','pan','larry','larry','henry','larry','terry','henry','henry',
'henry','pan','pan','moose','pan','moose'],
"place": [7,8,10,1,7,10,9,7,11,1,11,3,6,2,4]}
df = pd.DataFrame(data)
╔═════════╦═════════╦═════════╦═══════╗
║ subject ║ class ║ student ║ place ║
╠═════════╬═════════╬═════════╬═══════╣
║ eng ║ Class_4 ║ henry ║ 7 ║
║ math ║ Class_3 ║ pan ║ 8 ║
║ math ║ Class_3 ║ larry ║ 10 ║
║ math ║ Class_3 ║ larry ║ 1 ║
║ math ║ Class_3 ║ henry ║ 7 ║
║ math ║ Class_3 ║ larry ║ 10 ║
║ math ║ Class_3 ║ terry ║ 9 ║
║ math ║ Class_3 ║ henry ║ 7 ║
║ math ║ Class_3 ║ henry ║ 11 ║
║ math ║ Class_3 ║ henry ║ 1 ║
║ math ║ Class_3 ║ pan ║ 11 ║
║ math ║ Class_3 ║ pan ║ 3 ║
║ math ║ Class_3 ║ moose ║ 6 ║
║ eng ║ Class_4 ║ pan ║ 2 ║
║ eng ║ Class_4 ║ moose ║ 4 ║
╚═════════╩═════════╩═════════╩═══════╝
Result I'm trying to get:
╔═════════╦═════════╦═════════╦═══════╦═════════╗
║ subject ║ class ║ student ║ place ║ new_col ║
╠═════════╬═════════╬═════════╬═══════╬═════════╣
║ eng ║ Class_4 ║ henry ║ 7 ║ nil ║
║ math ║ Class_3 ║ pan ║ 8 ║ nil ║
║ math ║ Class_3 ║ larry ║ 10 ║ nil ║
║ math ║ Class_3 ║ larry ║ 1 ║ nil ║
║ math ║ Class_3 ║ henry ║ 7 ║ larry ║
║ math ║ Class_3 ║ larry ║ 10 ║ larry ║
║ math ║ Class_3 ║ terry ║ 9 ║ larry ║
║ math ║ Class_3 ║ henry ║ 7 ║ larry ║
║ math ║ Class_3 ║ henry ║ 11 ║ larry ║
║ math ║ Class_3 ║ henry ║ 1 ║ larry ║
║ math ║ Class_3 ║ pan ║ 11 ║ henry ║
║ math ║ Class_3 ║ pan ║ 3 ║ henry ║
║ math ║ Class_3 ║ moose ║ 6 ║ henry ║
║ eng ║ Class_4 ║ pan ║ 2 ║ nil ║
║ eng ║ Class_4 ║ moose ║ 4 ║ nil ║
╚═════════╩═════════╩═════════╩═══════╩═════════╝
Please advise. Thanks.
Mask the values in the place column that are not equal to 1, then group the masked column by subject and class and forward-fill the values with ffill:
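# Keep the student's name only where place == 1; every other row becomes NaN
df['new_col'] = df['student'].mask(df['place'] != 1)
# Forward-fill the last 1st-place name within each (subject, class) group
df['new_col'] = df.groupby(['subject', 'class'])['new_col'].ffill()
print(df)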
subject class student place new_col
0 eng Class_4 henry 7 NaN
1 math Class_3 pan 8 NaN
2 math Class_3 larry 10 NaN
3 math Class_3 larry 1 larry
4 math Class_3 henry 7 larry
5 math Class_3 larry 10 larry
6 math Class_3 terry 9 larry
7 math Class_3 henry 7 larry
8 math Class_3 henry 11 larry
9 math Class_3 henry 1 henry
10 math Class_3 pan 11 henry
11 math Class_3 pan 3 henry
12 math Class_3 moose 6 henry
13 eng Class_4 pan 2 NaN
14 eng Class_4 moose 4 NaN
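Note that this output is one row "early" compared with the desired table: there, the name only appears on the rows *after* the 1st-place row (row 3 is still nil, row 9 still shows larry). A minimal sketch of one way to get exactly that, assuming the intent is "show the previous winner up to and including the row where a new winner appears", is to add a per-group shift(1) between the mask and the forward fill:

```python
import pandas as pd

data = {
    "subject": ['eng','math','math','math','math','math','math','math','math','math','math','math','math','eng','eng'],
    "class": ['Class_4','Class_3','Class_3','Class_3','Class_3','Class_3','Class_3','Class_3','Class_3',
              'Class_3','Class_3','Class_3','Class_3','Class_4','Class_4'],
    "student": ['henry','pan','larry','larry','henry','larry','terry','henry','henry',
                'henry','pan','pan','moose','pan','moose'],
    "place": [7,8,10,1,7,10,9,7,11,1,11,3,6,2,4]}
df = pd.DataFrame(data)

keys = ['subject', 'class']

# Keep the student's name only where place == 1; every other row becomes NaN
df['new_col'] = df['student'].mask(df['place'] != 1)
# Move each winner down one row within its group, so the update
# only shows up on the rows after the 1st-place row
df['new_col'] = df.groupby(keys)['new_col'].shift()
# Carry the last known winner forward within each group
df['new_col'] = df.groupby(keys)['new_col'].ffill()

print(df)
```

Rows with no earlier winner in their group stay NaN; if you literally want the string nil as in the desired table, append .fillna('nil') to the last step.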