Operating with groups in pandas
I have a problem that is giving me a headache.
Suppose I have the following DataFrame:
import numpy as np
import pandas as pd
df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']
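For example, I group by car.
agrupe = df2.groupby('CARS')
The problem is that once I have it grouped, I would like to operate on it. For example, in the BMW group I would like to assign the value of column TWO to column FOUR for the rows that have a 2 in column ONE. Let's see if I can learn how to operate on it: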
g = agrupe.get_group('BMW')
From this:
   ONE  TWO CARS  FOUR  NAMES
1    1    0  BMW     1    Jon
3    2    1  BMW     1   Mary
4    0    1  BMW     0  Peter
to this:
   ONE  TWO CARS  FOUR  NAMES
1    1    0  BMW     1    Jon
3    2    1  BMW     1   Mary
4    0    1  BMW     1  Peter
It seems you need groupby with a custom function f:
np.random.seed(100)
df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']
print(df2)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     1       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
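def f(x):
    # x is the sub-DataFrame of one group; x.name holds its group key
    if x.name == 'BMW':
        x = x.copy()  # avoid modifying the group object in place
        x.loc[x.ONE == 2, 'FOUR'] = x.TWO
    return x
# group_keys=False keeps the original index instead of prepending CARS to it
agrupe = df2.groupby('CARS', group_keys=False).apply(f)
print(agrupe)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     0       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari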
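As a minimal sketch of the group-wise approach on fixed data: the small frame and its values below are made up for illustration and are not taken from the answer. Selecting the value columns before `apply` is an assumption made here so the sketch does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Hypothetical fixed data, so the result does not depend on a random seed
df = pd.DataFrame({
    'ONE':  [0, 2, 2, 1, 2],
    'TWO':  [5, 6, 7, 8, 9],
    'CARS': ['Mercedes', 'BMW', 'Ford', 'BMW', 'BMW'],
    'FOUR': [0, 0, 0, 0, 0],
})

def f(x):
    # x.name is the key of the group currently being processed
    if x.name == 'BMW':
        x = x.copy()
        # Only rows of this group with ONE == 2 receive the value of TWO
        x.loc[x['ONE'] == 2, 'FOUR'] = x['TWO']
    return x

# Apply f to the value columns of each group, keeping the original index
result = df.groupby('CARS', group_keys=False)[['ONE', 'TWO', 'FOUR']].apply(f)

# Write the updated column back; assignment aligns on the index
df['FOUR'] = result['FOUR']
print(df)
```

Only the BMW rows with a 2 in ONE are changed; the Ford row that also has a 2 in ONE is left alone because its group key does not match.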
A better solution is to first select all rows where column CARS is BMW and column ONE is 2, and then set column FOUR from column TWO:
df2.loc[(df2.CARS == 'BMW') & (df2.ONE == 2), 'FOUR'] = df2.TWO
print(df2)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     0       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
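The boolean-mask version can be sketched on fixed data as follows. The frame and its values are hypothetical, chosen only so the effect is easy to read. Each comparison needs its own parentheses because `&` binds more tightly than `==` in Python, and the right-hand side is aligned on the index, so only the selected rows are written.

```python
import pandas as pd

# Hypothetical fixed data for illustration
df = pd.DataFrame({
    'ONE':  [0, 2, 2, 1, 2],
    'TWO':  [5, 6, 7, 8, 9],
    'CARS': ['Mercedes', 'BMW', 'Ford', 'BMW', 'BMW'],
    'FOUR': [0, 0, 0, 0, 0],
})

# Combine both conditions element-wise into one boolean Series
mask = (df['CARS'] == 'BMW') & (df['ONE'] == 2)

# Copy TWO into FOUR for the selected rows only
df.loc[mask, 'FOUR'] = df.loc[mask, 'TWO']
print(df)
```

Building the mask once and reusing it on both sides keeps the selection explicit; passing the full `df['TWO']` on the right, as in the answer, gives the same result thanks to index alignment.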
Or, if the change is needed for every row that has a 2 in column ONE, set column FOUR from column TWO:
np.random.seed(13)
df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']
print(df2)
   ONE  TWO       CARS  FOUR     NAMES
0    2    0   Mercedes     0     Peter
1    2    2        BMW     1       Jon
2    0    2       Ford     0      Mary
3    2    2        BMW     2      Mary
4    1    1        BMW     1     Peter
5    0    2      Dacia     1     Peter
6    2    1       Ford     2  BONIFACE
7    0    0    Pontiac     0   Michael
8    2    2  Chevrolet     0      Lucy
9    1    1      Tesla     2    Gilari
df2.loc[df2.ONE == 2, 'FOUR'] = df2.TWO
print(df2)
   ONE  TWO       CARS  FOUR     NAMES
0    2    0   Mercedes     0     Peter
1    2    2        BMW     2       Jon
2    0    2       Ford     0      Mary
3    2    2        BMW     2      Mary
4    1    1        BMW     1     Peter
5    0    2      Dacia     1     Peter
6    2    1       Ford     1  BONIFACE
7    0    0    Pontiac     0   Michael
8    2    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
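For comparison, the same unconditional replacement can also be written without `.loc`. This is an alternative sketch rather than what the answer uses, and the data below is hypothetical. `Series.mask` replaces values where the condition is True, and `numpy.where` picks between two columns element-wise.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data for illustration
df = pd.DataFrame({
    'ONE':  [2, 0, 2, 1],
    'TWO':  [7, 8, 9, 6],
    'FOUR': [1, 1, 1, 1],
})

original_four = df['FOUR'].copy()

# Where ONE == 2, take the value from TWO; elsewhere keep FOUR
df['FOUR'] = df['FOUR'].mask(df['ONE'] == 2, df['TWO'])

# Equivalent element-wise selection with NumPy, computed from the original column
alt = np.where(df['ONE'] == 2, df['TWO'], original_four)
print(df)
print(alt)
```

Both forms return a new column instead of assigning into a selection, which can be convenient when the result should be chained or stored under a different name.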