Operating with groups in pandas

I have a problem that is giving me a headache. Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']

For example, I group by CARS:

agrupe = df2.groupby('CARS')

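The problem is that once I have grouped it, I want to operate on it. For example, in the BMW group, I want to assign the value of column TWO to column FOUR for the rows that have a 2 in column ONE. Let me show what I mean: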

g = agrupe.get_group('BMW')

Starting from this:

     ONE TWO CARS  FOUR  NAMES
1    1    0  BMW     1    Jon
3    2    1  BMW     1   Mary
4    0    1  BMW     0  Peter

I want to get this:

    ONE  TWO CARS  FOUR  NAMES
1    1    0  BMW     1   Jon
3    2    1  BMW     1   Mary
4    0    1  BMW     1  Peter
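Note that get_group returns a separate DataFrame, so editing g on its own does not change df2. A minimal sketch of editing one group and then writing the result back, assuming small hypothetical data (not the random frame above) and the rule "where ONE is 2, copy TWO into FOUR":

```python
import pandas as pd

# Hypothetical example data (not the random frame from the question)
df2 = pd.DataFrame({
    'ONE':  [1, 2, 0, 2],
    'TWO':  [0, 1, 1, 0],
    'CARS': ['BMW', 'BMW', 'BMW', 'Ford'],
    'FOUR': [1, 0, 0, 2],
})

# Take an explicit copy of the BMW group so editing it never touches df2
# and does not trigger SettingWithCopyWarning
g = df2.groupby('CARS').get_group('BMW').copy()

# Inside the BMW group: where ONE == 2, copy TWO into FOUR
g.loc[g['ONE'] == 2, 'FOUR'] = g['TWO']

# Write the edited column back into the original frame, aligned on the index
df2.loc[g.index, 'FOUR'] = g['FOUR']
print(df2)
```

The Ford row also has ONE == 2 but keeps its FOUR value, because only the BMW group was edited.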

It seems you need groupby with a custom function f:

import numpy as np
import pandas as pd

np.random.seed(100)
df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']
print (df2)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     1       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
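def f(x):
    # x.name is the group key, i.e. the CARS value of the current group
    if (x.name == 'BMW'):
        x = x.copy()  # work on a copy instead of mutating the group pandas passes in
        # in rows where ONE is 2, copy the value from TWO into FOUR
        x.loc[x.ONE == 2, 'FOUR'] = x.TWO
    return x

# group_keys=False keeps the original flat index (no extra CARS level in pandas >= 2.0)
agrupe = df2.groupby('CARS', group_keys=False).apply(f)
print (agrupe)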
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     0       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
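Because the frame above is random, the same groupby/apply idea on small fixed data makes the effect of f easy to check. A sketch, assuming hypothetical data; group_keys=False is used so the result keeps the original row order and index:

```python
import pandas as pd

# Hypothetical example data (not the random frame from the answer)
df2 = pd.DataFrame({
    'ONE':  [2, 2, 1, 2],
    'TWO':  [0, 2, 0, 1],
    'CARS': ['BMW', 'Ford', 'BMW', 'BMW'],
    'FOUR': [1, 2, 0, 0],
})

def f(x):
    # Read the group key (the CARS value) before copying the group
    is_bmw = x.name == 'BMW'
    x = x.copy()  # avoid mutating the object pandas passes in
    if is_bmw:
        # Only BMW rows with ONE == 2 get FOUR replaced by TWO
        x.loc[x['ONE'] == 2, 'FOUR'] = x['TWO']
    return x

out = df2.groupby('CARS', group_keys=False).apply(f)
print(out)
```

The Ford row has ONE == 2 as well, but f leaves it alone because its group key is not 'BMW'.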

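A better solution (no groupby needed) is to first select all rows where column CARS is BMW and column ONE is 2, and then set FOUR to the values of column TWO: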

df2.loc[(df2.CARS == 'BMW') & (df2.ONE == 2), 'FOUR'] = df2.TWO
print (df2)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     0       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
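The same boolean selection can also be written with Series.mask, which replaces values where a condition is True. This is a swapped-in alternative to the .loc assignment, not the answer's own method; a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical example data
df2 = pd.DataFrame({
    'ONE':  [2, 2, 1, 2],
    'TWO':  [0, 2, 0, 1],
    'CARS': ['BMW', 'Ford', 'BMW', 'BMW'],
    'FOUR': [1, 2, 0, 0],
})

# Rows to change: BMW rows whose ONE equals 2
cond = (df2['CARS'] == 'BMW') & (df2['ONE'] == 2)

# Where cond is True, take the aligned value from TWO; elsewhere keep FOUR
df2['FOUR'] = df2['FOUR'].mask(cond, df2['TWO'])
print(df2)
```

Building cond as a named Series first makes it easy to inspect which rows will change before assigning.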

Or, if FOUR should be replaced by TWO in every row where ONE is 2 (for any car):

np.random.seed(13)
df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']
print (df2)
   ONE  TWO       CARS  FOUR     NAMES
0    2    0   Mercedes     0     Peter
1    2    2        BMW     1       Jon
2    0    2       Ford     0      Mary
3    2    2        BMW     2      Mary
4    1    1        BMW     1     Peter
5    0    2      Dacia     1     Peter
6    2    1       Ford     2  BONIFACE
7    0    0    Pontiac     0   Michael
8    2    2  Chevrolet     0      Lucy
9    1    1      Tesla     2    Gilari


df2.loc[df2.ONE == 2, 'FOUR'] = df2.TWO
print (df2)
   ONE  TWO       CARS  FOUR     NAMES
0    2    0   Mercedes     0     Peter
1    2    2        BMW     2       Jon
2    0    2       Ford     0      Mary
3    2    2        BMW     2      Mary
4    1    1        BMW     1     Peter
5    0    2      Dacia     1     Peter
6    2    1       Ford     1  BONIFACE
7    0    0    Pontiac     0   Michael
8    2    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
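For the brand-independent case, numpy.where is another common way to express "take TWO where ONE is 2, otherwise keep FOUR" as a single vectorized expression. A sketch using hypothetical data, offered as an alternative to the .loc assignment above:

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df2 = pd.DataFrame({
    'ONE':  [2, 2, 0, 1],
    'TWO':  [1, 2, 2, 0],
    'CARS': ['Mercedes', 'BMW', 'Ford', 'BMW'],
    'FOUR': [0, 1, 0, 1],
})

# Wherever ONE is 2 (any brand), take TWO; otherwise keep the current FOUR
df2['FOUR'] = np.where(df2['ONE'] == 2, df2['TWO'], df2['FOUR'])
print(df2)
```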