Pandas DataFrame: group by consecutive same values on multiple columns
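I need to group consecutive rows that have the same values across a list of columns. Thanks to this, I found out how to do it for a single column, but I can't get it to work for multiple columns.
My question is also very close to another existing question, but I couldn't get that to work the way I want either.
Here is a working snippet. I need the columns user, group, value1 and value2 to be identical for rows to be grouped together: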
#! /bin/python3
import pandas as pd
data = [{"user":"paul","group":"accounting","value1":"foo","value2":3,"value3":"random123"},{"user":"paul","group":"accounting","value1":"foo","value2":3,"value3":"random456"},{"user":"paul","group":"accounting","value1":"foo","value2":3,"value3":"random789"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random789"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random789"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random158"},{"user":"jack","group":"administration","value1":"foo","value2":5,"value3":"random487"},{"user":"jack","group":"administration","value1":"foo","value2":5,"value3":"random435"},{"user":"jack","group":"administration","value1":"bar","value2":3,"value3":"random483"},{"user":"jack","group":"administration","value1":"foo","value2":3,"value3":"random431"},{"user":"jack","group":"administration","value1":"foo","value2":3,"value3":"random478"},{"user":"paul","group":"accounting","value1":"foo","value2":5,"value3":"random759"},{"user":"jack","group":"administration","value1":"bar","value2":3,"value3":"random431"},{"user":"jack","group":"administration","value1":"foo","value2":3,"value3":"random478"}]
df = pd.DataFrame(data)
print(df)
print("----")
grouped = df.groupby(((df['value2'].shift() != df['value2'])).cumsum())
for k, v in grouped:
    print(f'[group {k}]')
    print(v)
It outputs this:
[group 1]
user group value1 value2 value3
0 paul accounting foo 3 random123
1 paul accounting foo 3 random456
2 paul accounting foo 3 random789
[group 2]
user group value1 value2 value3
3 paul accounting foo 5 random789
4 paul accounting foo 5 random789
5 paul accounting foo 5 random158
6 jack administration foo 5 random487
7 jack administration foo 5 random435
[group 3]
user group value1 value2 value3
8 jack administration bar 3 random483
9 jack administration foo 3 random431
10 jack administration foo 3 random478
[group 4]
user group value1 value2 value3
11 paul accounting foo 5 random759
[group 5]
user group value1 value2 value3
12 jack administration bar 3 random431
13 jack administration foo 3 random478
But I need this:
[group 1]
user group value1 value2 value3
0 paul accounting foo 3 random123
1 paul accounting foo 3 random456
2 paul accounting foo 3 random789
[group 2]
user group value1 value2 value3
3 paul accounting foo 5 random789
4 paul accounting foo 5 random789
5 paul accounting foo 5 random158
[group 3]
user group value1 value2 value3
6 jack administration foo 5 random487
7 jack administration foo 5 random435
[group 4]
user group value1 value2 value3
8 jack administration bar 3 random483
[group 5]
user group value1 value2 value3
9 jack administration foo 3 random431
10 jack administration foo 3 random478
[group 6]
user group value1 value2 value3
11 paul accounting foo 5 random759
[group 7]
user group value1 value2 value3
12 jack administration bar 3 random431
[group 8]
user group value1 value2 value3
13 jack administration foo 3 random478
I tried passing multiple columns to groupby, to no avail:
grouped = df.groupby(((df[['user', 'value2']].shift() != df[['user', 'value2']])).cumsum())
#returns
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
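Create the consecutive groups by comparing the listed columns with their shifted values, reduce the result to one boolean per row with DataFrame.any, and then take the cumulative sum: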
cols = ['user','group','value1','value2']
grouped = df.groupby(((df[cols].shift() != df[cols]).any(axis=1)).cumsum())
for k, v in grouped:
    print(f'[group {k}]')
    print(v)
[group 1]
user group value1 value2 value3
0 paul accounting foo 3 random123
1 paul accounting foo 3 random456
2 paul accounting foo 3 random789
[group 2]
user group value1 value2 value3
3 paul accounting foo 5 random789
4 paul accounting foo 5 random789
5 paul accounting foo 5 random158
[group 3]
user group value1 value2 value3
6 jack administration foo 5 random487
7 jack administration foo 5 random435
[group 4]
user group value1 value2 value3
8 jack administration bar 3 random483
[group 5]
user group value1 value2 value3
9 jack administration foo 3 random431
10 jack administration foo 3 random478
[group 6]
user group value1 value2 value3
11 paul accounting foo 5 random759
[group 7]
user group value1 value2 value3
12 jack administration bar 3 random431
[group 8]
user group value1 value2 value3
13 jack administration foo 3 random478
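To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps. The small DataFrame and the names `changed`, `starts`, `run_id` and `summary` are made up for illustration and are not part of the question. It also shows why the attempt in the question failed: the row-by-row comparison is still a DataFrame with one column per key, so it has to be reduced to a single Series before groupby will accept it.

```python
import pandas as pd

# Small hypothetical frame, reduced to two key columns for readability.
df = pd.DataFrame({
    "user":   ["paul", "paul", "paul", "jack", "jack", "paul"],
    "value2": [3, 3, 5, 5, 5, 5],
    "value3": ["a", "b", "c", "d", "e", "f"],
})
cols = ["user", "value2"]

# Step 1: compare each row with the previous one, column by column.
# The result is still 2-dimensional (one boolean column per key).
changed = df[cols].ne(df[cols].shift())

# Step 2: collapse to one boolean per row: True if any key column changed.
starts = changed.any(axis=1)

# Step 3: the running count of run starts is a 1-dimensional group id.
run_id = starts.cumsum()
print(run_id.tolist())

# The Series can be passed to groupby directly, here to summarise each run.
summary = df.groupby(run_id).agg(
    user=("user", "first"),
    value2=("value2", "first"),
    rows=("value3", "count"),
)
print(summary)
```

One caveat to keep in mind: NaN never compares equal to NaN, so consecutive rows with missing values in a key column would each start a new run with this approach.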