Explode rows with predefined lists while keeping the values for existing rows
I'm struggling to explode rows using predefined lists while keeping the values for existing rows.
I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({'id': ['01', '01', '02'],
                   'color': ['red', 'yellow', 'yellow'],
                   'wave': ['1', '2', '2'],
                   'count': [1, 2, 1]})
print(df)
Output:
id color wave count
0 01 red 1 1
1 01 yellow 2 2
2 02 yellow 2 1
I have two lists:
ls_color = ['yellow', 'red', 'blue']
ls_wave = ['1','2']
My expected dataframe:
id color wave count
0 01 yellow 1 0
1 01 yellow 2 2
2 01 red 1 1
3 01 red 2 0
4 01 blue 1 0
5 01 blue 2 0
6 02 yellow 1 0
7 02 yellow 2 1
8 02 red 1 0
9 02 red 2 0
10 02 blue 1 0
11 02 blue 2 0
from itertools import product
# build every (id, color, wave) combination, in the order of the predefined lists
expand = list(product(df.id.unique(), ls_color, ls_wave))
expand
[('01', 'yellow', '1'), ('01', 'yellow', '2'), ('01', 'red', '1'), ('01', 'red', '2'), ('01', 'blue', '1'), ('01', 'blue', '2'), ('02', 'yellow', '1'), ('02', 'yellow', '2'), ('02', 'red', '1'), ('02', 'red', '2'), ('02', 'blue', '1'), ('02', 'blue', '2')]
# left-merge with the original dataframe to add the count column;
# combinations absent from df get NaN, which fillna(0) replaces with 0
(pd.DataFrame.from_records(expand, columns=['id', 'color', 'wave'])
   .merge(df, how='left')
   .fillna(0))
id color wave count
0 01 yellow 1 0.0
1 01 yellow 2 2.0
2 01 red 1 1.0
3 01 red 2 0.0
4 01 blue 1 0.0
5 01 blue 2 0.0
6 02 yellow 1 0.0
7 02 yellow 2 1.0
8 02 red 1 0.0
9 02 red 2 0.0
10 02 blue 1 0.0
11 02 blue 2 0.0
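Here is a variant of the same idea. It is a sketch and not the answer's own code. It builds the full grid with pd.MultiIndex.from_product and aligns the existing counts with Series.reindex(..., fill_value=0). No NaN is ever created, so count stays an integer column and matches the expected output (0 rather than 0.0). The sketch assumes each (id, color, wave) combination appears at most once in df, because reindex needs a unique source index.

```python
import pandas as pd

df = pd.DataFrame({'id': ['01', '01', '02'],
                   'color': ['red', 'yellow', 'yellow'],
                   'wave': ['1', '2', '2'],
                   'count': [1, 2, 1]})
ls_color = ['yellow', 'red', 'blue']
ls_wave = ['1', '2']

# Every (id, color, wave) combination, following the order of the lists
full_index = pd.MultiIndex.from_product(
    [df['id'].unique(), ls_color, ls_wave],
    names=['id', 'color', 'wave'])

# Align the existing counts to the full grid; missing combinations get 0.
# Using fill_value instead of fillna keeps the column as integers.
out = (df.set_index(['id', 'color', 'wave'])['count']
         .reindex(full_index, fill_value=0)
         .reset_index())
print(out)
```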
Using explode, here is one approach:
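# one row per original row, each holding the full color and wave lists
df2 = pd.DataFrame({'id': df['id'], 'color': [ls_color] * len(df), 'wave': [ls_wave] * len(df)})
# explode both list columns into every color/wave combination
df2 = df2.explode('color', ignore_index=True).explode('wave', ignore_index=True)
# add the counts (missing -> 0) and drop the repeats caused by id '01' appearing twice in df
df2 = df2.merge(df, how='left').fillna(0).drop_duplicates(ignore_index=True)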
Output:
df2
id color wave count
0 01 yellow 1 0.0
1 01 yellow 2 2.0
2 01 red 1 1.0
3 01 red 2 0.0
4 01 blue 1 0.0
5 01 blue 2 0.0
6 02 yellow 1 0.0
7 02 yellow 2 1.0
8 02 red 1 0.0
9 02 red 2 0.0
10 02 blue 1 0.0
11 02 blue 2 0.0
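Here is another possible variant of the explode approach. It is a sketch and assumes pandas 1.2 or later, which added merge(how='cross'). It builds the id × color × wave grid from the unique ids only, so no duplicate rows appear and drop_duplicates is unnecessary. It then left-merges the counts and casts them back to int.

```python
import pandas as pd

df = pd.DataFrame({'id': ['01', '01', '02'],
                   'color': ['red', 'yellow', 'yellow'],
                   'wave': ['1', '2', '2'],
                   'count': [1, 2, 1]})
ls_color = ['yellow', 'red', 'blue']
ls_wave = ['1', '2']

# One-column frames for each dimension; unique ids avoid repeated rows
ids = pd.DataFrame({'id': df['id'].unique()})
colors = pd.DataFrame({'color': ls_color})
waves = pd.DataFrame({'wave': ls_wave})

# Cartesian product id x color x wave (how='cross' requires pandas >= 1.2)
grid = ids.merge(colors, how='cross').merge(waves, how='cross')

# Bring in the existing counts; combinations absent from df become 0
out = grid.merge(df, on=['id', 'color', 'wave'], how='left')
out['count'] = out['count'].fillna(0).astype(int)
print(out)
```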