Explode rows with predefined lists while keeping the values for existing rows

I'm struggling to explode rows using predefined lists while keeping the values of the existing rows.

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'id': ['01', '01', '02'],
                   'color': ['red', 'yellow', 'yellow'],
                   'wave': ['1', '2', '2'],
                   'count': [1, 2, 1]})
print(df)
[output]
   id   color wave  count
0  01     red    1      1
1  01  yellow    2      2
2  02  yellow    2      1

I have two lists:

ls_color = ['yellow', 'red', 'blue']
ls_wave = ['1','2']

My expected dataframe:

    id   color wave  count
0   01  yellow    1      0
1   01  yellow    2      2
2   01     red    1      1
3   01     red    2      0
4   01    blue    1      0
5   01    blue    2      0
6   02  yellow    1      0
7   02  yellow    2      1
8   02     red    1      0
9   02     red    2      0
10  02    blue    1      0
11  02    blue    2      0
One approach builds every combination with itertools.product and merges the original counts back in:

from itertools import product

# build every combination of id, color and wave
expand = list(product(df.id.unique(), ls_color, ls_wave))

expand
[('01', 'yellow', '1'), ('01', 'yellow', '2'), ('01', 'red', '1'), ('01', 'red', '2'), ('01', 'blue', '1'), ('01', 'blue', '2'), ('02', 'yellow', '1'), ('02', 'yellow', '2'), ('02', 'red', '1'), ('02', 'red', '2'), ('02', 'blue', '1'), ('02', 'blue', '2')]

# left-merge with the original dataframe to bring in the count column;
# combinations missing from df get NaN, which fillna(0) turns into 0.0
(pd.DataFrame.from_records(expand, columns=['id', 'color', 'wave'])
   .merge(df, how='left')
   .fillna(0))

    id   color wave  count
0   01  yellow    1    0.0
1   01  yellow    2    2.0
2   01     red    1    1.0
3   01     red    2    0.0
4   01    blue    1    0.0
5   01    blue    2    0.0
6   02  yellow    1    0.0
7   02  yellow    2    1.0
8   02     red    1    0.0
9   02     red    2    0.0
10  02    blue    1    0.0
11  02    blue    2    0.0
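If an integer count column is preferred, a variant of the same idea (a swapped-in technique, not part of the answer above) builds the full grid with pd.MultiIndex.from_product and uses reindex(..., fill_value=0), which skips the NaN step and so keeps count as integers. This is a sketch assuming the (id, color, wave) combinations in df are unique, since reindex raises on a duplicated index:

```python
import pandas as pd

df = pd.DataFrame({'id': ['01', '01', '02'],
                   'color': ['red', 'yellow', 'yellow'],
                   'wave': ['1', '2', '2'],
                   'count': [1, 2, 1]})
ls_color = ['yellow', 'red', 'blue']
ls_wave = ['1', '2']

# Full grid of (id, color, wave), ordered like the predefined lists
full_index = pd.MultiIndex.from_product(
    [df['id'].unique(), ls_color, ls_wave],
    names=['id', 'color', 'wave'])

# Reindex the existing rows onto the grid; missing combinations get count 0
# (assumes each (id, color, wave) appears at most once in df)
result = (df.set_index(['id', 'color', 'wave'])
            .reindex(full_index, fill_value=0)
            .reset_index())
print(result)
```

Because fill_value is an integer, the count column is never converted to float.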

Here is one way using explode:

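# one row per original row, with the full lists in color and wave
df2 = pd.DataFrame({'id': df['id'], 'color': [ls_color] * len(df), 'wave': [ls_wave] * len(df)})
df2 = df2.explode('color', ignore_index=True).explode('wave', ignore_index=True)
# df['id'] repeats '01', so the merge yields duplicate rows; drop them
df2 = df2.merge(df, how='left').fillna(0).drop_duplicates(ignore_index=True)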

Output:

df2

    id   color  wave    count
0   01  yellow     1      0.0
1   01  yellow     2      2.0
2   01     red     1      1.0
3   01     red     2      0.0
4   01    blue     1      0.0
5   01    blue     2      0.0
6   02  yellow     1      0.0
7   02  yellow     2      1.0
8   02     red     1      0.0
9   02     red     2      0.0
10  02    blue     1      0.0
11  02    blue     2      0.0
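A sketch of the same grid built with cross joins instead of explode (assumes pandas 1.2 or later, where merge(how='cross') is available). Starting from df['id'].unique() produces no duplicate rows, so drop_duplicates is unnecessary, and casting after fillna restores the integer dtype:

```python
import pandas as pd

df = pd.DataFrame({'id': ['01', '01', '02'],
                   'color': ['red', 'yellow', 'yellow'],
                   'wave': ['1', '2', '2'],
                   'count': [1, 2, 1]})
ls_color = ['yellow', 'red', 'blue']
ls_wave = ['1', '2']

# Cartesian product of unique ids, colors and waves via cross joins
grid = (pd.DataFrame({'id': df['id'].unique()})
          .merge(pd.DataFrame({'color': ls_color}), how='cross')
          .merge(pd.DataFrame({'wave': ls_wave}), how='cross'))

# Left-join the observed counts; unmatched combinations become NaN, then 0
result = (grid.merge(df, on=['id', 'color', 'wave'], how='left')
              .fillna({'count': 0})
              .astype({'count': 'int64'}))
print(result)
```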