为每个客户添加缺失的行 - Python / Pandas
Add missing rows for each Client - Python / Pandas
我有周、月和年的 df。
week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]
myDict = {}
myDict["Week"] = week
myDict["Month"] = month
myDict["Year"] = year
dates_df = pd.DataFrame(data=myDict)
dates_df
Week Month Year
01/03/2022 - 01/09/2022 January 2022
01/10/2022 - 01/16/2022 January 2022
01/17/2022 - 01/23/2022 January 2022
01/24/2022 - 01/30/2022 January 2022
数据看起来像这样(以前分组):
test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
'01/24/2022 - 01/30/2022'],
'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}
test_df = pd.DataFrame(data=test_data)
test_df
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
我需要为每个缺少周的客户 (Id) 创建额外的客户名称、城市、周、月和年行,从日期 df + Spent 列在这种情况下应该为 0。
我需要的输出:
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0
3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
我正在尝试编写代码,但没有成功。
您可以 pivot
+ reindex
+ fillna
(获取丢失的数据)+ stack
(返回到之前的形状):
columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
.reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T,
names=[None]+columns), axis=1)
.fillna(0).stack(level=columns).reset_index())
输出:
CLient Id Client Name City Week Month Year Spent
0 1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000.0
1 1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750.0
2 1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200.0
3 1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850.0
4 2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777.0
5 2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500.0
6 2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0.0
7 2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200.0
8 3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500.0
9 3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0.0
10 3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0.0
11 3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200.0
如果你有不止一列要用0填充,你可以使用:
columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
.merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
.reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()),
names=[None] + columns), axis=1)
.fillna(0).stack(level=columns).reset_index())
我有周、月和年的 df。
week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]
myDict = {}
myDict["Week"] = week
myDict["Month"] = month
myDict["Year"] = year
dates_df = pd.DataFrame(data=myDict)
dates_df
Week Month Year
01/03/2022 - 01/09/2022 January 2022
01/10/2022 - 01/16/2022 January 2022
01/17/2022 - 01/23/2022 January 2022
01/24/2022 - 01/30/2022 January 2022
数据看起来像这样(以前分组):
test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
'01/24/2022 - 01/30/2022'],
'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}
test_df = pd.DataFrame(data=test_data)
test_df
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
我需要为每个缺少周的客户 (Id) 创建额外的客户名称、城市、周、月和年行,从日期 df + Spent 列在这种情况下应该为 0。
我需要的输出:
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0
3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
我正在尝试编写代码,但没有成功。
您可以 pivot
+ reindex
+ fillna
(获取丢失的数据)+ stack
(返回到之前的形状):
columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
.reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T,
names=[None]+columns), axis=1)
.fillna(0).stack(level=columns).reset_index())
输出:
CLient Id Client Name City Week Month Year Spent
0 1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000.0
1 1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750.0
2 1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200.0
3 1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850.0
4 2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777.0
5 2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500.0
6 2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0.0
7 2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200.0
8 3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500.0
9 3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0.0
10 3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0.0
11 3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200.0
如果你有不止一列要用0填充,你可以使用:
columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
.merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
.reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()),
names=[None] + columns), axis=1)
.fillna(0).stack(level=columns).reset_index())