如何在 Pandas 中扩展和创建以下数据集
How to expand and create the following dataset in Pandas
我有一个如下所示的数据集:
df = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})
我想扩展它以获取每日值并创建一个包含每日日期的新数据集以形成如下内容:
谢谢!!
这似乎是一个简单的问题。使用 df.iterrows() 迭代现有数据框,然后获取当前日期范围,然后使用此范围迭代日期范围,并使用这些日期用详细日期和金额填充另一个数据框。我迭代数据框,因为我将数据扩展到一个新的数据框,而不是使用数据框 explode
添加一个新列
df = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})
#for key, value in df.iterrows():
# fill a range with weekstart and weekend date ranges:
# iterate the range and populate the new data frame
df2=pd.DataFrame(columns=['Date','Amount'])
df2['Date']=pd.to_datetime(df2['Date'])
for key,value in df.iterrows():
amount=value['Daily']
for date in pd.date_range(value['weekstart'], value['weekend']):
df2=df2.append({"Date":date, "Amount":amount},ignore_index=True)
index=range(1,len(df2)+1)
df2.set_index(pd.Index(index),'index',inplace=True)
plt.plot(df2['Date'],df2['Amount'])
plt.xticks(rotation=90)
plt.show()
print(df2)
输出:
Date Amount
1 2018-01-01 4890.846
2 2018-01-02 4890.846
3 2018-01-03 4890.846
4 2018-01-04 4890.846
5 2018-01-05 4890.846
6 2018-01-06 4890.846
7 2018-01-07 4890.846
8 2018-01-08 5337.086
9 2018-01-09 5337.086
10 2018-01-10 5337.086
11 2018-01-11 5337.086
12 2018-01-12 5337.086
13 2018-01-13 5337.086
14 2018-01-14 5337.086
15 2018-01-15 5559.445
16 2018-01-16 5559.445
17 2018-01-17 5559.445
18 2018-01-18 5559.445
19 2018-01-19 5559.445
20 2018-01-20 5559.445
21 2018-01-21 5559.445
22 2018-01-22 5271.980
23 2018-01-23 5271.980
24 2018-01-24 5271.980
25 2018-01-25 5271.980
26 2018-01-26 5271.980
27 2018-01-27 5271.980
28 2018-01-28 5271.980
- 转换为日期时间
- 使用
pd.date_range
创建包含“weekstart”和“weekend”之间所有日期的列
- 使用
explode
拆分成单独的行。
df["weekstart"] = pd.to_datetime(df["weekstart"])
df["weekend"] = pd.to_datetime(df["weekend"])
df["days"] = df.apply(lambda x: pd.date_range(x["weekstart"], x["weekend"], freq="D"), axis=1)
df = df.explode("days")
output = df[["days", "Daily"]]
>>> output
days Daily
0 2018-01-01 4890.846
0 2018-01-02 4890.846
0 2018-01-03 4890.846
0 2018-01-04 4890.846
0 2018-01-05 4890.846
0 2018-01-06 4890.846
0 2018-01-07 4890.846
1 2018-01-08 5337.086
1 2018-01-09 5337.086
1 2018-01-10 5337.086
1 2018-01-11 5337.086
1 2018-01-12 5337.086
1 2018-01-13 5337.086
1 2018-01-14 5337.086
2 2018-01-15 5559.445
2 2018-01-16 5559.445
2 2018-01-17 5559.445
2 2018-01-18 5559.445
2 2018-01-19 5559.445
2 2018-01-20 5559.445
2 2018-01-21 5559.445
3 2018-01-22 5271.980
3 2018-01-23 5271.980
3 2018-01-24 5271.980
3 2018-01-25 5271.980
3 2018-01-26 5271.980
3 2018-01-27 5271.980
3 2018-01-28 5271.980
我有一个如下所示的数据集:
df = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})
我想扩展它以获取每日值并创建一个包含每日日期的新数据集以形成如下内容:
谢谢!!
这似乎是一个简单的问题。使用 df.iterrows() 迭代现有数据框,然后获取当前日期范围,然后使用此范围迭代日期范围,并使用这些日期用详细日期和金额填充另一个数据框。我迭代数据框,因为我将数据扩展到一个新的数据框,而不是使用数据框 explode
添加一个新列df = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})
#for key, value in df.iterrows():
# fill a range with weekstart and weekend date ranges:
# iterate the range and populate the new data frame
df2=pd.DataFrame(columns=['Date','Amount'])
df2['Date']=pd.to_datetime(df2['Date'])
for key,value in df.iterrows():
amount=value['Daily']
for date in pd.date_range(value['weekstart'], value['weekend']):
df2=df2.append({"Date":date, "Amount":amount},ignore_index=True)
index=range(1,len(df2)+1)
df2.set_index(pd.Index(index),'index',inplace=True)
plt.plot(df2['Date'],df2['Amount'])
plt.xticks(rotation=90)
plt.show()
print(df2)
输出:
Date Amount
1 2018-01-01 4890.846
2 2018-01-02 4890.846
3 2018-01-03 4890.846
4 2018-01-04 4890.846
5 2018-01-05 4890.846
6 2018-01-06 4890.846
7 2018-01-07 4890.846
8 2018-01-08 5337.086
9 2018-01-09 5337.086
10 2018-01-10 5337.086
11 2018-01-11 5337.086
12 2018-01-12 5337.086
13 2018-01-13 5337.086
14 2018-01-14 5337.086
15 2018-01-15 5559.445
16 2018-01-16 5559.445
17 2018-01-17 5559.445
18 2018-01-18 5559.445
19 2018-01-19 5559.445
20 2018-01-20 5559.445
21 2018-01-21 5559.445
22 2018-01-22 5271.980
23 2018-01-23 5271.980
24 2018-01-24 5271.980
25 2018-01-25 5271.980
26 2018-01-26 5271.980
27 2018-01-27 5271.980
28 2018-01-28 5271.980
- 转换为日期时间
- 使用
pd.date_range
创建包含“weekstart”和“weekend”之间所有日期的列 - 使用
explode
拆分成单独的行。
df["weekstart"] = pd.to_datetime(df["weekstart"])
df["weekend"] = pd.to_datetime(df["weekend"])
df["days"] = df.apply(lambda x: pd.date_range(x["weekstart"], x["weekend"], freq="D"), axis=1)
df = df.explode("days")
output = df[["days", "Daily"]]
>>> output
days Daily
0 2018-01-01 4890.846
0 2018-01-02 4890.846
0 2018-01-03 4890.846
0 2018-01-04 4890.846
0 2018-01-05 4890.846
0 2018-01-06 4890.846
0 2018-01-07 4890.846
1 2018-01-08 5337.086
1 2018-01-09 5337.086
1 2018-01-10 5337.086
1 2018-01-11 5337.086
1 2018-01-12 5337.086
1 2018-01-13 5337.086
1 2018-01-14 5337.086
2 2018-01-15 5559.445
2 2018-01-16 5559.445
2 2018-01-17 5559.445
2 2018-01-18 5559.445
2 2018-01-19 5559.445
2 2018-01-20 5559.445
2 2018-01-21 5559.445
3 2018-01-22 5271.980
3 2018-01-23 5271.980
3 2018-01-24 5271.980
3 2018-01-25 5271.980
3 2018-01-26 5271.980
3 2018-01-27 5271.980
3 2018-01-28 5271.980