How to create a pandas.date_range() for each row from the columns "start_date" and "end_date"?
I have a df like this:
id | start_date | end_date | price
1 | 2020-10-01 | 2020-10-3 | 1
1 | 2020-10-03 | 2020-10-4 | 1
2 | 2020-10-04 | 2020-10-6 | 2
3 | 2020-10-05 | 2020-10-5 | 3
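The columns "start_date" and "end_date" are datetime64[ns].
I want to create a "date" column based on the date range.
The simplest approach would be to create a pandas.date_range(start_date, end_date, freq="D") for each row and then use .explode().
The final result should look like this: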
id | start_date | end_date | price | date
1 | 2020-10-01 | 2020-10-3 | 1 | 2020-10-01
1 | 2020-10-01 | 2020-10-3 | 1 | 2020-10-02
1 | 2020-10-01 | 2020-10-3 | 1 | 2020-10-03
1 | 2020-10-03 | 2020-10-4 | 1 | 2020-10-03
1 | 2020-10-03 | 2020-10-4 | 1 | 2020-10-04
2 | 2020-10-04 | 2020-10-6 | 2 | 2020-10-04
2 | 2020-10-04 | 2020-10-6 | 2 | 2020-10-05
2 | 2020-10-04 | 2020-10-6 | 2 | 2020-10-06
3 | 2020-10-05 | 2020-10-5 | 3 | 2020-10-05
What I have tried so far:
df["daterange"] = pd.date_range(df["start_date"], df["end_date"])
TypeError: Cannot convert input [0 2020-10-01
1 2020-10-01
for row in df.itertuples():
    df["daterange"] = pd.date_range(start=row.start_date, end=row.end_date)
ValueError: Length of values (3) does not match length of index (9)
Lambdas, apply, melt, etc. are too slow for the size of my dataframe to be usable!
/Edit
Fastest method found so far:
https://github.com/Garve/scikit-bonus
skbonus.pandas.preprocessing.DateTimeExploder(
    "date",
    start_column="start_date",
    end_column="end_date",
    frequency="d",
    drop=False,
)
df["daterange"] = df.apply(lambda x: pd.date_range(x.start_date, x.end_date), axis=1)
df = df.explode('daterange').reset_index(drop=True)
print (df)
id start_date end_date price daterange
0 1 2020-10-01 2020-10-3 1 2020-10-01
1 1 2020-10-01 2020-10-3 1 2020-10-02
2 1 2020-10-01 2020-10-3 1 2020-10-03
3 1 2020-10-03 2020-10-4 1 2020-10-03
4 1 2020-10-03 2020-10-4 1 2020-10-04
5 2 2020-10-04 2020-10-6 2 2020-10-04
6 2 2020-10-04 2020-10-6 2 2020-10-05
7 2 2020-10-04 2020-10-6 2 2020-10-06
8 3 2020-10-05 2020-10-5 3 2020-10-05
Alternative:
s = pd.concat([pd.Series(r.Index,pd.date_range(r.start_date, r.end_date)) for r in df.itertuples()])
s = pd.Series(s.index, s)
df = df.join(s.rename('daterange')).reset_index(drop=True)
print (df)
id start_date end_date price daterange
0 1 2020-10-01 2020-10-3 1 2020-10-01
1 1 2020-10-01 2020-10-3 1 2020-10-02
2 1 2020-10-01 2020-10-3 1 2020-10-03
3 1 2020-10-03 2020-10-4 1 2020-10-03
4 1 2020-10-03 2020-10-4 1 2020-10-04
5 2 2020-10-04 2020-10-6 2 2020-10-04
6 2 2020-10-04 2020-10-6 2 2020-10-05
7 2 2020-10-04 2020-10-6 2 2020-10-06
8 3 2020-10-05 2020-10-5 3 2020-10-05
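Fastest method found so far:
https://github.com/Garve/scikit-bonus
from skbonus.pandas.preprocessing import DateTimeExploder
# Note: this only constructs the exploder. Assuming it follows the usual
# scikit-learn transformer interface, it still has to be applied to the
# dataframe (for example with .fit_transform(df)) to get the exploded rows.
df = DateTimeExploder(
    "date",
    start_column="start_date",
    end_column="end_date",
    frequency="d",
    drop=False,
)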
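If adding a dependency is not an option, roughly the same result can be sketched with pandas and NumPy alone, without calling pd.date_range once per row. This is not taken from scikit-bonus; it is a minimal sketch that assumes daily frequency, timezone-naive datetime64[ns] columns, no missing dates, and end_date >= start_date in every row. The helper name explode_date_ranges and its parameters are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data matching the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "start_date": pd.to_datetime(["2020-10-01", "2020-10-03", "2020-10-04", "2020-10-05"]),
    "end_date": pd.to_datetime(["2020-10-03", "2020-10-04", "2020-10-06", "2020-10-05"]),
    "price": [1, 1, 2, 3],
})


def explode_date_ranges(frame, start="start_date", end="end_date", name="date"):
    """Hypothetical helper: one output row per day in [start, end] (inclusive)."""
    # Number of days covered by each row, counting both endpoints.
    n_days = ((frame[end] - frame[start]).dt.days + 1).to_numpy()
    if (n_days < 1).any():
        raise ValueError("end date precedes start date in at least one row")

    # Repeat every row once per day, using positions so a non-unique index is fine.
    positions = np.repeat(np.arange(len(frame)), n_days)
    out = frame.iloc[positions].copy()

    # Day offsets 0, 1, ..., n-1 that restart at 0 for each original row.
    row_starts = np.repeat(n_days.cumsum() - n_days, n_days)
    offsets = np.arange(n_days.sum()) - row_starts

    # Add the offsets to the repeated start dates in a single vectorized step.
    out[name] = out[start].to_numpy() + pd.to_timedelta(offsets, unit="D").to_numpy()
    return out.reset_index(drop=True)


result = explode_date_ranges(df)
print(result)
```

The per-row Python loop is replaced by np.repeat and one array addition, so the cost grows with the number of output rows rather than with the number of date_range calls. Whether it beats DateTimeExploder on a given dataframe would need to be measured.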