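Using start date and end date to create a new column: Pandas
I have a pandas DataFrame like the one below (sample), and I want to create a new table with an extra column 'NewDate'. For each ID it should show the last date of the StartDate month and the last date of every subsequent month up to the EndDate. If the EndDate for an ID is Null, the series should stop at the last date of the current month, i.e. May 2022.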
ID StartDate EndDate
100 1/01/2022 26/04/2022
101 20/04/2022 Null
102 1/01/2022 27/02/2022
....
My expected output:
ID StartDate EndDate NewDate
100 1/01/2022 26/04/2022 31/01/2022
100 1/01/2022 26/04/2022 28/02/2022
100 1/01/2022 26/04/2022 31/03/2022
100 1/01/2022 26/04/2022 30/04/2022
101 20/04/2022 Null 30/04/2022
101 20/04/2022 Null 31/05/2022
102 1/01/2022 27/02/2022 31/01/2022
102 1/01/2022 27/02/2022 28/02/2022
...
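Try this:
import pandas as pd
# convert each date column to datetime (dates are day-first; 'Null' becomes NaT)
df['StartDate'] = pd.to_datetime(df['StartDate'], dayfirst=True)
df['EndDate'] = pd.to_datetime(df['EndDate'], dayfirst=True, errors='coerce')
# create date ranges for each row; e == e is False only when e is NaT
# note: on pandas >= 2.2 the month-end alias is 'ME' instead of 'M'
f = lambda s,e: pd.date_range(s, e+pd.DateOffset(months=1), freq='M')
df['NewDate'] = [f(s,e) if e==e else f(s,pd.Timestamp.now()) for s, e in zip(df['StartDate'], df['EndDate'])]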
# explode the new column
df = df.explode('NewDate')
print(df)
ID StartDate EndDate NewDate
0 100 2022-01-01 2022-04-26 2022-01-31
0 100 2022-01-01 2022-04-26 2022-02-28
0 100 2022-01-01 2022-04-26 2022-03-31
0 100 2022-01-01 2022-04-26 2022-04-30
1 101 2022-04-20 NaT 2022-04-30
1 101 2022-04-20 NaT 2022-05-31
2 102 2022-01-01 2022-02-27 2022-01-31
2 102 2022-01-01 2022-02-27 2022-02-28
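If you want to run this end to end, here is a minimal self-contained sketch of the same idea. The function name `expand_month_ends` and its `today` parameter are my own (not pandas API); `today` is only there so the Null case can be pinned to a fixed date and the result stays reproducible. It rolls each end date to its month end instead of adding a month, and it resets the index, so treat it as one possible variant rather than the answer's exact method.

```python
import pandas as pd

def expand_month_ends(df, today=None):
    # `expand_month_ends` and `today` are illustrative names, not pandas API.
    today = pd.Timestamp.now().normalize() if today is None else pd.Timestamp(today)
    start = pd.to_datetime(df["StartDate"], format="%d/%m/%Y")
    # Anything that is not a valid date (e.g. the string 'Null') becomes NaT,
    # which is then replaced by the reference date.
    end = pd.to_datetime(df["EndDate"], format="%d/%m/%Y", errors="coerce").fillna(today)
    # MonthEnd(0) rolls forward to the month end and leaves month ends unchanged.
    last = end + pd.offsets.MonthEnd(0)
    out = df.copy()
    # One DatetimeIndex of month ends per row, from start to the rolled end date.
    out["NewDate"] = [
        pd.date_range(s, e, freq=pd.offsets.MonthEnd()) for s, e in zip(start, last)
    ]
    out = out.explode("NewDate").reset_index(drop=True)
    out["NewDate"] = pd.to_datetime(out["NewDate"])
    return out

df = pd.DataFrame({
    "ID": [100, 101, 102],
    "StartDate": ["1/01/2022", "20/04/2022", "1/01/2022"],
    "EndDate": ["26/04/2022", "Null", "27/02/2022"],
})
result = expand_month_ends(df, today="2022-05-15")
print(result)
```

Passing the offset object as `freq` sidesteps the 'M' vs 'ME' alias difference between pandas versions.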
You first need to trim your dates (roll them to the month end), then we create the date ranges with pd.date_range and explode the column.
s1 = pd.to_datetime(df.StartDate, format = '%d/%m/%Y')
s2 = pd.to_datetime(df.EndDate, format = '%d/%m/%Y', errors = 'coerce') + pd.offsets.MonthEnd(0)
s2 = s2.fillna(s1 + pd.offsets.MonthEnd(2))
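# inclusive='left' replaces closed='left', which was removed in pandas 2.0; on pandas >= 2.2 use freq='ME'
df['new'] = [pd.date_range(x, y , freq= 'M',inclusive = 'left') for x , y in zip(s1, s2+pd.offsets.MonthEnd(1))]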
out = df.explode('new')
out
Out[206]:
ID StartDate EndDate new
0 100 1/01/2022 26/04/2022 2022-01-31
0 100 1/01/2022 26/04/2022 2022-02-28
0 100 1/01/2022 26/04/2022 2022-03-31
0 100 1/01/2022 26/04/2022 2022-04-30
1 101 20/04/2022 Null 2022-04-30
1 101 20/04/2022 Null 2022-05-31
2 102 1/01/2022 27/02/2022 2022-01-31
2 102 1/01/2022 27/02/2022 2022-02-28
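The approach above leans on how `pd.offsets.MonthEnd(n)` rolls dates, which is easy to get wrong by one month. A small sketch of that behavior (the variable names are just for illustration):

```python
import pandas as pd

mid_month = pd.Timestamp("2022-04-26")   # not on a month end
month_end = pd.Timestamp("2022-04-30")   # already on a month end

rolled = {
    # n=0 rolls forward to the month end; a month end stays where it is
    "mid + MonthEnd(0)": mid_month + pd.offsets.MonthEnd(0),
    "end + MonthEnd(0)": month_end + pd.offsets.MonthEnd(0),
    # n=1 from mid-month lands on the same month's end,
    # but from a month end it moves to the next month's end
    "mid + MonthEnd(1)": mid_month + pd.offsets.MonthEnd(1),
    "end + MonthEnd(1)": month_end + pd.offsets.MonthEnd(1),
    # n=2 from mid-month reaches the end of the following month
    "mid + MonthEnd(2)": mid_month + pd.offsets.MonthEnd(2),
}
for label, value in rolled.items():
    print(label, "->", value.date())
```

That is why filling a missing end date with `s1 + pd.offsets.MonthEnd(2)` turns a start of 20/04/2022 into 31/05/2022.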
Updated:
s2 = s2.fillna(s1 + pd.offsets.MonthEnd(1))
df['new_date'] = [pd.date_range(x, y , freq= 'M',inclusive = 'left') for x , y in zip(df.start_date, s2+pd.offsets.MonthEnd(1))]
output = df.explode('new_date')