Pandas:每年生成两个日期
Pandas: Generate Two Dates Per Year
这里是新手。我正在尝试制定一项政策,每年创建两个记录,围绕他们的周年纪念日分开。我的输入将是每项政策的单行,并将包括发布日期和结束日期。我想扩展它,为每项政策创建两个记录,围绕他们的周年日分开。我的输出应该是这样的:
policy_key
issue_date
end_date
period_start
period_end
12345
2005-03-15
2020-10-18
2005-03-15
2005-12-31
12345
2005-03-15
2020-10-18
2006-01-01
2006-03-14
12345
2005-03-15
2020-10-18
2006-03-15
2006-12-31
...
...
...
...
...
12345
2005-03-15
2020-10-18
2020-01-01
2020-03-14
12345
2005-03-15
2020-10-18
2020-03-15
2020-10-18
这是我到目前为止尝试过的方法,但我不知道如何将 period_start 日期更新为从年初或周年纪念日开始,以及 period_end 在周年纪念日的前一天或一年的最后一天结束的日期。
# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta
# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)
# creating dataframe
df = pd.DataFrame({'policy_key':policy_key,
'issue_date':issue_date,
'issue_age':issue_age,
'duration':duration,
'attained_age':issue_age + duration - 1,
'period_start':pd.date_range(start=issue_date,end=end_date, freq=freq)})
df['issue_date'] = pd.to_datetime(df['issue_date'], format='%Y-%m-%d')```
有点复杂,但是很管用!
# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta
# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)
将日期设置为 datetime
:
issue_date = pd.to_datetime(issue_date, format="%Y-%m-%d")
end_date = pd.to_datetime(end_date, format="%Y-%m-%d")
为每个日期范围创建 3 个列表:
开始日期:
- 给定的
issue_date
- 每年年初的日期列表,偏移
issue_date.month-1
和 issue_date.day-1
- 每年年底的日期列表
与结束日期相似,但相差 issue_date.day-2
。
# start dates
start_dates = [issue_date] + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()) +\
pd.offsets.DateOffset(months=issue_date.month-1,
days=issue_date.day-1)) + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()))
# end dates
end_dates = [end_date] + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearEnd())) + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()) +\
pd.offsets.DateOffset(months=issue_date.month-1,
days=issue_date.day-2))
对值进行排序,使它们按顺序排列:
# sort values
start_dates.sort()
end_dates.sort()
将列添加到 DataFrame:
df = pd.DataFrame({'policy_key': policy_key,
'issue_date': issue_date,
'end_date': end_date,
'issue_age': issue_age,
'duration': duration,
'attained_age': issue_age + duration - 1,
'period_start': start_dates,
'period_end': end_dates})
# policy_key issue_date end_date issue_age duration attained_age period_start period_end
# 0 12345 2005-03-15 2020-10-18 35 10 44 2005-03-15 2005-12-31
# 1 12345 2005-03-15 2020-10-18 35 10 44 2006-01-01 2006-03-14
# 2 12345 2005-03-15 2020-10-18 35 10 44 2006-03-15 2006-12-31
# 3 12345 2005-03-15 2020-10-18 35 10 44 2007-01-01 2007-03-14
# 4 12345 2005-03-15 2020-10-18 35 10 44 2007-03-15 2007-12-31
# 5 12345 2005-03-15 2020-10-18 35 10 44 2008-01-01 2008-03-14
# 6 12345 2005-03-15 2020-10-18 35 10 44 2008-03-15 2008-12-31
# 7 12345 2005-03-15 2020-10-18 35 10 44 2009-01-01 2009-03-14
# 8 12345 2005-03-15 2020-10-18 35 10 44 2009-03-15 2009-12-31
# 9 12345 2005-03-15 2020-10-18 35 10 44 2010-01-01 2010-03-14
# 10 12345 2005-03-15 2020-10-18 35 10 44 2010-03-15 2010-12-31
# 11 12345 2005-03-15 2020-10-18 35 10 44 2011-01-01 2011-03-14
# 12 12345 2005-03-15 2020-10-18 35 10 44 2011-03-15 2011-12-31
# 13 12345 2005-03-15 2020-10-18 35 10 44 2012-01-01 2012-03-14
# 14 12345 2005-03-15 2020-10-18 35 10 44 2012-03-15 2012-12-31
# 15 12345 2005-03-15 2020-10-18 35 10 44 2013-01-01 2013-03-14
# 16 12345 2005-03-15 2020-10-18 35 10 44 2013-03-15 2013-12-31
# 17 12345 2005-03-15 2020-10-18 35 10 44 2014-01-01 2014-03-14
# 18 12345 2005-03-15 2020-10-18 35 10 44 2014-03-15 2014-12-31
# 19 12345 2005-03-15 2020-10-18 35 10 44 2015-01-01 2015-03-14
# 20 12345 2005-03-15 2020-10-18 35 10 44 2015-03-15 2015-12-31
# 21 12345 2005-03-15 2020-10-18 35 10 44 2016-01-01 2016-03-14
# 22 12345 2005-03-15 2020-10-18 35 10 44 2016-03-15 2016-12-31
# 23 12345 2005-03-15 2020-10-18 35 10 44 2017-01-01 2017-03-14
# 24 12345 2005-03-15 2020-10-18 35 10 44 2017-03-15 2017-12-31
# 25 12345 2005-03-15 2020-10-18 35 10 44 2018-01-01 2018-03-14
# 26 12345 2005-03-15 2020-10-18 35 10 44 2018-03-15 2018-12-31
# 27 12345 2005-03-15 2020-10-18 35 10 44 2019-01-01 2019-03-14
# 28 12345 2005-03-15 2020-10-18 35 10 44 2019-03-15 2019-12-31
# 29 12345 2005-03-15 2020-10-18 35 10 44 2020-01-01 2020-03-14
# 30 12345 2005-03-15 2020-10-18 35 10 44 2020-03-15 2020-10-18
这里是新手。我正在尝试制定一项政策,每年创建两个记录,围绕他们的周年纪念日分开。我的输入将是每项政策的单行,并将包括发布日期和结束日期。我想扩展它,为每项政策创建两个记录,围绕他们的周年日分开。我的输出应该是这样的:
policy_key | issue_date | end_date | period_start | period_end |
---|---|---|---|---|
12345 | 2005-03-15 | 2020-10-18 | 2005-03-15 | 2005-12-31 |
12345 | 2005-03-15 | 2020-10-18 | 2006-01-01 | 2006-03-14 |
12345 | 2005-03-15 | 2020-10-18 | 2006-03-15 | 2006-12-31 |
... | ... | ... | ... | ... |
12345 | 2005-03-15 | 2020-10-18 | 2020-01-01 | 2020-03-14 |
12345 | 2005-03-15 | 2020-10-18 | 2020-03-15 | 2020-10-18 |
这是我到目前为止尝试过的方法,但我不知道如何将 period_start 日期更新为从年初或周年纪念日开始,以及 period_end 在周年纪念日的前一天或一年的最后一天结束的日期。
# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta
# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)
# creating dataframe
df = pd.DataFrame({'policy_key':policy_key,
'issue_date':issue_date,
'issue_age':issue_age,
'duration':duration,
'attained_age':issue_age + duration - 1,
'period_start':pd.date_range(start=issue_date,end=end_date, freq=freq)})
df['issue_date'] = pd.to_datetime(df['issue_date'], format='%Y-%m-%d')```
有点复杂,但是很管用!
# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta
# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)
将日期设置为 datetime
:
issue_date = pd.to_datetime(issue_date, format="%Y-%m-%d")
end_date = pd.to_datetime(end_date, format="%Y-%m-%d")
为每个日期范围创建 3 个列表:
开始日期:
- 给定的
issue_date
- 每年年初的日期列表,偏移
issue_date.month-1
和issue_date.day-1
- 每年年底的日期列表
与结束日期相似,但相差 issue_date.day-2
。
# start dates
start_dates = [issue_date] + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()) +\
pd.offsets.DateOffset(months=issue_date.month-1,
days=issue_date.day-1)) + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()))
# end dates
end_dates = [end_date] + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearEnd())) + \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()) +\
pd.offsets.DateOffset(months=issue_date.month-1,
days=issue_date.day-2))
对值进行排序,使它们按顺序排列:
# sort values
start_dates.sort()
end_dates.sort()
将列添加到 DataFrame:
df = pd.DataFrame({'policy_key': policy_key,
'issue_date': issue_date,
'end_date': end_date,
'issue_age': issue_age,
'duration': duration,
'attained_age': issue_age + duration - 1,
'period_start': start_dates,
'period_end': end_dates})
# policy_key issue_date end_date issue_age duration attained_age period_start period_end
# 0 12345 2005-03-15 2020-10-18 35 10 44 2005-03-15 2005-12-31
# 1 12345 2005-03-15 2020-10-18 35 10 44 2006-01-01 2006-03-14
# 2 12345 2005-03-15 2020-10-18 35 10 44 2006-03-15 2006-12-31
# 3 12345 2005-03-15 2020-10-18 35 10 44 2007-01-01 2007-03-14
# 4 12345 2005-03-15 2020-10-18 35 10 44 2007-03-15 2007-12-31
# 5 12345 2005-03-15 2020-10-18 35 10 44 2008-01-01 2008-03-14
# 6 12345 2005-03-15 2020-10-18 35 10 44 2008-03-15 2008-12-31
# 7 12345 2005-03-15 2020-10-18 35 10 44 2009-01-01 2009-03-14
# 8 12345 2005-03-15 2020-10-18 35 10 44 2009-03-15 2009-12-31
# 9 12345 2005-03-15 2020-10-18 35 10 44 2010-01-01 2010-03-14
# 10 12345 2005-03-15 2020-10-18 35 10 44 2010-03-15 2010-12-31
# 11 12345 2005-03-15 2020-10-18 35 10 44 2011-01-01 2011-03-14
# 12 12345 2005-03-15 2020-10-18 35 10 44 2011-03-15 2011-12-31
# 13 12345 2005-03-15 2020-10-18 35 10 44 2012-01-01 2012-03-14
# 14 12345 2005-03-15 2020-10-18 35 10 44 2012-03-15 2012-12-31
# 15 12345 2005-03-15 2020-10-18 35 10 44 2013-01-01 2013-03-14
# 16 12345 2005-03-15 2020-10-18 35 10 44 2013-03-15 2013-12-31
# 17 12345 2005-03-15 2020-10-18 35 10 44 2014-01-01 2014-03-14
# 18 12345 2005-03-15 2020-10-18 35 10 44 2014-03-15 2014-12-31
# 19 12345 2005-03-15 2020-10-18 35 10 44 2015-01-01 2015-03-14
# 20 12345 2005-03-15 2020-10-18 35 10 44 2015-03-15 2015-12-31
# 21 12345 2005-03-15 2020-10-18 35 10 44 2016-01-01 2016-03-14
# 22 12345 2005-03-15 2020-10-18 35 10 44 2016-03-15 2016-12-31
# 23 12345 2005-03-15 2020-10-18 35 10 44 2017-01-01 2017-03-14
# 24 12345 2005-03-15 2020-10-18 35 10 44 2017-03-15 2017-12-31
# 25 12345 2005-03-15 2020-10-18 35 10 44 2018-01-01 2018-03-14
# 26 12345 2005-03-15 2020-10-18 35 10 44 2018-03-15 2018-12-31
# 27 12345 2005-03-15 2020-10-18 35 10 44 2019-01-01 2019-03-14
# 28 12345 2005-03-15 2020-10-18 35 10 44 2019-03-15 2019-12-31
# 29 12345 2005-03-15 2020-10-18 35 10 44 2020-01-01 2020-03-14
# 30 12345 2005-03-15 2020-10-18 35 10 44 2020-03-15 2020-10-18