Generate a DataFrame of holidays for multiple years with workalendar

import pandas as pd
from workalendar.registry import registry

CalendarClass = registry.get('US')
calendar = CalendarClass()
calendar.holidays(2019)

#> [(datetime.date(2019, 1, 1), 'New year') ...]

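As the output above shows, holidays() returns a list of two-element tuples. How can I convert it into a two-column DataFrame, with one date column and one string column? Among the other things I have tried, I can't seem to get pd.DataFrame() to work on the list.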

You can simply use the DataFrame constructor:

df = pd.DataFrame(calendar.holidays(2019), columns=['date', 'name'])

Output:

         date                                 name
0  2019-01-01                             New year
1  2019-01-21  Birthday of Martin Luther King, Jr.
2  2019-02-18                Washington's Birthday
3  2019-05-27                         Memorial Day
4  2019-07-04                     Independence Day
5  2019-09-02                            Labor Day
6  2019-10-14                         Columbus Day
7  2019-11-11                         Veterans Day
8  2019-11-28                     Thanksgiving Day
9  2019-12-25                        Christmas Day

NB. If you want a pandas datetime type, you can convert the dates with df['date'] = pd.to_datetime(df['date'])
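A minimal sketch of that conversion. To keep it runnable without workalendar installed, it uses a short hand-written list of (date, name) tuples as a stand-in for calendar.holidays(2019); the `holidays` variable and the `weekday` column are illustrative names, not part of the original answer.

```python
import datetime

import pandas as pd

# Stand-in for calendar.holidays(2019): a list of (date, name) tuples.
holidays = [
    (datetime.date(2019, 1, 1), "New year"),
    (datetime.date(2019, 7, 4), "Independence Day"),
    (datetime.date(2019, 12, 25), "Christmas Day"),
]

df = pd.DataFrame(holidays, columns=["date", "name"])

# The column starts out holding plain Python date objects,
# so convert it to a pandas datetime column.
df["date"] = pd.to_datetime(df["date"])

# With a datetime column, the .dt accessor becomes available.
df["weekday"] = df["date"].dt.day_name()
print(df)
```

The conversion is only needed if you plan to use datetime features such as the .dt accessor, resampling, or date-based slicing.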

Combining several years:

Option 1

lst = [calendar.holidays(year) for year in [2019, 2020]]
df = pd.concat([pd.DataFrame(l, columns=['date', 'name']) for l in lst],
               ignore_index=True)

          date                                 name
0   2019-01-01                             New year
1   2019-01-21  Birthday of Martin Luther King, Jr.
2   2019-02-18                Washington's Birthday
...
19  2020-11-26                     Thanksgiving Day
20  2020-12-25                        Christmas Day
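As a possible variation on option 1 (not from the original answer), the per-year lists can be flattened first and passed to the constructor once, which avoids building one DataFrame per year. The sketch below uses a hypothetical `fake_holidays` function as a stand-in for calendar.holidays so that it runs without workalendar.

```python
import datetime

import pandas as pd


def fake_holidays(year):
    """Hypothetical stand-in for calendar.holidays(year)."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


years = [2019, 2020]

# Flatten the per-year lists into a single list of (date, name) tuples.
rows = [holiday for year in years for holiday in fake_holidays(year)]

# One constructor call; the index runs continuously across the years.
df = pd.DataFrame(rows, columns=["date", "name"])
print(df)
```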

Option 2: MultiIndex

df = pd.concat({year: pd.DataFrame(calendar.holidays(year),
                                   columns=['date', 'name'])
                for year in [2019, 2020]})

               date                                 name
2019 0   2019-01-01                             New year
     1   2019-01-21  Birthday of Martin Luther King, Jr.
     2   2019-02-18                Washington's Birthday
     3   2019-05-27                         Memorial Day
     4   2019-07-04                     Independence Day
     5   2019-09-02                            Labor Day
     6   2019-10-14                         Columbus Day
     7   2019-11-11                         Veterans Day
     8   2019-11-28                     Thanksgiving Day
     9   2019-12-25                        Christmas Day
2020 0   2020-01-01                             New year
     1   2020-01-20  Birthday of Martin Luther King, Jr.
     2   2020-02-17                Washington's Birthday
     3   2020-05-25                         Memorial Day
     4   2020-07-03          Independence Day (Observed)
     5   2020-07-04                     Independence Day
     6   2020-09-07                            Labor Day
     7   2020-10-12                         Columbus Day
     8   2020-11-11                         Veterans Day
     9   2020-11-26                     Thanksgiving Day
     10  2020-12-25                        Christmas Day
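A short sketch of how the MultiIndex result might be used afterwards: selecting one year with .loc, or turning the year level into an ordinary column. The `fake_holidays` function is a hypothetical stand-in for calendar.holidays, and naming the outer level "year" via the `names` argument is an addition for readability, not part of the original answer.

```python
import datetime

import pandas as pd


def fake_holidays(year):
    """Hypothetical stand-in for calendar.holidays(year)."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


# Dict keys become the outer index level; name that level "year".
df = pd.concat(
    {year: pd.DataFrame(fake_holidays(year), columns=["date", "name"])
     for year in [2019, 2020]},
    names=["year"],
)

# Select all holidays of a single year through the outer level.
df_2020 = df.loc[2020]

# Move the year level into a regular column and renumber the rows.
flat = df.reset_index(level="year").reset_index(drop=True)
print(flat)
```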

You can define df as follows:

df = pd.DataFrame(calendar.holidays(2019), columns=['Date', 'Event'])