How to Convert Pandas Week to User Supported Business Logic?

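The goal is to convert a pandas Timestamp object into a week-of-year number according to the calendars below, which the user defined separately for 2021 and 2022.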

I'm using the week attribute of pandas datetimes. It works for dates in 2021, but breaks down for the following year. Here is the initial function I wrote:

import pandas

def week(date: pandas.Timestamp) -> int:
    """Convert the date to a week according to client calendar."""
    # date.week is the ISO-8601 week number (weeks start on Monday)
    orig_week: int = date.week % 53
    return orig_week + 1 if date.dayofweek < 5 else orig_week + 2

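I added the modulo 53 because without it the rest of the logic sometimes gave me numbers like 54. But I'm not sure about the internal logic of the pandas week attribute, so I can't really figure out how to map that week onto the user-defined calendar above, even though it sounds like a simple transformation. The problems occur in the edge cases (at the end or the start of the year), so any help would be appreciated.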
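For reference, pandas' Timestamp.week is the ISO-8601 week number: weeks start on Monday, and week 1 is the week containing the year's first Thursday. That is why it drifts at the year boundary. A minimal sketch that reproduces the question's formula under the illustrative name week_question and prints it next to the ISO year and week for a few boundary dates:

```python
import pandas as pd


def week_question(date: pd.Timestamp) -> int:
    """The question's original formula, reproduced for comparison."""
    orig_week: int = date.week % 53
    return orig_week + 1 if date.dayofweek < 5 else orig_week + 2


for s in ["2021-01-01", "2022-01-01", "2022-01-07", "2022-12-31"]:
    ts = pd.Timestamp(s)
    # isocalendar() gives (ISO year, ISO week, ISO weekday)
    iso_year, iso_week, _ = ts.isocalendar()
    print(s, ts.day_name(), "ISO:", iso_year, iso_week,
          "question's formula:", week_question(ts))
```

Here 2022-01-01 (a Saturday) still belongs to ISO week 52 of 2021, so the formula returns 54 for it: the kind of out-of-range value the question describes.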

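One solution is to create a custom calendar for the year of the input date. Each date of the year is converted to a period representing a week that starts on Saturday and ends on Friday.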
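The core of this idea is pandas' anchored weekly frequency 'W-FRI': every timestamp maps to the Saturday-to-Friday week that contains it, even when that week straddles two calendar years. A small sketch:

```python
import pandas as pd

# 'W-FRI' = weekly periods that end on Friday, i.e. run Saturday..Friday
for s in ["2022-01-01", "2022-12-31", "2025-01-01"]:
    p = pd.Timestamp(s).to_period("W-FRI")
    print(s, "->", p,
          "| starts", p.start_time.day_name(),
          "| ends", p.end_time.day_name())
```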

import numpy as np
import pandas as pd

def week(in_date: pd.Timestamp) -> int:
    """Convert the date to a week according to client calendar."""
    # Set up a custom calendar table covering every day of the input date's year
    in_date_year = str(in_date.year)
    _df = pd.DataFrame({'Date': pd.date_range(in_date_year + '-1-1', in_date_year + '-12-31')})
    # Weekly periods ending on Friday, i.e. weeks that start on SAT and end on FRI
    _df['Period'] = _df['Date'].dt.to_period('W-FRI')
    _df['Week_Num'] = _df['Period'].dt.week

    # Adjust for the year start: if 1 January falls in a week numbered 52/53
    # (the previous year's last week), 53 becomes 1 and every other week shifts up by one
    _df['Week_Num'] = np.where(_df['Week_Num'].iloc[0] >= 52, _df['Week_Num'] % 53 + 1, _df['Week_Num'])
    # Adjust for the year end: among the last 7 days, any week number below 52
    # (a week numbered as part of the next year) becomes 53
    _df.iloc[-7:, _df.columns.get_loc('Week_Num')] = np.where(_df['Week_Num'].iloc[-7:] < 52, 53, _df['Week_Num'].iloc[-7:])

    # Look up the input date (normalize() drops any time of day) and return its week number
    return int(_df.loc[_df['Date'] == in_date.normalize(), 'Week_Num'].iat[0])

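The week number returned for each Period is based on weeks that start on Saturday and end on Friday. However, at the start and end of the year the number may still belong to a week of the previous or next year, so the code checks for these cases and adjusts the numbers for the start and end of the current year accordingly.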
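To see why the adjustment is needed, look at the raw Period.week values for boundary weeks. In these examples the week pandas reports for a 'W-FRI' period matches the ISO week of the period's last day (its Friday), so a period ending on 1 January can report week 53, and a period ending in early January of the next year reports week 1. A sketch using 2021 (which starts on a Friday) and the last week of 2025:

```python
import pandas as pd

first_2021 = pd.Timestamp("2021-01-01").to_period("W-FRI")  # 2020-12-26/2021-01-01
last_2025 = pd.Timestamp("2025-12-31").to_period("W-FRI")   # 2025-12-27/2026-01-02

# The first week of 2021 carries the previous year's week 53;
# the answer's "% 53 + 1" shift maps it to 1.
print(first_2021, first_2021.week, first_2021.week % 53 + 1)

# The last week of 2025 carries week 1 (of 2026);
# the answer's year-end rule turns values below 52 in the last 7 days into 53.
print(last_2025, last_2025.week)
```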

Results:

week(pd.Timestamp('2022-01-01'))
#output
1

week(pd.Timestamp('2022-12-31'))
#output
53
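If the client calendar simply counts Saturday-to-Friday weeks, with week 1 being the (possibly partial) week that contains 1 January, the same numbers can also be computed with plain date arithmetic instead of building a table on every call. This is an alternative sketch under that assumption, not the answer's method; week_arithmetic is a hypothetical name:

```python
import pandas as pd


def week_arithmetic(in_date: pd.Timestamp) -> int:
    """Saturday-to-Friday week number; week 1 is the week containing 1 January.

    Assumption: this numbering matches the client calendar.
    """
    jan1 = pd.Timestamp(year=in_date.year, month=1, day=1)
    # dayofweek: Monday=0 ... Saturday=5, so this is how many days
    # lie between the Saturday on or before 1 January and 1 January
    offset = (jan1.dayofweek - 5) % 7
    first_saturday = jan1 - pd.Timedelta(days=offset)
    # Whole weeks elapsed since that Saturday, counted from 1
    return (in_date.normalize() - first_saturday).days // 7 + 1


print(week_arithmetic(pd.Timestamp("2022-01-01")))
print(week_arithmetic(pd.Timestamp("2022-12-31")))
```

For the 2022 and 2025 dates shown in this answer both approaches give the same numbers. They can differ in a leap year that starts on a Friday (such as 2016): there week_arithmetic numbers 31 December as week 54, whereas the table-based function caps the last days of the year at 53.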

The underlying _df table built inside week() looks like this:

2022:

# print first 10 rows of the year
print(_df.head(10))

        Date                 Period  Week_Num
0 2022-01-01  2022-01-01/2022-01-07         1
1 2022-01-02  2022-01-01/2022-01-07         1
2 2022-01-03  2022-01-01/2022-01-07         1
3 2022-01-04  2022-01-01/2022-01-07         1
4 2022-01-05  2022-01-01/2022-01-07         1
5 2022-01-06  2022-01-01/2022-01-07         1
6 2022-01-07  2022-01-01/2022-01-07         1
7 2022-01-08  2022-01-08/2022-01-14         2
8 2022-01-09  2022-01-08/2022-01-14         2
9 2022-01-10  2022-01-08/2022-01-14         2


# print last 10 rows of the year
print(_df.tail(10))

          Date                 Period  Week_Num
355 2022-12-22  2022-12-17/2022-12-23        51
356 2022-12-23  2022-12-17/2022-12-23        51
357 2022-12-24  2022-12-24/2022-12-30        52
358 2022-12-25  2022-12-24/2022-12-30        52
359 2022-12-26  2022-12-24/2022-12-30        52
360 2022-12-27  2022-12-24/2022-12-30        52
361 2022-12-28  2022-12-24/2022-12-30        52
362 2022-12-29  2022-12-24/2022-12-30        52
363 2022-12-30  2022-12-24/2022-12-30        52
364 2022-12-31  2022-12-31/2023-01-06        53

2025:

# print first 10 rows of the year
print(_df.head(10))

        Date                 Period  Week_Num
0 2025-01-01  2024-12-28/2025-01-03         1
1 2025-01-02  2024-12-28/2025-01-03         1
2 2025-01-03  2024-12-28/2025-01-03         1
3 2025-01-04  2025-01-04/2025-01-10         2
4 2025-01-05  2025-01-04/2025-01-10         2
5 2025-01-06  2025-01-04/2025-01-10         2
6 2025-01-07  2025-01-04/2025-01-10         2
7 2025-01-08  2025-01-04/2025-01-10         2
8 2025-01-09  2025-01-04/2025-01-10         2
9 2025-01-10  2025-01-04/2025-01-10         2

# print last 10 rows of the year
print(_df.tail(10))

          Date                 Period  Week_Num
355 2025-12-22  2025-12-20/2025-12-26        52
356 2025-12-23  2025-12-20/2025-12-26        52
357 2025-12-24  2025-12-20/2025-12-26        52
358 2025-12-25  2025-12-20/2025-12-26        52
359 2025-12-26  2025-12-20/2025-12-26        52
360 2025-12-27  2025-12-27/2026-01-02        53
361 2025-12-28  2025-12-27/2026-01-02        53
362 2025-12-29  2025-12-27/2026-01-02        53
363 2025-12-30  2025-12-27/2026-01-02        53
364 2025-12-31  2025-12-27/2026-01-02        53
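Note that _df is a local variable inside week(), so the print(_df...) calls above only work when placed inside the function. To inspect the table from outside, the same construction can be factored into a helper that returns it. calendar_table is a hypothetical name; the logic mirrors the answer's code:

```python
import numpy as np
import pandas as pd


def calendar_table(year: int) -> pd.DataFrame:
    """Build the answer's Saturday-to-Friday calendar table for one year."""
    df = pd.DataFrame({"Date": pd.date_range(f"{year}-01-01", f"{year}-12-31")})
    df["Period"] = df["Date"].dt.to_period("W-FRI")
    df["Week_Num"] = df["Period"].dt.week
    # Same start-of-year and end-of-year adjustments as in week()
    df["Week_Num"] = np.where(df["Week_Num"].iloc[0] >= 52,
                              df["Week_Num"] % 53 + 1, df["Week_Num"])
    col = df.columns.get_loc("Week_Num")
    df.iloc[-7:, col] = np.where(df["Week_Num"].iloc[-7:] < 52,
                                 53, df["Week_Num"].iloc[-7:])
    return df


print(calendar_table(2025).tail(10))
```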