Get week start date (Monday) from a date column in Python (pandas)?
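I've seen a lot of posts about how to do this with a date string, but I am trying to do it for a dataframe column and haven't had any luck so far.
My current method is: get the weekday from 'myday', then offset by it to get Monday.
df['myday'] is a column of dates.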
mydays = pd.DatetimeIndex(df['myday']).weekday
df['week_start'] = pd.DatetimeIndex(df['myday']) - pd.DateOffset(days=mydays)
But I get:
TypeError: unsupported type for timedelta days component: numpy.ndarray
How can I get the week start date from a df column?
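It fails because pd.DateOffset expects a single integer as a parameter (and you are feeding it an array). You can only use DateOffset to shift a date column by the same offset for every row.
Try this: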
import datetime as dt
import pandas as pd
# Make sure 'myday' contains dates as datetime objects
df['myday'] = pd.to_datetime(df['myday'])
# 'daysoffset' will contain the weekday as integers (Monday=0 ... Sunday=6)
df['daysoffset'] = df['myday'].apply(lambda x: x.weekday())
# Apply, row by row (axis=1), a timedelta subtraction
df['week_start'] = df.apply(lambda x: x['myday'] - dt.timedelta(days=int(x['daysoffset'])), axis=1)
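I haven't actually tested this code (there was no sample data), but it should work for what you've described.
However, you might want to have a look at pandas resample (DataFrame.resample), which might provide a better solution, depending on exactly what you're looking for.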
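To illustrate the point about DateOffset taking a single value, here is a minimal sketch. The sample dates and the variable names `shifted` and `per_row` are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data: 2020-02-12 is a Wednesday, 2020-02-16 a Sunday.
df = pd.DataFrame({"myday": pd.to_datetime(["2020-02-12", "2020-02-16"])})

# A scalar DateOffset shifts every row by the same amount (2 days here).
shifted = df["myday"] - pd.DateOffset(days=2)

# A different shift per row needs an array-like of timedeltas instead,
# built here from each row's weekday number (Monday=0 ... Sunday=6).
per_row = df["myday"] - pd.to_timedelta(df["myday"].dt.weekday, unit="D")

print(shifted.tolist())
print(per_row.tolist())
```

With the scalar offset the two rows stay four days apart; with the per-row timedeltas both rows land on the same Monday.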
Another alternative:
df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
This sets 'week_start' to the Monday on or before 'myday' (a date that is already a Monday maps to itself).
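A small sketch of the period-based approach on made-up sample dates. It uses the `.dt.start_time` accessor on the period column rather than `apply`, which should give the same values; note that the result is set to midnight, so any time of day in 'myday' is dropped.

```python
import pandas as pd

# Hypothetical sample data: a Monday, a Wednesday and a Sunday of the same week.
df = pd.DataFrame({
    "myday": pd.to_datetime([
        "2020-02-10 09:30",
        "2020-02-12 00:00",
        "2020-02-16 23:59",
    ])
})

# 'W' means weekly periods ending on Sunday, so each period starts on a Monday.
weeks = df["myday"].dt.to_period("W")

# start_time is the first instant of the period: Monday at 00:00.
df["week_start"] = weeks.dt.start_time

print(df)
```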
While both of the solutions above work, I tend to try to stay away from using apply in pandas because it is usually quite slow compared to array-based methods. To avoid it, after casting to a datetime column (via pd.to_datetime), we can modify the weekday-based method and simply cast the day of the week to a numpy timedelta64[D], by converting it directly:
df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
or by using to_timedelta:
df['week_start'] = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
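As a self-contained sketch of the to_timedelta variant (the sample data and the extra 'week_start_midnight' column are assumptions added for illustration): subtracting whole days keeps the original time of day, so `.dt.normalize()` can be applied if a midnight timestamp is wanted.

```python
import pandas as pd

# Hypothetical sample data with a time component.
df = pd.DataFrame({
    "myday": pd.to_datetime(["2020-02-12 15:45", "2020-02-16 08:00"])
})

# Weekday number (Monday=0 ... Sunday=6) converted to a whole-day timedelta.
offset = pd.to_timedelta(df["myday"].dt.weekday, unit="D")

# Vectorised subtraction: no apply, and the time of day is preserved.
df["week_start"] = df["myday"] - offset

# Optional: drop the time of day to get Monday at 00:00.
df["week_start_midnight"] = df["week_start"].dt.normalize()

print(df)
```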
Using test data with 60,000 datetimes, I timed the suggested answers on the newly released pandas 1.0.1.
I got the following timings:
%timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
>>> 1.33 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
>>> 5.59 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
>>> 3.44 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
>>> 3.47 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
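These results show that pandas 1.0.1 has dramatically improved the speed of the to_period apply-based method (compared with pandas <= 0.25), but that converting directly to a timedelta (either by casting the type with .astype('timedelta64[D]') or by using pd.to_timedelta) is still faster. Based on these results, I would suggest using pd.to_timedelta going forward.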
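For anyone who wants to rerun a comparison like this outside IPython, here is a rough sketch using the standard timeit module. The test data (60,000 consecutive days) and the function names are assumptions, since the original data is not shown; the row-wise variant here uses Series.apply rather than DataFrame.apply, and the period variant uses `.dt.start_time` rather than apply. Timings will differ by machine and pandas version.

```python
import timeit
from datetime import timedelta

import pandas as pd

# Hypothetical test data: 60,000 consecutive days (all at midnight).
df = pd.DataFrame({"myday": pd.date_range("2000-01-01", periods=60_000, freq="D")})


def with_apply(s):
    # Element-wise Python function call for every value.
    return s.apply(lambda d: d - timedelta(days=d.weekday()))


def with_period(s):
    # Weekly period (ending Sunday), then the period's first instant.
    return s.dt.to_period("W").dt.start_time


def with_to_timedelta(s):
    # Fully vectorised subtraction of the weekday number in days.
    return s - pd.to_timedelta(s.dt.weekday, unit="D")


candidates = {
    "apply": with_apply,
    "to_period": with_period,
    "to_timedelta": with_to_timedelta,
}

# Best of 3 single runs per method, in seconds.
timings = {}
for name, func in candidates.items():
    timings[name] = min(timeit.repeat(lambda: func(df["myday"]), number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```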
(Just adding to the answer above.)
Using .astype('timedelta64[D]') seems not very readable to me; I found an alternative that uses only pandas functionality:
df['myday'] - pd.to_timedelta(arg=df['myday'].dt.weekday, unit='D')
import pandas as pd
from datetime import timedelta
# Convert column to pandas datetime equivalent
df['myday'] = pd.to_datetime(df['myday'])
# Create function to calculate Start Week date
week_start_date = lambda date: date - timedelta(days=date.weekday())
# Apply above function on DataFrame column
df['week_start_date'] = df['myday'].apply(week_start_date)