Not all dates are captured when filtering by date (Python pandas)

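I'm filtering a DataFrame by date to produce two separate versions:

  1. Only today's data
  2. Data from the past two years

However, when I try to filter by date, it seems to miss dates that fall within the past two years.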

from datetime import datetime as dt

import pandas as pd
from dateutil.relativedelta import relativedelta

date_format = '%m-%d-%Y'  # desired date format

today = dt.now().strftime(date_format)  # today's date as a string
today = dt.strptime(today, date_format).date()  # converting 'today' into a date object

two_years = today - relativedelta(years=2)  # today's date minus two years
two_years = two_years.strftime(date_format)
today = today.strftime(date_format)

# normalizing the format of the date column to the desired format
df_data['date'] = pd.to_datetime(df_data['date'], errors='coerce').dt.strftime(date_format)

df_today = df_data[df_data['date'] == today]
df_two_year = df_data[df_data['date'] >= two_years]

This results in:

all dates ['07-17-2020' '07-15-2020' '08-01-2019' '03-25-2015']
today df ['07-17-2020']
two year df ['07-17-2020' '08-01-2019']

The two-year result is missing 07-15-2020, even though 08-01-2019 is captured.

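Your data type conversion is the problem here. Once the dates are formatted as '%m-%d-%Y' strings, >= compares them character by character, so '07-15-2020' sorts before '07-17-2018' and is dropped. Keep the values as datetimes while you do the arithmetic. You can do this: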

today = dt.now()  # today's date. Will always result in today's date
two_years = today - relativedelta(years=2)  # date is today's date minus two years. 

Printing two_years at this point gives something like "2018-07-17 18:40:42.704395". You can then convert it to a date-only value:

two_years = two_years.strftime(date_format)
two_years = dt.strptime(two_years, date_format).date()
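
To show how this plays out against the DataFrame from the question, here is a minimal sketch. It assumes the date column is parsed once with pd.to_datetime and then left as datetime64 (no .dt.strftime), and it pins 'today' to 2020-07-17 so the result is reproducible; in real use that would be dt.now(). The cutoff is kept as a datetime rather than a date so it compares cleanly with the datetime64 column. The string_check variable is only there to illustrate why the string version misfired.

```python
from datetime import datetime as dt

import pandas as pd
from dateutil.relativedelta import relativedelta

# Sample dates taken from the question's output.
df_data = pd.DataFrame({'date': ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015']})

# Assumption: 'today' is pinned to the question's run date; use dt.now() in practice.
today = dt(2020, 7, 17)
two_years = today - relativedelta(years=2)  # datetime(2018, 7, 17)

# Strings compare character by character, so '07-15...' sorts before '07-17...'.
string_check = '07-15-2020' >= '07-17-2018'  # False

# Parse once and keep the column as datetime64 for all comparisons.
df_data['date'] = pd.to_datetime(df_data['date'], format='%m-%d-%Y', errors='coerce')

# normalize() drops the time of day so that only the calendar date is compared.
df_today = df_data[df_data['date'].dt.normalize() == pd.Timestamp(today).normalize()]
df_two_year = df_data[df_data['date'] >= two_years]

# Format as strings only at the very end, for display.
print(df_two_year['date'].dt.strftime('%m-%d-%Y').tolist())
# ['07-17-2020', '07-15-2020', '08-01-2019']
```

With the comparison done on datetimes, 07-15-2020 is now included and 03-25-2015 is still excluded.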

You don't need to convert anything to a string; just work with the datetime dtype. For example:

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['07-17-2020','07-15-2020','08-01-2019','03-25-2015'])})

today = pd.Timestamp('now')  # the outputs shown below are from a run on 2020-07-17

print(df[df['date'].dt.date == today.date()])
#         date
# 0 2020-07-17

print(df[(df['date'].dt.year >= today.year-1) & (df['date'].dt.date != today.date())])
#         date
# 1 2020-07-15
# 2 2019-08-01

What you get back from the comparison operations (adjust them as needed...) are boolean masks, and you can use them directly to filter the df.
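
As one possible adjustment, the masks can be named, inspected, and combined with & and ~ before indexing. The sketch below is not the filter used above: it swaps the calendar-year test for a rolling two-year window built with pd.DateOffset, and it pins 'today' to 2020-07-17 for reproducibility. The names cutoff, is_today, and in_window are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(
        ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015'],
        format='%m-%d-%Y',
    )
})

# Assumption: 'today' is pinned for reproducibility; use pd.Timestamp('now').normalize() in practice.
today = pd.Timestamp('2020-07-17')
cutoff = today - pd.DateOffset(years=2)  # Timestamp('2018-07-17')

# Each comparison yields a boolean Series aligned with df's index.
is_today = df['date'].dt.normalize() == today
in_window = df['date'].between(cutoff, today)  # both ends are included

print(is_today.tolist())   # [True, False, False, False]
print(in_window.tolist())  # [True, True, True, False]

# Masks combine with & (and), | (or), and ~ (not) before being used to index df.
df_today = df[is_today]
df_two_year = df[in_window & ~is_today]

print(df_two_year['date'].dt.strftime('%m-%d-%Y').tolist())
# ['07-15-2020', '08-01-2019']
```

Keeping the masks in variables makes it easy to check which rows each condition selects before filtering.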