Not all dates are captured when filtering by date in Python pandas
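I'm filtering a dataframe by date to produce two separate versions:
- data from today only
- data from the last two years
However, when I filter by date, some dates within the last two years appear to be missing.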
date_format = '%m-%d-%Y' # desired date format
today = dt.now().strftime(date_format) # today's date. Will always result in today's date
today = dt.strptime(today, date_format).date() # converting 'today' into a datetime object
today = today.strftime(date_format)
two_years = today - relativedelta(years=2) # date is today's date minus two years.
two_years = two_years.strftime(date_format)
# normalizing the format of the date column to the desired format
df_data['date'] = pd.to_datetime(df_data['date'], errors='coerce').dt.strftime(date_format)
df_today = df_data[df_data['date'] == today]
df_two_year = df_data[df_data['date'] >= two_years]
This results in:
all dates ['07-17-2020' '07-15-2020' '08-01-2019' '03-25-2015']
today df ['07-17-2020']
two year df ['07-17-2020' '08-01-2019']
Although 08-01-2019 is captured, 07-15-2020 is missing from the two-year result.
Your data type conversion is the problem here. You could do this instead:
today = dt.now() # today's date. Will always result in today's date
two_years = today - relativedelta(years=2) # date is today's date minus two years.
Printing two_years at this point gives "2018-07-17 18:40:42.704395". You can then convert it to a date-only value:
two_years = two_years.strftime(date_format)
two_years = dt.strptime(two_years, date_format).date()
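To see why the string version drops 07-15-2020, here is a minimal sketch. It assumes a fixed cutoff of 07-17-2018 (inferred from the output in the question rather than read from the clock), and the names `cutoff` and `sample` are only illustrative. Strings in '%m-%d-%Y' format are compared character by character, so the day is compared before the year.

```python
from datetime import date

date_format = '%m-%d-%Y'
cutoff = date(2018, 7, 17)   # assumed stand-in for "today minus two years"
sample = date(2020, 7, 15)   # the row that went missing

# String comparison is lexicographic: '07-15-2020' vs '07-17-2018'
# differs first at the day digits ('5' < '7'), so the year is never reached.
as_strings = sample.strftime(date_format) >= cutoff.strftime(date_format)

# date objects compare chronologically.
as_dates = sample >= cutoff

print(as_strings)  # False
print(as_dates)    # True
```

The same rule explains why 08-01-2019 passes the string filter: its month '08' sorts after '07', regardless of the year.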
You don't need to convert anything to strings; just work with the datetime dtype. For example:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['07-17-2020','07-15-2020','08-01-2019','03-25-2015'])})
today = pd.Timestamp('now')
print(df[df['date'].dt.date == today.date()])
# date
# 0 2020-07-17
print(df[(df['date'].dt.year >= today.year-1) & (df['date'].dt.date != today.date())])
# date
# 1 2020-07-15
# 2 2019-08-01
What you get from the comparison operations (adjust them as needed...) are boolean masks, which you can use directly to filter the df.
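If you want an exact two-year window rather than a comparison on the year alone, one possible adjustment is sketched below. It assumes a fixed `today` of 2020-07-17 so the result is reproducible (in practice you might use `pd.Timestamp('now').normalize()`), and the names `two_years_ago`, `df_today` and `df_two_year` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(
    ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015'],
    format='%m-%d-%Y')})

# Fixed reference date (an assumption for reproducibility).
today = pd.Timestamp('2020-07-17')
two_years_ago = today - pd.DateOffset(years=2)

# Both masks compare datetime values, never strings.
df_today = df[df['date'] == today]
df_two_year = df[(df['date'] >= two_years_ago) & (df['date'] <= today)]

# Format as strings only at the very end, for display.
print(df_two_year['date'].dt.strftime('%m-%d-%Y').tolist())
# ['07-17-2020', '07-15-2020', '08-01-2019']
```

Keeping the column as datetime until the final display step means the desired '%m-%d-%Y' format never takes part in a comparison.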