How to return values only within a specific date range?
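I have a program that scrapes an API and extracts the required values from its fields. There is a field called published_date in each row of the JSON object. I only want to publish values from the last 2 months, counted from the current date.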
try:
    price = str(price).replace(',', '')
    price = Decimal(price)
    if date < end:
        if not math.isnan(price):
            report_item = PriceItem(
                source=SOURCE,
                source_url=crawled_url,
                original_index_id=original_index_id,
                index_specification=index_specification,
                published_date=date,
                price=price.quantize(Decimal('1.00'))
            )
            yield report_item
except DecimalException as ex:
    self.logger.error(f"Non decimal price of {price} "
                      f"found in {original_index_id}", exc_info=ex)
Extraction of the published date:
for report_date in REPORT_DATE_TYPES:
    if report_date in result:
        date = result[report_date].split(' ')[0]
        date = datetime.strptime(date, '%m/%d/%Y')
MAX_REPORT_MONTHS = 3
current_date = datetime.now()
current_date_str = current_date.strftime('%m/%d/%Y')
start = datetime.strptime(current_date_str, '%m/%d/%Y')
last_date = current_date - relativedelta(months=MAX_REPORT_MONTHS)
last_date_str = last_date.strftime('%m/%d/%Y')
end = datetime.strptime(last_date_str, '%m/%d/%Y')
The strings I am referring to above are the last-date string and the current-date string.
Excerpt from the API:
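Once you have collected the data into a DataFrame, you can convert the column containing the dates to datetime and then keep only the rows you need with a comparison operator.
For example, suppose this is your data: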
from datetime import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

data = {
    'date': ['02/02/2022 10:23:23', '09/23/2021 10:23:23',
             '02/01/2021 10:23:23', '12/15/2021 10:23:23'],
    'random': [324, 231, 213, 123],
}
df = pd.DataFrame(data)
# convert date column to datetime
df['date'] = pd.to_datetime(df['date'], format="%m/%d/%Y %H:%M:%S")
# select "threshold" date, two months before the current one
current_date = datetime.now()
last_date = current_date - relativedelta(months=2)
# select data published after last_date
df[df['date'] > last_date]
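Taking today's date as the reference, we get the result below (judging by the rows that are kept, it was produced in the first half of February 2022).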
Before:
                  date  random
0  02/02/2022 10:23:23     324
1  09/23/2021 10:23:23     231
2  02/01/2021 10:23:23     213
3  12/15/2021 10:23:23     123
After:
                 date  random
0 2022-02-02 10:23:23     324
3 2021-12-15 10:23:23     123
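If the scraper yields items one at a time and never builds a DataFrame, the same cutoff comparison can be applied directly inside the loop. Note that the `if date < end:` check in the question appears to keep the items that are older than the cutoff, so the comparison probably needs to point the other way. The following is only a minimal sketch under stated assumptions: `months_ago` and `recent_items` are hypothetical helper names, the sample rows reuse the dates from the example above, and `months_ago` is a standard-library stand-in for `relativedelta(months=...)` so that the snippet runs without extra packages.

```python
import calendar
from datetime import datetime


def months_ago(reference: datetime, months: int) -> datetime:
    """Return `reference` moved back by `months` calendar months.

    The day is clamped to the last day of the target month,
    e.g. 31 March minus one month gives 28 (or 29) February.
    """
    # Count months from year 0 so year/month arithmetic is a single divmod.
    total = reference.year * 12 + (reference.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return reference.replace(year=year, month=month,
                             day=min(reference.day, last_day))


def recent_items(rows, now, months=2, date_key="published_date"):
    """Yield only the rows whose date falls within the last `months` months."""
    cutoff = months_ago(now, months)
    for row in rows:
        # Drop the time part, as the question's code does, then parse the date.
        published = datetime.strptime(row[date_key].split(" ")[0], "%m/%d/%Y")
        # Keep rows at or after the cutoff and not later than `now`.
        if cutoff <= published <= now:
            yield row


# Hypothetical sample rows, reusing the dates from the example above.
rows = [
    {"published_date": "02/02/2022 10:23:23", "price": "324"},
    {"published_date": "09/23/2021 10:23:23", "price": "231"},
    {"published_date": "02/01/2021 10:23:23", "price": "213"},
    {"published_date": "12/15/2021 10:23:23", "price": "123"},
]

# A fixed reference date keeps the example reproducible;
# in the scraper this would be datetime.now().
now = datetime(2022, 2, 10)
kept = list(recent_items(rows, now))
print([row["price"] for row in kept])  # ['324', '123']
```

Comparing `datetime` objects directly also makes the round trip through `strftime` and `strptime` in the question unnecessary, unless the goal is to truncate the time of day.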