Python Pandas - Finding the number of events between two dates (1 data frame and 1 list)

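I have a csv file with 2 datetimes in each row (pre-spot start and pre-spot end), and a list of datetimes (install_list).

I'm trying to iterate over the csv file and add a column that holds, for each row, the total number of dates in install_list that fall between that row's pre-spot start time and pre-spot end time.

I'm using the code below, but it returns the total number of items in the list for every row in the csv.

Example: File 1 = Start Time, End Time; List 1 = Install Time

Desired result per row = SUM(installs) where install time >= start time and install time <= end time

Col1 (Start Time): 1/1/21 12:00:00 PM

Col2 (End Time): 1/1/21 12:10:00 PM

Install time list = [1/1/21 12:05:00 PM, 1/1/21 12:11:00 PM]

Desired result for Row1/Col3 = 1

Code below: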

import datetime
import pandas as pd
from collections import Counter

df_post_logs = pd.read_csv('logs_merged.csv',index_col=False)
df_installs = pd.read_csv('install_merge.csv',index_col=False)

'''Convert UTC to EST on Installs Add Column'''

df_installs['conversion date'] = pd.to_datetime(df_installs['conversion date'],infer_datetime_format='%Y-%m-%d')
df_installs['conversion time'] = pd.to_datetime(df_installs['conversion time'],infer_datetime_format='%H:%S:%M')

utc_datetime = df_installs['conversion time']
est_datetime = utc_datetime - datetime.timedelta(hours=5)


df_installs['utc datetime'] = utc_datetime
df_installs['est datetime'] = est_datetime

'''Add Column 10 Minutes Pre-Spot Time to Post Logs/10 Minutes Post Time to Spot'''

df_post_logs['Air Date'] = pd.to_datetime(df_post_logs['Air Date'],infer_datetime_format='%Y-%m-%d')
df_post_logs['Air Time'] = pd.to_datetime(df_post_logs['Air Time'],infer_datetime_format='%H:%S:%M')

timestamp = df_post_logs['Air Time']

df_post_logs['timestamp'] = timestamp
df_post_logs['pre spot time start'] = timestamp - datetime.timedelta(minutes=10, seconds=1)
df_post_logs['pre spot time end'] = timestamp - datetime.timedelta(seconds=1)
df_post_logs['post spot time'] = timestamp + datetime.timedelta(minutes=10)

'''SUM of Installs between pre-spot time'''

install_list = pd.to_datetime(df_installs['est datetime']).to_list()

for pre_spot_start in df_post_logs['pre spot time start']:
    pre_spot_start_time = pre_spot_start

for pre_spot_end in df_post_logs['pre spot time end']:
    pre_spot_end_time = pre_spot_end

for pre_spot_end in df_post_logs['pre spot time end']:
    pre_spot_end_time = pre_spot_end

pre_spot_install = 0

for row in df_post_logs:
    for date in install_list:
        if date >= pre_spot_start_time and date <= pre_spot_end_time:
            pre_spot_install = pre_spot_install+1

df_post_logs['Pre Spot Install'] = pre_spot_install

df_post_logs.to_csv('Test.csv')

The following code prints, for each row, how many values in install_dates fall between the corresponding values in the DataFrame's start and end columns:

import pandas as pd

df = pd.DataFrame({
  "start": pd.to_datetime(["2018-07-11", "2018-06-10"]),
  "end": pd.to_datetime(["2018-07-20", "2018-06-30"]),
})

install_dates = pd.to_datetime(["2018-06-25", "2018-07-01", "2018-07-15", "2018-07-18"])

def num_install_dates_between_start_and_end(row):
    return len([d for d in install_dates if row["start"] <= d <= row["end"]])
    
print(df.agg(num_install_dates_between_start_and_end, axis="columns"))

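It uses agg to collapse each row of information into a single number. How the information is "collapsed" is specified in the function num_install_dates_between_start_and_end, which counts how many elements of install_dates lie between the row's start and end values (both bounds inclusive).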
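
Since the question asks for the counts as a new column rather than printed output, here is a minimal sketch of how that might look, using the example values from the question written in ISO format. The column names installs and installs_fast are illustrative and not from the original post. The second variant is an alternative technique, not the agg approach above: it sorts the install times once and uses np.searchsorted, which may be faster on large files, and it assumes every start is less than or equal to its end.

```python
import numpy as np
import pandas as pd

# Example data from the question: one time window and two install times.
df = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01 12:00:00"]),
    "end": pd.to_datetime(["2021-01-01 12:10:00"]),
})
install_dates = pd.to_datetime(["2021-01-01 12:05:00", "2021-01-01 12:11:00"])

# Option 1: count row by row and store the result in a new column.
# The comparison yields a boolean array; summing it counts the matches.
df["installs"] = df.apply(
    lambda row: int(((install_dates >= row["start"]) & (install_dates <= row["end"])).sum()),
    axis="columns",
)

# Option 2 (alternative technique): sort the install times once, then use
# binary search to find how many fall at or before "end" and before "start".
sorted_installs = np.sort(install_dates.values)
upper = np.searchsorted(sorted_installs, df["end"].values, side="right")   # installs <= end
lower = np.searchsorted(sorted_installs, df["start"].values, side="left")  # installs < start
df["installs_fast"] = upper - lower

print(df)
```

For the single row here, both columns hold 1: the 12:05 install falls inside the window and the 12:11 install does not, which matches the desired result stated in the question.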