How to use a Python variable in an Impala query to find the previous day using epoch time?

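My goal is to query only yesterday's data from Impala using a Unix timestamp field. I don't want to hard-code the dates, because I want this script to run daily and query only the previous day. I am using Python and have created strings for the start and end times.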

endtime is stored as a bigint, like this: 1561996779000
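A 13-digit value like the one above looks like epoch milliseconds rather than seconds. That is an inference from the sample, not something stated here, so it is worth checking. A minimal sketch for doing so, where sample_endtime is a name introduced only for illustration:

```python
import datetime as dt

# Sample value copied from the post; the variable name is illustrative only.
sample_endtime = 1561996779000

# Assumption: the value is milliseconds since the Unix epoch (it has 13 digits),
# so divide by 1000 before converting it to a datetime.
as_utc = dt.datetime.fromtimestamp(sample_endtime / 1000, tz=dt.timezone.utc)
print(as_utc.isoformat())  # 2019-07-01T15:59:39+00:00
```

If the result is a plausible date for the data, the column is most likely in milliseconds, and any bounds compared against it need to be in milliseconds too.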

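import datetime as dt

# Yesterday's date as 'YYYY-MM-DD' ("%F" is a platform-dependent shorthand for "%Y-%m-%d")
yesterday = (dt.date.today() - dt.timedelta(days=1)).strftime("%Y-%m-%d")
yesterday_start = yesterday + ' 00:00:00'
yesterday_end = yesterday + ' 23:59:59'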

yesterday_start
'2019-07-28 00:00:00'
yesterday_end
'2019-07-28 23:59:59'

I tried the following, but none of them seem to work:

cursor.execute('select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between unix_timestamp("+yesterday_start+") and unix_timestamp("+yesterday_end+")')
cursor.execute("select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between unix_timestamp("+yesterday_start+") and unix_timestamp("+yesterday_end+")")
cursor.execute("select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between unix_timestamp('yesterday_start') and unix_timestamp('yesterday_end')")
cursor.execute("SELECT * from proxy where endtime between unix_timestamp('"+yesterday_start+"') and unix_timestamp('"+yesterday_end+"')")
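Printing the SQL text that each attempt actually produces can help show where it goes wrong. The sketch below rebuilds only the WHERE clauses from the four attempts, using the sample strings shown earlier. The attempt_1 to attempt_4 names are introduced here for illustration, and the remark on the fourth attempt assumes that endtime holds milliseconds.

```python
yesterday_start = '2019-07-28 00:00:00'
yesterday_end = '2019-07-28 23:59:59'

# Attempt 1: the statement is one single-quoted Python literal, so the text
# "+yesterday_start+" is sent to Impala as-is and is never concatenated.
attempt_1 = 'where endtime between unix_timestamp("+yesterday_start+") and unix_timestamp("+yesterday_end+")'

# Attempt 2: concatenation happens, but the timestamp is not quoted in the SQL,
# so Impala sees a bare 2019-07-28 00:00:00 instead of a string.
attempt_2 = "where endtime between unix_timestamp("+yesterday_start+") and unix_timestamp("+yesterday_end+")"

# Attempt 3: the Python variable names are sent as string literals.
attempt_3 = "where endtime between unix_timestamp('yesterday_start') and unix_timestamp('yesterday_end')"

# Attempt 4: well-formed SQL. If endtime is in milliseconds (an assumption based
# on the 13-digit sample), the bounds in seconds are far too small to match rows.
attempt_4 = "where endtime between unix_timestamp('"+yesterday_start+"') and unix_timestamp('"+yesterday_end+"')"

for label, sql in [("1", attempt_1), ("2", attempt_2), ("3", attempt_3), ("4", attempt_4)]:
    print("attempt " + label + ": " + sql)
```

Only the fourth form passes properly quoted timestamp strings to unix_timestamp, which suggests that the remaining problem lies in the units rather than in the quoting.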

Here is the example from the Impala documentation:

select unix_timestamp('2015-05-15 12:00:00');
+---------------------------------------+
| unix_timestamp('2015-05-15 12:00:00') |
+---------------------------------------+
| 1431691200                            |
+---------------------------------------+
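The value in the documentation example has 10 digits, which means unix_timestamp returns seconds. As a rough cross-check, the same number can be reproduced in Python by reading the string as UTC. The names doc_string and epoch_seconds are introduced here for illustration.

```python
import calendar
import time

# Timestamp string copied from the documentation example above.
doc_string = '2015-05-15 12:00:00'

# Parse the string and convert it to epoch seconds, treating it as UTC.
parsed = time.strptime(doc_string, '%Y-%m-%d %H:%M:%S')
epoch_seconds = calendar.timegm(parsed)

print(epoch_seconds)         # 1431691200, the value in the documentation output
print(epoch_seconds * 1000)  # 1431691200000, the same instant in milliseconds
```

A column that stores milliseconds would need the second form, for example by multiplying the result of unix_timestamp by 1000 in the query.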

I'm still looking for a better way to accomplish this, but the following works.

import datetime as dt
import os
import time
import timeit

from impala.dbapi import connect
from impala.util import as_pandas

# Date pattern
date_pattern = '%Y-%m-%d %H:%M:%S'
# Yesterday's system date ("%F" is a platform-dependent shorthand for "%Y-%m-%d")
yesterday = (dt.date.today() - dt.timedelta(days=1)).strftime('%Y-%m-%d')
# Start datetime, converted to epoch seconds (time.mktime uses the local time zone)
yesterday_start = yesterday + ' 00:00:00'
yesterday_start_epoch = int(time.mktime(time.strptime(yesterday_start, date_pattern)))
yesterday_start_epoch_str = str(yesterday_start_epoch)
# End datetime, converted the same way
yesterday_end = yesterday + ' 23:59:59'
yesterday_end_epoch = int(time.mktime(time.strptime(yesterday_end, date_pattern)))
yesterday_end_epoch_str = str(yesterday_end_epoch)

# Start timer
start_time = timeit.default_timer()
# Connection and query
IMPALA_HOST = os.getenv('HOST', 'server')
# Assumption: 21050 is the usual HiveServer2 port for Impala; adjust for your cluster
IMPALA_PORT = 21050
# auth_mechanism is left blank as in the original post; set it for your cluster
conn = connect(host=IMPALA_HOST, port=IMPALA_PORT, auth_mechanism='', use_ssl=True)
cursor = conn.cursor()
cursor.execute('SHOW TABLES')
tables = as_pandas(cursor)
cursor.execute("select sourceaddress, sourcehostname, sourceusername, endtime from proxy where endtime between cast('"+yesterday_start_epoch_str+"' AS INT) and cast('"+yesterday_end_epoch_str+"' AS INT)")
df = as_pandas(cursor)
# End timer
end_time = timeit.default_timer()
# Print the time it took
print("Elapsed time: {}".format(end_time - start_time))
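One possible tidier variant is sketched below. It rests on several assumptions that are not confirmed above: that endtime holds epoch milliseconds, that day boundaries should be taken in UTC, and that the driver's cursor accepts pyformat-style named parameters (impyla is understood to, but check your version). The names yesterday_bounds_ms, QUERY, and fetch_yesterday are hypothetical helpers introduced for illustration.

```python
import datetime as dt


def yesterday_bounds_ms(today):
    """Return (start_ms, end_ms) for the day before `today` in UTC epoch milliseconds.

    The range is half-open: start_ms is yesterday 00:00:00 and end_ms is
    today 00:00:00, so no 23:59:59 end value is needed.
    """
    midnight_today = dt.datetime(today.year, today.month, today.day, tzinfo=dt.timezone.utc)
    midnight_yesterday = midnight_today - dt.timedelta(days=1)
    start_ms = int(midnight_yesterday.timestamp()) * 1000
    end_ms = int(midnight_today.timestamp()) * 1000
    return start_ms, end_ms


# Table and column names are taken from the post; the parameter names are hypothetical.
QUERY = (
    "select sourceaddress, sourcehostname, sourceusername, endtime "
    "from proxy "
    "where endtime >= %(start_ms)s and endtime < %(end_ms)s"
)


def fetch_yesterday(cursor, today=None):
    """Execute QUERY on a DB-API cursor, passing the bounds as named parameters."""
    # Note: dt.date.today() is the local date, while the bounds are computed in UTC.
    start_ms, end_ms = yesterday_bounds_ms(today or dt.date.today())
    cursor.execute(QUERY, {'start_ms': start_ms, 'end_ms': end_ms})
    return cursor


print(yesterday_bounds_ms(dt.date(2019, 7, 29)))  # (1564272000000, 1564358400000)
```

With an open connection, df = as_pandas(fetch_yesterday(conn.cursor())) would take the place of the string-concatenated execute call. Passing the bounds as parameters avoids the quoting problems entirely, and the half-open range also covers rows that fall in the last second of the day.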