Fastest way to add a column at runtime that is the same for all rows in AWS Athena / SQL

Question

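I'm creating a table with AWS Athena. When creating the table, I want to add the creation date (e.g. 2019-09-05) to it as a column. What is the fastest way to do this?

Here are some possible approaches (note: current_date is a Presto function; more details here: https://prestodb.github.io/docs/current/functions/datetime.html):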

1. select [
     ...,
     current_date
   ]
   from a;

2. with variables as (select current_date as date_created)
   select [
     ...,
     variables.date_created
   ]
   from a, variables;

3. Using python to replace the expression
   select [
     ...,
     <REPLACE_ME>
   ]
   from a;

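   # In Python
   from datetime import date
   # str.replace needs a string, so render today's date as a SQL DATE literal
   s = query.replace("<REPLACE_ME>", f"DATE '{date.today().isoformat()}'")
   # run query s from Python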

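My guess is that method 3 is the fastest, but can this be done with SQL alone? Method 2 creates a Cartesian product, which could be a problem if we want to add multiple columns, and method 1 executes the function for every row.

So, what is the fastest and best way? Since I'm using Athena, which is based on Presto, I can't use variables AFAIK. Thanks.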

Answer 1

The first approach is the best:

SELECT <all other columns>, current_date
FROM ...

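current_date is evaluated only once, not once per row. Its value is inlined during query planning. The same happens for any other deterministic scalar expression.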
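Since the question mentions driving the query from Python and adding more than one constant column, here is a minimal sketch of how the first approach might be assembled there. The helper name `build_ctas`, the target table `a_with_date`, and the column names are hypothetical and not from the original post. The sketch only builds the SQL text. Submitting it to Athena is left out, and the identifiers are interpolated unescaped, so it assumes trusted table and column names.

```python
def build_ctas(target: str, source: str, constants: dict) -> str:
    """Build a CREATE TABLE AS SELECT that copies `source` and appends constant columns.

    `constants` maps a new column name to a SQL expression, for example
    {"date_created": "current_date"}. All names are hypothetical.
    """
    # Each constant is a plain scalar expression in the SELECT list (approach 1),
    # so no extra join or CTE is needed, however many columns are added.
    extra = ", ".join(f"{expr} AS {name}" for name, expr in constants.items())
    return f"CREATE TABLE {target} AS SELECT src.*, {extra} FROM {source} AS src"


query = build_ctas("a_with_date", "a", {"date_created": "current_date"})
print(query)
```

Because the date is computed by the engine, no string substitution on the client is needed, which is what approach 3 was trying to achieve.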
