Convert a query from spark.sql to Impala
I have the following query in pyspark:
from pyspark.sql.functions import col, lit, count as cnt

df = (
    spark.sql("""select id, track_id, data_source
                 from db.races
                 where dt_date = 20201010""")
    .groupBy("id", "track_id", "data_source")
    .agg(cnt('*').alias("num_races"))
    .withColumn('last_num_id', col('id').substr(-1, 1))
    .withColumn('last_num_track_id', col('track_id').substr(-1, 1))
    .withColumn("status_date", lit(previous_date))
)
I want to convert it to an Impala query.
My attempt so far:
select id, track_id, data_source
from db.races
group by id, track_id, data_source
...
I can follow everything up to the groupBy, but after that I can't work out exactly how these pyspark functions translate.
Not familiar with Impala, but here is my attempt at writing the SQL query:
select
t.*,
substr(t.id, -1, 1) as last_num_id,
substr(t.track_id, -1, 1) as last_num_track_id,
'(put the previous_date here)' as status_date
from (
select id, track_id, data_source, count(*) as num_races
from db.races
where dt_date = 20201010
group by id, track_id, data_source
) as t
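
Two caveats on the Impala side, since I haven't tested this against a real cluster. I'm not certain every Impala version accepts a negative start position in substr(); if yours doesn't, strright() (which returns the rightmost characters of a string) should give the same last-character result. And if you run the statement through impala-shell, you can pass previous_date in with --var instead of hard-coding it, referencing it as ${var:previous_date}. A sketch of both adjustments:

select
  t.*,
  strright(t.id, 1) as last_num_id,
  strright(t.track_id, 1) as last_num_track_id,
  '${var:previous_date}' as status_date
from (
  select id, track_id, data_source, count(*) as num_races
  from db.races
  where dt_date = 20201010
  group by id, track_id, data_source
) as t

-- e.g. impala-shell --var=previous_date=20201009 -f query.sql

If you submit the query some other way (JDBC, Hue, etc.), just inline the literal date as in the query above.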