Set an IF condition value based on a select query in Hive SQL
I want to set a new column based on an IF condition, where the value comes from a select query. For example:
SELECT request_id,
       charge_click_cnt,
       IF(uuid IN (SELECT deviceid
                   FROM t1
                   WHERE dt BETWEEN '20210908' AND '20210915'),
          'shop_user',
          'non_shop_user') AS shop_user
FROM t2
But it fails with the following error:
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class#failAnalysis:41 org.apache.spark.sql.catalyst.analysis.Analyzer#failAnalysis:91 org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis#apply:316
So I would like to know whether there is a better way to set a conditional value, like the code above does. Thanks in advance.
If Spark allows a CASE statement with an EXISTS clause, the following should work:
SELECT request_id,
charge_click_cnt,
CASE WHEN EXISTS(SELECT 1
FROM t1
              WHERE dt BETWEEN '20210908' AND '20210915'
AND uuid=deviceid) THEN 'shop_user'
ELSE 'non_shop_user'
END
FROM t2;
If you want to try the Spark DataFrame API, maybe you can try this:
import org.apache.spark.sql.functions._
val df1 = spark.sql("select deviceid as uuid, 1 as tag from t1 where dt between '20210908' and '20210915'")
val df2 = spark.sql("select request_id, charge_click_cnt, uuid from t2")
// Left-join on uuid, then flag rows that matched (tag is non-null) and drop the helper column
val resultDf = df2.join(df1, Seq("uuid"), "left")
  .withColumn("shop_user", when(col("tag") === 1, "shop_user").otherwise("non_shop_user"))
  .drop("tag")
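The analysis error usually means Spark rejected the IN subquery nested inside the IF expression. The same join idea can also be expressed in pure SQL (a sketch, using the same tables as above): replace the subquery with a left join and a null check. The DISTINCT guards against duplicate deviceid values fanning out the join:

```sql
SELECT t2.request_id,
       t2.charge_click_cnt,
       IF(d.deviceid IS NOT NULL, 'shop_user', 'non_shop_user') AS shop_user
FROM t2
LEFT JOIN (SELECT DISTINCT deviceid
           FROM t1
           WHERE dt BETWEEN '20210908' AND '20210915') d
  ON t2.uuid = d.deviceid;
```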