Set an IF condition value based on a SELECT query in Hive SQL

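I want to add a new column whose value is set by an IF condition, where the condition checks membership in the result of a SELECT query. For example: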

SELECT
    request_id,
    charge_click_cnt,
    IF(
        uuid IN (
            SELECT
                deviceid
            from
                t1
            where
                dt between '20210908'
                and '20210915'
        ),
        'shop_user',
        'non_shop_user'
    ) as shop_user
FROM
    t2

But it seems to fail with the following error.

org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class#failAnalysis:41 org.apache.spark.sql.catalyst.analysis.Analyzer#failAnalysis:91 org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis#apply:316

So I would like to know whether there is a better way to set a conditional value like the one in the code above.
Thanks in advance.

If Spark allows a CASE expression with an EXISTS clause, the following should work.

SELECT request_id,
       charge_click_cnt,
       CASE WHEN EXISTS (SELECT 1
                           FROM t1
                          WHERE t1.dt BETWEEN '20210908' AND '20210915'
                            AND t1.deviceid = t2.uuid) THEN 'shop_user'
            ELSE 'non_shop_user'
        END AS shop_user
  FROM t2;
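Some Spark versions accept IN/EXISTS subqueries only inside a WHERE filter, in which case the query above may be rejected in the same way as the original. A common workaround is to turn the membership test into a LEFT JOIN against the de-duplicated list of device ids. The following is only a sketch under that assumption: the table and column names come from the question, while the subquery alias `d` is made up here for illustration.

```sql
-- Sketch: replace the IN/EXISTS subquery with a LEFT JOIN.
-- `d` is a hypothetical alias; t1, t2 and their columns are taken from the question.
SELECT t2.request_id,
       t2.charge_click_cnt,
       -- A non-null d.deviceid means the uuid was found in t1 for the date range.
       CASE WHEN d.deviceid IS NOT NULL THEN 'shop_user'
            ELSE 'non_shop_user'
        END AS shop_user
  FROM t2
  LEFT JOIN (
        -- DISTINCT keeps one row per device so the join cannot duplicate rows of t2.
        SELECT DISTINCT deviceid
          FROM t1
         WHERE dt BETWEEN '20210908' AND '20210915'
       ) d
    ON t2.uuid = d.deviceid;
```

The DISTINCT matters: without it, a device that appears on several days in t1 would produce several output rows for each matching row of t2.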

If you would like to try the Spark DataFrame API, maybe you can try this:

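import org.apache.spark.sql.functions._

// Distinct device ids seen in the date range; `tag` marks a match after the join.
val df1 = spark.sql("select distinct deviceid as uuid, 1 as tag from t1 where dt between '20210908' and '20210915'")
val df2 = spark.sql("select request_id, charge_click_cnt, uuid from t2")

// The left join keeps every row of t2; rows without a match get a null tag.
val resultDf = df2
  .join(df1, Seq("uuid"), "left")
  .withColumn("shop_user", when(col("tag") === 1, "shop_user").otherwise("non_shop_user"))
  .drop("tag")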