Only SubQuery expressions that are top level conjuncts are allowed

I wanted to change an existing query and got the following error:

Unsupported SubQuery Expression 'deleted': Only SubQuery expressions that are top level conjuncts are allowed

The existing query was:

SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
-- do not take into account data for deleted users
WHERE 
    EXISTS (
        SELECT 1 
        FROM geoposition_import_users AS u 
        WHERE u.id = geo.userId 
            AND NOT u.deleted 
    );

After our change, userId in geoposition_import can be NULL, because geopositions can now also be created by machines. So I changed the query to:

SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
-- do not take into account data for deleted users
WHERE 
    geo.userId IS NULL -- data from non users (e.g. machines) is still fine
    OR
    EXISTS (
        SELECT 1 
        FROM geoposition_import_users AS u 
        WHERE u.id = geo.userId 
            AND NOT u.deleted 
    );

This produces the error above.

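I googled it and found the documented limitations: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-hiveql/content/hive_hive_subquery_limitations.html

So my guess is that the OR is the problem: it means the EXISTS subquery is no longer a top-level conjunct of the WHERE clause.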

Now my questions:

  1. Why does the error message say that 'deleted' is the problem?
  2. How can I rewrite the query so that it works?

The only solution that comes to mind is to split the condition into separate views and then combine them with UNION ALL.

Like this:

CREATE VIEW IF NOT EXISTS geoposition_import_from_non_users AS
SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
WHERE 
    geo.userId IS NULL;

CREATE VIEW IF NOT EXISTS geoposition_import_from_users AS
SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
-- do not take into account data for deleted users
WHERE 
    EXISTS (
        SELECT 1 
        FROM geoposition_import_users AS u 
        WHERE u.id = geo.userId 
            AND NOT u.deleted 
    );

-- staged data with possible duplicates removed
CREATE VIEW IF NOT EXISTS geoposition_import_distinct AS
SELECT * FROM geoposition_import_from_non_users
UNION ALL
SELECT * FROM geoposition_import_from_users;

Any comments?

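Try using a LEFT JOIN instead of EXISTS. The filter then becomes a plain predicate on the joined columns, so the OR no longer wraps a subquery: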

SELECT DISTINCT
    geo.*
FROM
    geoposition_import AS geo
    LEFT JOIN geoposition_import_users AS u
        ON u.id = geo.userId AND NOT u.deleted
WHERE
    geo.userId IS NULL -- data from non users (e.g. machines) is still fine
    OR u.id IS NOT NULL;
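
To see which rows this keeps, here is a small self-contained sketch. The sample rows and the payload column are made up for illustration, and the two CTEs only stand in for the real tables of the same names. The comments describe what the join logic should return; I have not run this on HDP, so treat it as a sanity check to adapt rather than a verified result.

```sql
-- Hypothetical sample rows; the payload column is invented for illustration.
-- The two CTEs only stand in for the real tables of the same names.
WITH geoposition_import_users AS (
    SELECT 1 AS id, false AS deleted            -- active user
    UNION ALL SELECT 2, true                    -- deleted user
),
geoposition_import AS (
    SELECT 1 AS userId, 'a' AS payload          -- active user: kept
    UNION ALL SELECT 2, 'b'                     -- deleted user: filtered out
    UNION ALL SELECT 3, 'c'                     -- id missing from the users table: filtered out
    UNION ALL SELECT CAST(NULL AS INT), 'd'     -- machine-created row: kept
)
SELECT DISTINCT
    geo.*
FROM geoposition_import geo
     LEFT JOIN geoposition_import_users u ON u.id = geo.userId AND NOT u.deleted
WHERE
    geo.userId IS NULL      -- no user at all
    OR u.id IS NOT NULL;    -- a matching, non-deleted user was joined
```

The row with userId 3 is dropped, just as the original EXISTS would drop it, because no user row matches and u.id stays NULL. If u.id is not unique, the join can produce repeated geo rows, but since only geo.* is selected the DISTINCT collapses them again.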