Invalid group by expression error when using ANY_VALUE with MAX and a window function in Snowflake

I was given a query that I'm trying to modify so that it returns the latest version of each COMP_ID. The original query:

SELECT 
    ANY_VALUE(DATA_INDEX)::string AS DATA_INDEX, 
    COMP_ID::string AS COMP_ID, 
    ANY_VALUE(ACCOUNT_ID)::string AS ACCOUNT_ID, 
    ANY_VALUE(COMP_VERSION)::string AS COMP_VERSION, 
    ANY_VALUE(NAME)::string AS NAME, 
    ANY_VALUE(DESCRIPTION)::string AS DESCRIPTION,
    MAX(OBJECT_DICT:"startshape-type")[0]::string AS STARTSHAPE_TYPE,
    MAX(OBJECT_DICT:"startshape-connector-type")[0]::string AS STARTSHAPE_CONNECTOR_TYPE ,
    MAX(OBJECT_DICT:"startshape-action-type")[0]::string AS STATSHAPE_ACTION_TYPE,
    MAX(OBJECT_DICT:"overrides-enabled")[0]::string AS OVERRIDES_ENABLED
FROM COMP_DATA
GROUP BY COMP_ID
ORDER BY COMP_ID;

I then tried to use a window function to get only the highest version for each COMP_ID. This is the modified query:

SELECT 
    ANY_VALUE(DATA_INDEX)::string AS DATA_INDEX, 
    COMP_ID::string AS COMP_ID, 
    ANY_VALUE(ACCOUNT_ID)::string AS ACCOUNT_ID, 
    ANY_VALUE(COMP_VERSION)::string AS COMP_VERSION, 
    ANY_VALUE(NAME)::string AS NAME, 
    ANY_VALUE(DESCRIPTION)::string AS DESCRIPTION,
    MAX(OBJECT_DICT:"startshape-type")[0]::string AS STARTSHAPE_TYPE,
    MAX(OBJECT_DICT:"startshape-connector-type")[0]::string AS STARTSHAPE_CONNECTOR_TYPE ,
    MAX(OBJECT_DICT:"startshape-action-type")[0]::string AS STATSHAPE_ACTION_TYPE,
    MAX(OBJECT_DICT:"overrides-enabled")[0]::string AS OVERRIDES_ENABLED,
    ROW_NUMBER() OVER (PARTITION BY COMP_ID ORDER BY COMP_VERSION DESC) AS ROW_NUM
FROM COMP_DATA
QUALIFY 1 = ROW_NUM;

When I try to compile it, I get the following error:

SQL compilation error: [COMP_DATA.COMP_ID] is not a valid group by expression

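I originally thought the problem was the ANY_VALUE on COMP_VERSION, but I got the same error after removing it. The only way I have found to avoid the error is to remove the 4 MAX fields and every ANY_VALUE(), like this: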

SELECT 
    DATA_INDEX::string AS DATA_INDEX, 
    COMP_ID::string AS COMP_ID, 
    ACCOUNT_ID::string AS ACCOUNT_ID, 
    COMP_VERSION::string AS COMP_VERSION, 
    NAME::string AS NAME, 
    DESCRIPTION::string AS DESCRIPTION,
    ROW_NUMBER() OVER (PARTITION BY COMP_ID ORDER BY COMP_VERSION DESC) AS ROW_NUM
FROM COMP_DATA
QUALIFY 1 = ROW_NUM;

Of course, this is not enough, because I need the 4 MAX fields.

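I also tried creating a table with the MAX fields and then using the window function to select the highest COMP_VERSION for each COMP_ID from that new table, but it gave the same error.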

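When you added the QUALIFY clause, you removed the GROUP BY clause from the SQL. Aggregate functions such as MAX require that either every selected expression is an aggregate or the query has a GROUP BY clause covering the non-aggregated columns.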
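The rule can be seen in a minimal sketch that uses only the COMP_DATA table and two columns from the question. The first statement is expected to fail with the same kind of "not a valid group by expression" error, while the other two are the usual ways around it:

```sql
-- Expected to fail: MAX turns this into an aggregate query,
-- but comp_id is neither aggregated nor listed in a GROUP BY.
SELECT comp_id, MAX(comp_version)
FROM comp_data;

-- Works: comp_id is a grouping key, so one row is returned per comp_id.
SELECT comp_id, MAX(comp_version)
FROM comp_data
GROUP BY comp_id;

-- Works: a windowed MAX does not collapse rows; every input row is
-- returned, each carrying the maximum for its comp_id partition.
SELECT comp_id, MAX(comp_version) OVER (PARTITION BY comp_id)
FROM comp_data;
```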

So if you only want the best row per group, as you noted, the aggregate functions need to be explicitly windowed. Thus:

SELECT 
    data_index::string AS data_index, 
    comp_id::string AS comp_id, 
    account_id::string AS account_id, 
    comp_version::string AS comp_version, 
    name::string AS name, 
    description::string AS description,
    MAX(object_dict:"startshape-type") OVER (PARTITION BY comp_id)[0]::string AS startshape_type,
    MAX(object_dict:"startshape-connector-type") OVER (PARTITION BY comp_id)[0]::string AS startshape_connector_type,
    MAX(object_dict:"startshape-action-type") OVER (PARTITION BY comp_id)[0]::string AS statshape_action_type,
    MAX(object_dict:"overrides-enabled") OVER (PARTITION BY comp_id)[0]::string AS overrides_enabled
FROM COMP_DATA
QUALIFY 1 = ROW_NUMBER() OVER (PARTITION BY comp_id ORDER BY comp_version DESC);

You may well need to put a set of parentheses around each of those windowed MAX expressions, for example:

(MAX(object_dict:"overrides-enabled")OVER(PARTITION BY comp_id))[0]::string AS overrides_enabled,

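However, I suspect it will work out of the box, without the extra parentheses. I also assumed you don't want the ROW_NUMBER column in the output, so I pushed it into the QUALIFY clause (its value would always be 1).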
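If the windowed form proves awkward, another option is to keep the GROUP BY for the MAX fields and pick the latest row separately. The following is only a sketch of that idea, not something tested against the real table: the CTE names `agg` and `latest` are made up for illustration, and it assumes COMP_DATA has exactly the columns used in the question.

```sql
WITH agg AS (
    -- One row per comp_id holding the four MAX values across all versions.
    SELECT
        comp_id,
        MAX(object_dict:"startshape-type")           AS startshape_type,
        MAX(object_dict:"startshape-connector-type") AS startshape_connector_type,
        MAX(object_dict:"startshape-action-type")    AS startshape_action_type,
        MAX(object_dict:"overrides-enabled")         AS overrides_enabled
    FROM comp_data
    GROUP BY comp_id
),
latest AS (
    -- One row per comp_id: the row with the highest comp_version.
    SELECT data_index, comp_id, account_id, comp_version, name, description
    FROM comp_data
    QUALIFY ROW_NUMBER() OVER (PARTITION BY comp_id ORDER BY comp_version DESC) = 1
)
SELECT
    l.data_index::string                   AS data_index,
    l.comp_id::string                      AS comp_id,
    l.account_id::string                   AS account_id,
    l.comp_version::string                 AS comp_version,
    l.name::string                         AS name,
    l.description::string                  AS description,
    a.startshape_type[0]::string           AS startshape_type,
    a.startshape_connector_type[0]::string AS startshape_connector_type,
    a.startshape_action_type[0]::string    AS statshape_action_type,
    a.overrides_enabled[0]::string         AS overrides_enabled
FROM latest AS l
JOIN agg AS a
    ON a.comp_id = l.comp_id
ORDER BY l.comp_id;
```

This scans the table twice, so the single-pass windowed version above is probably preferable when it works. Note also that the inner join would drop rows whose comp_id is NULL.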