BigQuery - select last non-null value in a sequence

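I am trying to recover the last non-null value in a user journey.

For context, I am looking at every action a user takes during their journey on my website, and the last non-null value would give me the name of the last product list the user saw before adding a product to the cart.

In my example, the last non-null value in the 'ProductList' column would be 'Face Masks', which appears before the row where the 'EventAction' column is 'Add to Cart'.

Here are the results of my subquery, before applying first_value: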

| fullVisitorId | visitStartTime | visitId | visitNumber | hitNumber | ProductList | ProductSKU | EventCategory | EventAction |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1401 | Ecommerce | Product Impression |
| 12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1501 | Ecommerce | Product Impression |
| 12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1601 | Ecommerce | Product Impression |
| 12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1701 | Ecommerce | Product Impression |
| 12345XX | 1608202 | 56789 | 50 | 162 | Face Masks | ABC1801 | Ecommerce | Product Click |
| 12345XX | 1608202 | 56789 | 50 | 163 | NULL | ABC1801 | Ecommerce | Product View |
| 12345XX | 1608202 | 56789 | 50 | 164 | NULL | ABC1801 | Ecommerce | Add to Cart |

However, after applying the 'first_value' function in the code below, I still get null values. Where am I going wrong? Is there a simpler way?

    SELECT
      FIRST_VALUE(ProductList IGNORE NULLS)
        OVER (PARTITION BY EventAction ORDER BY VisitNumber ASC, hitNumber ASC) AS NewProductList
    FROM (
      SELECT
        fullVisitorId,
        visitStartTime,
        visitId,
        visitNumber,
        hits.hitNumber AS hitNumber,
        CASE WHEN product.productListName = '(not set)' THEN NULL ELSE product.productListName END AS ProductList,
        product.productSKU AS ProductSKU,
        hits.eventInfo.eventCategory AS EventCategory,
        hits.eventInfo.eventAction AS EventAction,
      FROM
        `tablename.ga_sessions_20200914`,
        UNNEST(hits) AS hits,
        UNNEST(hits.product) product
      WHERE
        geoNetwork.country = 'United Kingdom'
        AND fullVisitorId = '1000104589833493743'
      ORDER BY
        1,
        4 ASC,
        5 ASC
      LIMIT 10000
    )
    WHERE EventAction = 'Add to Cart'

If you want the last value, you need either a descending sort or last_value(). In fact, for what you want, first_value() is the simpler solution:

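    FIRST_VALUE(ProductList IGNORE NULLS) OVER
        (PARTITION BY VisitNumber ORDER BY hitNumber DESC
         -- the default frame stops at the current row, so extend it to the earlier hits
         ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS NewProductList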

I removed EventAction from the expression. It does not seem relevant to what you are asking for.
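One detail worth checking with a descending sort is the window frame. When ORDER BY is present and no frame is given, BigQuery uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which under a descending order covers the current hit and the later ones, not the earlier ones. The sketch below compares the default frame with an explicit one. The inline rows are hypothetical, modeled on the sample session in the question, and the column names `default_frame` and `explicit_frame` are made up for illustration.

```sql
#standardSQL
-- Hypothetical inline rows modeled on the sample session in the question.
WITH sample_hits AS (
  SELECT * FROM UNNEST([
    STRUCT(50 AS visitNumber, 161 AS hitNumber,
           'Face Masks' AS ProductList, 'Product Impression' AS EventAction),
    STRUCT(50, 162, 'Face Masks', 'Product Click'),
    STRUCT(50, 163, CAST(NULL AS STRING), 'Product View'),
    STRUCT(50, 164, CAST(NULL AS STRING), 'Add to Cart')
  ])
)
SELECT
  hitNumber,
  EventAction,
  -- Default frame: from the highest hitNumber down to the current row,
  -- so the 'Add to Cart' row only sees itself.
  FIRST_VALUE(ProductList IGNORE NULLS) OVER (
    PARTITION BY visitNumber ORDER BY hitNumber DESC
  ) AS default_frame,
  -- Explicit frame: from the current row down to the lowest hitNumber,
  -- so the 'Add to Cart' row sees every earlier hit.
  FIRST_VALUE(ProductList IGNORE NULLS) OVER (
    PARTITION BY visitNumber ORDER BY hitNumber DESC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS explicit_frame
FROM sample_hits
ORDER BY hitNumber;
```

On this data, `default_frame` should be NULL for hits 163 and 164, while `explicit_frame` should be 'Face Masks' on every row, including the 'Add to Cart' row.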

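I see a few problems with your query:

  1. You are partitioning by EventAction, which is incorrect. You should partition by a column that actually represents the group within which you want to find the last non-null value, for example fullVisitorId.
  2. Your WHERE EventAction = 'Add to Cart' clause is applied before LAST_VALUE is evaluated, so the window only ever sees the 'Add to Cart' rows, where ProductList is NULL. Hence the NULL result.

So, with the above in mind (use the correct partition and move the WHERE clause outside), the query below should work (BigQuery Standard SQL):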

    #standardSQL
    SELECT NewProductList
    FROM (
      SELECT *,
        LAST_VALUE(ProductList IGNORE NULLS)
          OVER (PARTITION BY fullVisitorId ORDER BY VisitNumber, hitNumber) AS NewProductList
      FROM (
        SELECT
          fullVisitorId,
          visitStartTime,
          visitId,
          visitNumber,
          hits.hitNumber AS hitNumber,
          CASE WHEN product.productListName = '(not set)' THEN NULL ELSE product.productListName END AS ProductList,
          product.productSKU AS ProductSKU,
          hits.eventInfo.eventCategory AS EventCategory,
          hits.eventInfo.eventAction AS EventAction,
        FROM
          `tablename.ga_sessions_20200914`,
          UNNEST(hits) AS hits,
          UNNEST(hits.product) product
        WHERE
          geoNetwork.country = 'United Kingdom'
          AND fullVisitorId = '1000104589833493743'
        ORDER BY
          1,
          4 ASC,
          5 ASC
        LIMIT 10000
      )
    )
    WHERE EventAction = 'Add to Cart'
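
To see the effect of where the filter sits without touching the real export table, the same pattern can be tried on a handful of inline rows. This is only a sketch: the rows are hypothetical and modeled on the sample in the question, and the CTE names `sample_hits`, `filter_before_window` and `window_before_filter` are made up for illustration.

```sql
#standardSQL
-- Hypothetical inline rows modeled on the sample session in the question.
WITH sample_hits AS (
  SELECT * FROM UNNEST([
    STRUCT('12345XX' AS fullVisitorId, 50 AS visitNumber, 161 AS hitNumber,
           'Face Masks' AS ProductList, 'Product Impression' AS EventAction),
    STRUCT('12345XX', 50, 162, 'Face Masks', 'Product Click'),
    STRUCT('12345XX', 50, 163, CAST(NULL AS STRING), 'Product View'),
    STRUCT('12345XX', 50, 164, CAST(NULL AS STRING), 'Add to Cart')
  ])
),
-- WHERE runs before the window function: only the 'Add to Cart' row is left,
-- so there is no non-null ProductList for LAST_VALUE to pick up.
filter_before_window AS (
  SELECT
    LAST_VALUE(ProductList IGNORE NULLS)
      OVER (PARTITION BY fullVisitorId ORDER BY visitNumber, hitNumber) AS NewProductList
  FROM sample_hits
  WHERE EventAction = 'Add to Cart'
),
-- The window function runs over all hits first; the filter is applied afterwards.
window_before_filter AS (
  SELECT NewProductList
  FROM (
    SELECT *,
      LAST_VALUE(ProductList IGNORE NULLS)
        OVER (PARTITION BY fullVisitorId ORDER BY visitNumber, hitNumber) AS NewProductList
    FROM sample_hits
  )
  WHERE EventAction = 'Add to Cart'
)
SELECT 'filter before window' AS approach, NewProductList FROM filter_before_window
UNION ALL
SELECT 'window before filter' AS approach, NewProductList FROM window_before_filter;
```

On this data the first approach should return NULL and the second 'Face Masks', which matches the two points listed in this answer.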