BQ SQL solution for comparing rows based on variance

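I'm trying to compare scraped retail item price data in BigQuery (about 2-3B rows, depending on the time period and the retailers included). The goal is to identify meaningful price differences. For example, $1.99 vs. $2.00 isn't meaningful, but $1.99 vs. $2.50 is. "Meaningful" is quantified as a 2% difference between prices.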

An example data set for one item looks like this:

ITEM       Price($)  Meaningful (this is the column I'm trying to flag)
Apple      $1.99     Y (lowest price would always be flagged)
Apple      $2.00     N ($1.99 v $2.00)
Apple      $2.01     N ($1.99 v $2.01)  Still using $1.99 for comparison
Apple      $2.50     Y ($1.99 v $2.50)  Still using $1.99 for comparison
Apple      $2.56     Y ($2.50 v $2.56)  Now using $2.50 as new comp. price
Apple      $2.62     Y ($2.56 v $2.62)  Now using $2.56 as new comp. price

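I was hoping to solve this with SQL window functions (LEAD, LAG, PARTITION BY, etc.) by comparing the current row's price with the next row's price. However, that doesn't work properly once I reach a non-meaningful price, because I always want the next value to be compared with the most recent meaningful price (see the $2.50 row in the example above, which is compared with $1.99 rather than with $2.01 in the prior row).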
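
To make the difference concrete, here is a minimal JavaScript sketch that contrasts a previous-row (LAG-style) comparison with a comparison against the last meaningful price. The function names and the price list are made up for illustration; the sketch assumes the prices are already sorted ascending and that the threshold is 2%.

```javascript
// Hypothetical helper: compare each price with the row immediately before it,
// which is what a plain LAG-based comparison would do.
function flagByPreviousRow(prices, threshold) {
  return prices.map(function (price, i) {
    if (i === 0) return 'Y'; // the lowest price is always flagged
    var prev = prices[i - 1];
    return (price - prev) / prev > threshold ? 'Y' : 'N';
  });
}

// Hypothetical helper: compare each price with the last price flagged 'Y'.
function flagByLastMeaningful(prices, threshold) {
  var last = null; // most recent meaningful price, none yet
  return prices.map(function (price) {
    if (last === null || (price - last) / last > threshold) {
      last = price; // this price becomes the new comparison price
      return 'Y';
    }
    return 'N';
  });
}

// Made-up prices, sorted ascending, each roughly 1.5% above the previous one.
var creeping = [10.00, 10.15, 10.30, 10.45];

console.log(flagByPreviousRow(creeping, 0.02).join(' '));    // Y N N N
console.log(flagByLastMeaningful(creeping, 0.02).join(' ')); // Y N Y N
```

With slowly creeping prices, the previous-row version never sees a 2% jump, while the version that carries the comparison price forward flags 10.30 because it is 3% above 10.00.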

My question:

The following works in BigQuery Standard SQL:

#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
  // prices arrive sorted ascending; last holds the most recent meaningful price
  var result = [];
  var last = 0;
  var flag = '';
  for (var i = 0; i < prices.length; i++) {
    if (i == 0) {
      // the lowest price is always flagged
      last = prices[i];
      flag = 'Y';
    } else if ((prices[i] - last) / last > 0.02) {
      // more than 2% above the last meaningful price: flag it and use it as the new comparison price
      last = prices[i];
      flag = 'Y';
    } else {
      flag = 'N';
    }
    result.push({price: prices[i], flag: flag});
  }
  return result;
""";
SELECT item, rec.*
FROM (
  SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
  FROM `yourTable`
  GROUP BY item
), UNNEST(x(prices)) AS rec
-- ORDER BY item, price

You can test / play with it using the dummy data from your question:

#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
  // prices arrive sorted ascending; last holds the most recent meaningful price
  var result = [];
  var last = 0;
  var flag = '';
  for (var i = 0; i < prices.length; i++) {
    if (i == 0) {
      // the lowest price is always flagged
      last = prices[i];
      flag = 'Y';
    } else if ((prices[i] - last) / last > 0.02) {
      // more than 2% above the last meaningful price: flag it and use it as the new comparison price
      last = prices[i];
      flag = 'Y';
    } else {
      flag = 'N';
    }
    result.push({price: prices[i], flag: flag});
  }
  return result;
""";
WITH `yourTable` AS (
  SELECT 'Apple' AS item, 1.99 AS price UNION ALL
  SELECT 'Apple', 2.00 UNION ALL
  SELECT 'Apple', 2.01 UNION ALL
  SELECT 'Apple', 2.50 UNION ALL
  SELECT 'Apple', 2.56 UNION ALL
  SELECT 'Apple', 2.62
)
SELECT item, rec.*
FROM (
  SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
  FROM `yourTable`
  GROUP BY item
), UNNEST(x(prices)) AS rec
ORDER BY item, price

The result is:

item    price   flag     
----    -----   ----
Apple   1.99    Y    
Apple   2.0     N    
Apple   2.01    N    
Apple   2.5     Y    
Apple   2.56    Y    
Apple   2.62    Y
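
If you want to sanity-check the logic outside BigQuery, the whole pipeline can be approximated in plain JavaScript (for example under Node). This is only a rough sketch: `flagPrices` and `flagRows` are hypothetical names, `flagPrices` mirrors the UDF body, and `flagRows` stands in for the GROUP BY, ARRAY_AGG(price ORDER BY price) and UNNEST steps of the query.

```javascript
// Hypothetical stand-in for the UDF body: flag a price when it is more than
// 2% above the last flagged price. Expects prices sorted ascending.
function flagPrices(prices) {
  var result = [];
  var last = 0;
  for (var i = 0; i < prices.length; i++) {
    var meaningful = i === 0 || (prices[i] - last) / last > 0.02;
    if (meaningful) last = prices[i]; // new comparison price
    result.push({ price: prices[i], flag: meaningful ? 'Y' : 'N' });
  }
  return result;
}

// Hypothetical stand-in for GROUP BY item + ARRAY_AGG(price ORDER BY price) + UNNEST.
function flagRows(rows) {
  var byItem = {};
  rows.forEach(function (row) {
    (byItem[row.item] = byItem[row.item] || []).push(row.price);
  });
  var out = [];
  Object.keys(byItem).sort().forEach(function (item) {
    // numeric ascending sort, like ORDER BY price inside ARRAY_AGG
    var sorted = byItem[item].slice().sort(function (a, b) { return a - b; });
    flagPrices(sorted).forEach(function (rec) {
      out.push({ item: item, price: rec.price, flag: rec.flag });
    });
  });
  return out;
}

// The dummy data from the question, deliberately out of order.
var sampleRows = [
  { item: 'Apple', price: 2.50 },
  { item: 'Apple', price: 1.99 },
  { item: 'Apple', price: 2.62 },
  { item: 'Apple', price: 2.00 },
  { item: 'Apple', price: 2.56 },
  { item: 'Apple', price: 2.01 }
];

flagRows(sampleRows).forEach(function (r) {
  console.log(r.item, r.price, r.flag);
});
```

For the dummy data this yields the same flags as the table above (Y, N, N, Y, Y, Y in ascending price order).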