BigQuery - Count non-nulls across columns where the column name matches regex patterns

Question

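New to BQ, so please bear with me.

I have a table with various columns (in the example they are col1 through col4, for brevity), and I have a regex that determines which column names should be grouped together (for example, ac_v\d+_final_p\w+). What I want is to determine how many non-null values occur within that group of columns for each row. Through research I was able to put together the query below; however, it evidently only returns counts for the entire table, rather than per-row counts restricted to the columns that match the regex.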

Simplified data structure:

key	col1	col2	col3	col4	lol1
2	0.0025	null	null	null	null
3	0.0015	null	0.0005	null	null
1	null	null	null	0.000	0.3

Desired result: I only want to count the columns whose names start with the col prefix.

key	count_non_nulls	count_nulls
1	1	3
2	1	3
3	2	2

Is there a way to achieve this in BQ Standard SQL?

Thanks in advance for your help.

BEGIN
#standardSQL
CREATE TEMP TABLE `mytable` AS (
  SELECT 1 AS key, null AS col1, null AS col2, null AS col3, 0.0001 AS col4 UNION ALL
  SELECT 2, 0.0025, null, null, null UNION ALL
  SELECT 3, 0.0015,  null, 0.0005, null
)
;

SELECT 
  COUNTIF(value not in  ('null', '')) AS count_non_nulls, 
  COUNTIF(value in  ('null', '')) AS count_nulls, 
  COUNT(value) count_all
FROM `mytable` t, 
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
;

END

Answer 1

Consider the approach below

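select key, 
  (
    select as struct 
      -- to_json_string renders a SQL NULL as the bare token null
      countif(column_value != 'null') as count_non_nulls,
      countif(column_value = 'null') as count_nulls
    -- serialize the row to JSON, strip braces and quotes, then split it into name:value pairs
    from unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
    unnest([struct(split(kv, ':')[offset(0)] as column_name, split(kv, ':')[offset(1)] as column_value)])
    where column_name != 'key'
    -- keep only the columns whose names belong to the group being counted
    and starts_with(column_name, 'col')
  ).*
from `project.dataset.table` t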

If applied to the sample data in your question, the output matches the desired result shown there.
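
To try the query without an existing table, one option is to inline the question's sample rows as a CTE. This is only a sketch: the CTE name sample_data is a placeholder introduced here, and the order by is added purely to make the result easier to read.

```sql
-- sample_data is a hypothetical CTE holding the rows from the question
with sample_data as (
  select 1 as key, null as col1, null as col2, null as col3, 0.0001 as col4 union all
  select 2, 0.0025, null, null, null union all
  select 3, 0.0015, null, 0.0005, null
)
select key,
  (
    select as struct
      countif(column_value != 'null') as count_non_nulls,
      countif(column_value = 'null') as count_nulls
    from unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
    unnest([struct(split(kv, ':')[offset(0)] as column_name, split(kv, ':')[offset(1)] as column_value)])
    where column_name != 'key'
    and starts_with(column_name, 'col')
  ).*
from sample_data t
order by key
```

Because the row is split on commas and colons, this approach assumes the column values themselves contain no commas, colons, quotes, or braces, which holds for numeric columns like the ones here.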

Note: if you need to use a regular expression of your own, you can use it in place of the line below

starts_with(column_name, 'col')
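
As a sketch of that substitution, the filter can be written with regexp_contains and the example pattern from the question. The sample rows and the column names ac_v1_final_pa, ac_v2_final_pb, and other_col are invented here for illustration; only the pattern itself comes from the question.

```sql
-- Hypothetical rows: the ac_v... column names are made up to match the example pattern
with sample_data as (
  select 1 as key, 0.5 as ac_v1_final_pa, cast(null as float64) as ac_v2_final_pb, 0.3 as other_col union all
  select 2, cast(null as float64), cast(null as float64), 0.1
)
select key,
  (
    select as struct
      countif(column_value != 'null') as count_non_nulls,
      countif(column_value = 'null') as count_nulls
    from unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
    unnest([struct(split(kv, ':')[offset(0)] as column_name, split(kv, ':')[offset(1)] as column_value)])
    -- ^ and $ anchor the pattern so the whole column name has to match
    where regexp_contains(column_name, r'^ac_v\d+_final_p\w+$')
  ).*
from sample_data t
order by key
```

The separate column_name != 'key' check is dropped in this sketch because the anchored pattern cannot match key anyway; keep it if your own pattern could.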

regex

re2

google-bigquery