Is it possible to get an unambiguous cost summary per project?

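We have exported our billing history to BigQuery. I am trying to get a total cost per project, but I am starting to think this is not possible, because the data includes project.labels, which means there can be multiple rows per billing item.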

Here is a query I just ran:

SELECT   project.labels.key,project.labels.value,service.description,usage_start_time,usage_end_time,project.id,sku.description,cost 
FROM     [our-billing-export] 
WHERE    service.id = "6F81-5844-456A"
  and    usage_start_time = "2018-04-06 19:25:01.510 UTC"
  and    usage_end_time = "2018-04-06 21:25:03.785 UTC"
  and    project.id = "dh-raia"
  and    sku.id = "D973-5D65-BAB2"
order by project.labels.key,project.labels.value,service.id, usage_start_time,usage_end_time,project.id,sku.id,cost

which returns this:

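Note that there are 3 identical costs for "Storage PD Capacity". I think that is fine; they probably represent 3 different persistent disks. Note also that the same 3 costs appear again, this time under a different project.labels.key.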

My goal is to get the total cost per project.id. Clearly I can't just issue:

select project.id,sum(cost)
from [our-billing-export]
group by project.id

because some costs would be included more than once (they appear once for each project.labels.key).

I can't filter on a single project.labels.key, because we don't guarantee that every project has the same labels.

I can't eliminate the duplication caused by the labels like this:

SELECT   service.id,sku.id,usage_start_time,usage_end_time, project.id,cost
FROM     [our-billing-export]
GROUP BY  service.id,sku.id ,usage_start_time,usage_end_time,project.id,cost

because that would collapse the three valid line items that happen to have the same cost.

I can't use an OVER() clause to pick a single label for each customer like this:

SELECT   project.labels.key,service.id,usage_start_time,usage_end_time,project.id,sku.id,rownum
FROM     (
         SELECT   project.labels.key,service.id,usage_start_time,usage_end_time,project.id,sku.id,
                  ROW_NUMBER() OVER (PARTITION BY project.id,service.id,usage_start_time,usage_end_time,sku.id,project.labels.key) as rownum
         FROM     [our-billing-export]
         )q
WHERE rownum=1

because when I do, I get the error: Repeated field 'project.labels.key' as PARTITION BY key is not allowed.

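So, as far as I can tell, there is no way to get an unambiguous answer to the question "How much have I spent on each project?" I would love for someone to tell me I'm wrong and that there is a way to accomplish this.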

OK, I have managed to solve this (with the help of a colleague):

SELECT service.description
,      sku.description
,      project.name
,      labels
,      cost
FROM (
    SELECT  service.description
    ,       sku.description
    ,       project.name
    ,       group_concat(project.labels.key + ':' + project.labels.value) WITHIN RECORD AS labels
    ,       cost
    FROM [our-billing-export] 
    WHERE usage_start_time = "2018-04-06 19:25:01.510 UTC" 
      AND usage_end_time = "2018-04-06 21:25:03.785 UTC" 
      AND project.id = 'dh-raia' AND cost > 0
      AND sku.id = "D973-5D65-BAB2"
  )

This returns the correct costs, which can then be aggregated.

The BigQuery documentation here and here should be useful.

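Specifically, a repeated field causes double counting only when you flatten that field, because flattening duplicates the other fields of the row. From the second link: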

"Given a record with one or more values for a repeated field, FLATTEN will create multiple records, one for each value in the repeated field. All other fields selected from the record are duplicated in each new output record."

Unless you flatten the repeated field, your simple example query (select project.id, sum(cost) from [our-billing-export] group by project.id) will not cause a double-counting problem.
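To make the difference concrete, here is a minimal sketch in Standard SQL rather than the Legacy SQL used in the question. The inline `billing` rows are made-up sample data that only mimic the shape of a billing export (project.id, project.labels, cost), and CROSS JOIN UNNEST stands in for Legacy SQL's FLATTEN.

```sql
-- Made-up sample data: two line items of 0.25 each for one project,
-- where the project carries two labels.
WITH billing AS (
  SELECT
    STRUCT(
      'dh-raia' AS id,
      [STRUCT('env' AS key, 'prod' AS value),
       STRUCT('team' AS key, 'data' AS value)] AS labels
    ) AS project,
    0.25 AS cost
  UNION ALL
  SELECT
    STRUCT(
      'dh-raia' AS id,
      [STRUCT('env' AS key, 'prod' AS value),
       STRUCT('team' AS key, 'data' AS value)] AS labels
    ) AS project,
    0.25 AS cost
),
-- Labels left as a repeated field: each line item is counted once.
unflattened AS (
  SELECT project.id AS project_id, SUM(cost) AS total_cost
  FROM billing
  GROUP BY project_id
),
-- Labels flattened: each line item is repeated once per label,
-- so its cost is summed once per label.
flattened AS (
  SELECT project.id AS project_id, SUM(cost) AS total_cost_double_counted
  FROM billing
  CROSS JOIN UNNEST(project.labels) AS label
  GROUP BY project_id
)
SELECT u.project_id, u.total_cost, f.total_cost_double_counted
FROM unflattened AS u
JOIN flattened AS f
  ON u.project_id = f.project_id
```

With this sample data the unflattened total should be 0.5 and the flattened one 1.0, since each of the two rows is emitted once per label.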

By the way, besides using Legacy SQL and GROUP_CONCAT ... WITHIN RECORD to get the repeated field back as a concatenated string, you can also use the TO_JSON_STRING function in Standard SQL. See the example here
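As a rough illustration of the TO_JSON_STRING route, the sketch below uses a single made-up row in place of the real export table; only the column shapes (project.id, project.labels, cost) are taken from the question.

```sql
-- Made-up sample row standing in for the billing export table.
WITH billing AS (
  SELECT
    STRUCT(
      'dh-raia' AS id,
      [STRUCT('env' AS key, 'prod' AS value),
       STRUCT('team' AS key, 'data' AS value)] AS labels
    ) AS project,
    0.25 AS cost
)
SELECT
  project.id AS project_id,
  -- Serializes the whole repeated labels field into one string per row,
  -- so the row is not multiplied by the number of labels.
  TO_JSON_STRING(project.labels) AS labels,
  cost
FROM billing
```

Because the labels stay inside a single string column, each line item still appears exactly once, and the result can be aggregated by project without counting any cost twice.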

Hope that helps!