Google BigQuery Repeated field
I currently have a table that looks like this:
Id // Key
General
    platformId
    platformName
Products [
    Repeated Product {
        Country
        URL
        Offers [
            Repeated Offer
                Type
                Price
                Currency
        ]
    }
]
I need to convert it to a different format:
Record ID // Key
Country
Providers [
    Repeated provider
        platformName
        Offers [
            Repeated Offer
                Type
                Price
                Currency
        ]
]
I initially flattened the table and got this result:
id,platformId,platformName,products.product.country,products.product.offers.offer.price,products.product.offers.offer.type,products.product.offers.offer.currency
1,123,AWS,US,1.99,CPU,USD
1,123,AWS,US,1.99,HDD,USD
1,123,AWS,US,1.99,RAM,USD
2,123,AWS,CA,2.99,CPU,CAN
2,123,AWS,CA,2.99,HDD,CAN
2,123,AWS,CA,2.99,RAM,CAN
3,123,GOOG,US,3.99,CPU,GBP
3,123,GOOG,US,3.99,HDD,GBP
3,123,GOOG,US,3.99,RAM,GBP
I want to group the following rows by country and platform name:
1,123,AWS,US,1.99,CPU,USD
1,123,AWS,US,1.99,HDD,USD
1,123,AWS,US,1.99,RAM,USD
3,123,GOOG,US,1.99,CPU,GBP
3,123,GOOG,US,1.99,HDD,GBP
3,123,GOOG,US,1.99,RAM,GBP
The resulting structure should look like this:
123,US,AWS
CPU,1.99,USD
HDD,1.99,USD
RAM,1.99,USD
GOOG
CPU,3.99,USD
HDD,3.99,USD
RAM,3.99,USD
Any pointers?
At the moment I can't get it to group by country:
+---------+--------------+------+-------+----------+
| country | platformName | type | price | currency |
+---------+--------------+------+-------+----------+
| US      | AWS          | CPU  | 1.99  | USD      |
|         |              | HDD  | 1.99  | USD      |
|         |              | RAM  | 1.99  | USD      |
| CA      | AWS          | CPU  | 2.99  | CAN      |
|         |              | HDD  | 2.99  | CAN      |
|         |              | RAM  | 2.99  | CAN      |
| US      | GOOG         | CPU  | 3.99  | USD      |
|         |              | HDD  | 3.99  | USD      |
|         |              | RAM  | 3.99  | USD      |
+---------+--------------+------+-------+----------+
Here is my query:
SELECT
  country,
  platformName,
  NEST(type) AS type,
  NEST(price) AS price,
  CASE
    WHEN NEST(currency) = '' THEN NULL
    ELSE NEST(currency)
  END AS currency,
FROM
  tbl
WHERE
  master_id = 123
GROUP BY
  platform_name,
  country
The following is for BigQuery Standard SQL:
#standardSQL
SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
FROM data, UNNEST(products) AS product, UNNEST(offers) AS offer
WHERE id = 123
GROUP BY product.country, general.platformName
I hope I got your schema right.
I keep getting: Values referenced in UNNEST must be arrays for offers.
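Exactly, 100% right. As I mentioned, I was hoping I had understood your schema correctly.
So the query above works for a schema like the one below (which I think represents what you described in your question).
You can test it with the following dummy data: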
#standardSQL
WITH data AS (
  SELECT 1 AS Id,
    STRUCT<platformId INT64, platformName STRING>(123, 'name 1') AS general,
    ARRAY<STRUCT<country STRING, url STRING, offers ARRAY<STRUCT<type STRING, price FLOAT64, currency STRING>>>>
    [
      ('US', 'google.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 1', 1.99, 'USD'), ('offer 2', 2.99, 'USD'), ('offer 3', 3.99, 'USD')]),
      ('CA', 'yahoo.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 4', 1.99, 'USD'), ('offer 5', 2.99, 'USD')]),
      ('EU', 'apple.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 6', 1.99, 'USD')])
    ] AS products UNION ALL
  SELECT 2 AS Id,
    STRUCT<platformId INT64, platformName STRING>(123, 'name 2') AS general,
    ARRAY<STRUCT<country STRING, url STRING, offers ARRAY<STRUCT<type STRING, price FLOAT64, currency STRING>>>>
    [
      ('US', 'google.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 7', 1.99, 'USD'), ('offer 8', 2.99, 'USD'), ('offer 9', 3.99, 'USD')]),
      ('MX', 'yahoo.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 10', 1.99, 'USD'), ('offer 11', 2.99, 'USD')]),
      ('CA', 'apple.com', [STRUCT<type STRING, price FLOAT64, currency STRING>('offer 12', 1.99, 'USD')])
    ] AS products
)
SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
FROM data, UNNEST(products) AS product, UNNEST(offers) AS offer
WHERE id = 1
GROUP BY product.country, general.platformName
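This produces one row per country and platformName, with that pair's offers collected into an array. For Id = 1 in the dummy data, that is US with offers 1 to 3, CA with offers 4 and 5, and EU with offer 6, all under platformName 'name 1'.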
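The query above returns one row per country and platform. To get closer to the Country -> Providers -> Offers layout asked for in the question, a second aggregation could be layered on top of it. The following is only a sketch under the same schema assumption: the `per_platform` and `providers` names, the tiny inline `data` CTE and its values are made up for illustration, and `NULLIF` stands in for the empty-currency `CASE` in the original legacy SQL query.

```sql
#standardSQL
-- Tiny stand-in for the real table; all values here are illustrative only.
WITH data AS (
  SELECT 1 AS id,
    STRUCT(123 AS platformId, 'AWS' AS platformName) AS general,
    [STRUCT('US' AS country, 'example.com' AS url,
       [STRUCT('CPU' AS type, 1.99 AS price, 'USD' AS currency),
        STRUCT('HDD' AS type, 1.99 AS price, '' AS currency)] AS offers)] AS products
  UNION ALL
  SELECT 3,
    STRUCT(123 AS platformId, 'GOOG' AS platformName),
    [STRUCT('US' AS country, 'example.com' AS url,
       [STRUCT('CPU' AS type, 3.99 AS price, 'USD' AS currency)] AS offers)]
),
-- Step 1: one row per (country, platformName), with its offers as an array.
per_platform AS (
  SELECT
    product.country AS country,
    general.platformName AS platformName,
    ARRAY_AGG(STRUCT(offer.type AS type,
                     offer.price AS price,
                     NULLIF(offer.currency, '') AS currency)) AS offers
  FROM data, UNNEST(products) AS product, UNNEST(product.offers) AS offer
  GROUP BY product.country, general.platformName
)
-- Step 2: one row per country, with its platforms nested as a providers array.
SELECT
  country,
  ARRAY_AGG(STRUCT(platformName AS platformName, offers AS offers)) AS providers
FROM per_platform
GROUP BY country
```

Aggregating in two steps keeps each `ARRAY_AGG` tied to a single `GROUP BY` level. The inner arrays are wrapped in a STRUCT because BigQuery does not allow an array directly inside another array.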
Of course, if your real schema is different, you should dig in and adapt this to your specific case. I hope you will :o)
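As one illustration of such an adaptation: the error "Values referenced in UNNEST must be arrays" means the expression passed to UNNEST is not a repeated field. The flattened column names in the question (products.product.offers.offer.price) hint that there may be wrapper records around the repeated fields. The sketch below assumes, purely hypothetically, that `offers` is a single record wrapping a repeated `offer` field; the real table may well be laid out differently, so check its schema first.

```sql
#standardSQL
-- Hypothetical schema variant: `offers` is a non-repeated record that wraps
-- a repeated `offer` field, so UNNEST(product.offers) would not be an array.
WITH data AS (
  SELECT 1 AS id,
    STRUCT(123 AS platformId, 'AWS' AS platformName) AS general,
    [STRUCT('US' AS country, 'example.com' AS url,
       STRUCT([STRUCT('CPU' AS type, 1.99 AS price, 'USD' AS currency),
               STRUCT('HDD' AS type, 1.99 AS price, 'USD' AS currency)] AS offer) AS offers)] AS products
)
SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
FROM data,
  UNNEST(products) AS product,
  UNNEST(product.offers.offer) AS offer  -- unnest the array inside the wrapper record
GROUP BY product.country, general.platformName
```

The rule of thumb is that every UNNEST must point at the repeated field itself, so the path passed to it follows the nesting of the real table down to the first repeated level.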