BigQuery / Shopify Order Data Query
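My Shopify order import creates a new entry in BigQuery for an order whenever something has changed since the last import, so you can see how an order's attributes change over time rather than only its state at the last import. This also means the table holds multiple entries for the same order, where the only unique parts are the _sdc_batched_at and _sdc_sequence values. I sometimes see as many as 30 entries for the same order.
Table schema...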
order:
order_number: Int
fulfillments: Array
_sdc_batched_at: DateTime
_sdc_sequence: Int
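What I did...
I created a partitioned table that basically boils down to the subset of entries in a given date range that have at least one fulfillment (ARRAY_LENGTH(fulfillments) > 0).
Initial query to reduce the dataset...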
with orders as (
select order_number, fulfillments, _sdc_batched_at, _sdc_sequence
from `project.shopify.orders`
where created_at between '2018-11-08' and '2018-11-15'
and ARRAY_LENGTH(fulfillments) > 0
)
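Problem...
I'm running into problems when I try to use DISTINCT or GROUP BY, because fulfillments is an array and that throws an error. How can I write a query that returns only the entry with the latest _sdc_batched_at value for each order?
Sample data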
[
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 02:46:21.270 UTC",
"_sdc_sequence": "1541817507934"
},
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 03:16:16.606 UTC",
"_sdc_sequence": "1541819139795"
},
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
"_sdc_sequence": "1541821046476"
},
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
},
{
"order_number": "2212",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
"_sdc_sequence": "1541821046476"
},
{
"order_number": "2212",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
}
]
Expected result
Return only the entry with the latest _sdc_batched_at value for each order
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
},
{
"order_number": "2212",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
}
The following works in BigQuery Standard SQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY _sdc_batched_at DESC LIMIT 1)[OFFSET(0)]
FROM `project.shopify.orders` t
GROUP BY order_number
Obviously, you can add whatever conditions you need in a WHERE clause
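As a rough sketch of how that could look, the query below combines the ARRAY_AGG approach with the date and fulfillment filters from the question's initial query. The table name and the created_at column are taken from the question. The secondary sort on _sdc_sequence is an assumption added here to break ties when two entries share the same _sdc_batched_at; it presumes a higher _sdc_sequence means a later record, which the question does not confirm.

```sql
-- BigQuery Standard SQL: keep one row per order_number, the most recently batched one.
-- ARRAY_AGG(t ...) collects whole rows as STRUCTs, so the array column
-- `fulfillments` is never used as a grouping or DISTINCT key.
SELECT AS VALUE
  ARRAY_AGG(
    t
    ORDER BY _sdc_batched_at DESC,
             _sdc_sequence DESC  -- assumed tie-breaker, not part of the original answer
    LIMIT 1
  )[OFFSET(0)]
FROM `project.shopify.orders` t
WHERE created_at BETWEEN '2018-11-08' AND '2018-11-15'  -- filters from the question
  AND ARRAY_LENGTH(fulfillments) > 0
GROUP BY order_number
```

Because the WHERE clause runs before the aggregation, only entries that pass the filters compete to be the "latest" one for each order. If the latest entry overall should be picked first and filtered afterwards, the filter would need to move to an outer query instead.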