使用数据格式增量更新 table 时出错

Error incrementally updating a table with datform

我在 BigQuery 中有一个 table,我想在新数据进入另一个 table 时每天更新它。现在,我还需要对原始数据框中嵌套列中存在的变量进行一些转换。因此,我做了以下事情:


config {
  type: "incremental",
  bigquery: {
    partitionBy: "DATE(ts)",
    clusterBy: ["IN","SN", "CB","B"]
  }
}


SELECT distinct 

timestamp as ts,
storeName,
DATE(timestamp) as Date,
itemId,
runId,
source,
storeUuid,

CASE WHEN (P.value IS NOT NULL AND SAFE_CAST(REPLACE(P.value, ",", ".") AS FLOAT64) IS NOT NULL)
THEN CAST(REPLACE(P.value, ",", ".") AS FLOAT64)/100
ELSE NULL END as P,

CASE WHEN (BUP.value IS NOT NULL AND SAFE_CAST(REPLACE(BUP.value, ",", ".") AS FLOAT64) IS NOT NULL)
THEN CAST(REPLACE(BUP.value, ",", ".") AS FLOAT64)/100
ELSE NULL END as BUP,

CASE WHEN (NP.value IS NOT NULL AND SAFE_CAST(REPLACE(NP.value, ",", ".") AS FLOAT64) IS NOT NULL)
THEN CAST(REGEXP_REPLACE(NP.value, ",", ".") AS FLOAT64)/100
ELSE NULL END as NP,

CASE WHEN (Quantity.value IS NOT NULL AND SAFE_CAST(REPLACE(Quantity.value, ",", ".") AS INT64) IS NOT NULL)
THEN CAST(REPLACE(Quantity.value, ",", ".") AS INT64)
ELSE NULL END as Quantity,

CASE WHEN (CQ.value IS NOT NULL AND SAFE_CAST(REPLACE(CQ.value, ",", ".") AS INT64) IS NOT NULL)
THEN CAST(REGEXP_REPLACE(CQ.value, ",", ".") AS INT64)
ELSE NULL END as CQ, 


CAST(AverageSalesOverAWeek.value AS FLOAT64) AS AverageSalesOverAWeek,
CAST(CentralMainBarcode.value AS STRING) AS CentralMainBarcode,
CAST(MainBarcode.value AS STRING) AS MainBarcode,

CASE WHEN CAST(MB.value AS STRING) IS NULL
THEN CAST(CMB.value AS STRING)
ELSE CAST(MB.value AS STRING) END as B,

itemName,

itemGroup.value as itemGroup,

CAST(PT.value as String) as  PT,

department.value as Department,


UC.value as UC,

KP.value as IKP,

NOCD.value as NOCD,

NPCEPEON.value as NPCEPEON ,
CCP.value as CCP,
LDWA.value as LDWA,
IOOS.value as IOOS,

CASE
WHEN ID.value = 'true' THEN 1
WHEN ID.value = 'false' THEN 0
ELSE NULL
END as ID,

CASE
WHEN DT.value = 'true' THEN 1
WHEN DT.value = 'false' THEN 0
ELSE NULL
END as DT,

CASE
WHEN CID.value = 'true' THEN 1
WHEN CID.value = 'false' THEN 0
ELSE NULL
END as CID,

CASE
WHEN HP.value = 'true' THEN 1
WHEN HP.value = 'false' THEN 0
ELSE NULL
END as HP,

CAST(presentation.value AS STRING) AS presentation,

CASE WHEN presentation.value = 'promo' THEN 1 ELSE 0 END as PF,

CAST(InStock.value AS INT64) AS InStock,
CAST(Brand.value AS STRING) AS Brand,
CAST(CB.value AS STRING) AS CentralBrand,
CPD.value as CPD,
CPPD.value as CPPD,
..........

5 more like these 

FROM

`table`

LEFT OUTER JOIN UNNEST(itemProperties) AS ReceiptProductDescription ON ReceiptProductDescription.name = 'ReceiptProductDescription'
LEFT OUTER JOIN UNNEST(itemProperties) AS GD ON GD.name = 'GD'
LEFT OUTER JOIN UNNEST(itemProperties) AS MB ON MD.name = 'MB'
LEFT OUTER JOIN UNNEST(itemProperties) AS B ON B.name = 'B'
LEFT OUTER JOIN UNNEST(itemProperties) AS CB ON CB.name = 'CB'
LEFT OUTER JOIN UNNEST(itemProperties) AS IG ON IG.name = 'IG'
LEFT OUTER JOIN UNNEST(itemProperties) AS PT ON PT.name = 'PT'
LEFT OUTER JOIN UNNEST(itemProperties) AS CMB ON CMB .name = 'CMB '
LEFT OUTER JOIN UNNEST(itemProperties) AS CPD ON CPD.name = 'CPD '
LEFT OUTER JOIN UNNEST(itemProperties) AS CPPD ON CPPD .name = 'CPPD '
LEFT OUTER JOIN UNNEST(itemProperties) AS IS ON IS.name = 'IS'
LEFT OUTER JOIN UNNEST(itemProperties) AS Quantity ON Quantity.name = 'Quantity'
LEFT OUTER JOIN UNNEST(itemProperties) AS CCQ ON CCQ.name = 'CCQ '
LEFT OUTER JOIN UNNEST(itemProperties) AS ISD ON ISD.name = 'ISD '
LEFT OUTER JOIN UNNEST(itemProperties) AS DeliveredToday ON DeliveredToday.name = 'DeliveredToday'
LEFT OUTER JOIN UNNEST(itemProperties) AS IOOS ON IOOS.name = 'IOOS '
LEFT OUTER JOIN UNNEST(itemProperties) AS LDWA ON LDWA.name = 'LDWA '
LEFT OUTER JOIN UNNEST(itemProperties) AS NPCEPEON NPCEPEON.name = 'NPCEPEON'
LEFT OUTER JOIN UNNEST(itemProperties) AS NP ON NP.name = 'NP'
LEFT OUTER JOIN UNNEST(itemProperties) AS BUP ON BUP.name = 'BUP'
LEFT OUTER JOIN UNNEST(itemProperties) AS HP ON HP.name = 'HP'
LEFT OUTER JOIN UNNEST(itemProperties) AS CCP ON CCP.name = 'CCP '
LEFT OUTER JOIN UNNEST(itemProperties) AS NPD ON NPD.name = 'NPD '
LEFT OUTER JOIN UNNEST(itemProperties) AS UnitCode ON UnitCode.name = 'UnitCode'
LEFT OUTER JOIN UNNEST(itemProperties) AS P ON P.name = 'P'
.........

5 more like that

${when(incremental(), `where timestamp > (select max(timestamp) from ${self()})`)}

我遇到的问题与查询无关,因为它运行在 BQ 中很好。但问题是,我无法 运行 来自 Dataform 的增量更新。

我收到以下错误消息:

Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 162% of limit. Top memory consumer(s): JOIN operations: 99% other/unattributed: 1%

我已经看到如果 groupby 不用于连接,则可以应用解决方案。

我该如何解决这个问题?

我应该指出(大多数读过这篇文章的人都会明白)我很少与 SQL 一起工作。如果有比左外连接更有效地取消嵌套数据的另一种方法可用,我洗耳恭听。

有些操作必须在一台机器上执行[1]。当您的数据溢出插槽的内存时,您会遇到出现的错误。我对你需要的方法是将插槽类型更改为更高内存的机器[2],或者切换到固定费率定价以获得预留资源[3]。

否则,您将必须遵循查询最佳实践以避免对过多数据进行单槽操作。通常有帮助的一件事是移动到标准 SQL 如果你还没有这样做的话。此外,我附上了一些有关 BigQuery 中的内存在执行期间如何工作的信息[4]。

[1] https://cloud.google.com/bigquery/docs/best-practices-performance-output#use_a_limit_clause_with_large_sorts

[2]https://cloud.google.com/bigquery/docs/slots

[3] https://cloud.google.com/bigquery/docs/reservations-intro

[4]https://cloud.google.com/blog/products/bigquery/in-memory-query-execution-in-google-bigquery