Google BigQuery streaming sometimes fails if I delete and recreate the table before streaming

I am streaming data into a BigQuery table.

1. Delete the old table.
2. Create a new table with the same name and the same schema.
3. Stream the data into the new table.
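For reference, the three steps can be sketched in Python. This is not the script from the question (which used Google Apps Script and PHP); it assumes `client` behaves like `google.cloud.bigquery.Client` and `table` is a `google.cloud.bigquery.Table` carrying the schema. The function name is made up for illustration.

```python
def recreate_and_stream(client, table, rows):
    """Delete a table, recreate it with the same schema, then stream rows.

    Assumption: `client` exposes delete_table, create_table and
    insert_rows_json like google.cloud.bigquery.Client.
    Returns the list of per-row errors from the streaming insert.
    """
    # Step 1: delete the old table (ignore it if it does not exist).
    client.delete_table(table, not_found_ok=True)
    # Step 2: create a new table with the same name and schema.
    client.create_table(table)
    # Step 3: stream the rows; an empty list means no errors were reported.
    return client.insert_rows_json(table, rows)


# Hypothetical usage with the real client library:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   table = bigquery.Table(
#       "my-project.my_dataset.my_table",
#       schema=[bigquery.SchemaField("name", "STRING")],
#   )
#   errors = recreate_and_stream(client, table, [{"name": "a"}])
```

Note that an empty error list only says the insert call reported no errors; it does not by itself confirm that the rows show up in the recreated table.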

I have done this many times before and it worked well. But recently I started to find that the approach above no longer works.

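After the streaming finishes (no error reported), I query the table. Sometimes it works; sometimes I get an empty table. (Same script, same data, run many times, different results.)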

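Even more mysteriously, when I stream a large amount of data, it seems to work most of the time. But when I stream a small amount of data, it fails most of the time.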

But if I just do this:

1. Create a new table.
2. Stream the data into the new table.

it always works.

I tried this in both Google Apps Script and the PHP Google Cloud Client Library for BigQuery, and I ran into the same problem in both.

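So I tried this in Google Apps Script:

1. Delete the old table.
2. Sleep for 10 seconds, so the delete job should have finished.
3. Create a new table with the same name and the same schema.
4. Sleep for 10 seconds, so the create job should have finished.
5. Stream the data into the new table.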

It still gives me the same problem, and no error is reported or logged.

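Additional information:

I tested again.

If I wait until the streaming buffer is empty and then run the script, the result is always correct. The new data is streamed into the new table successfully.

But if I run the script right after the previous run, the result is empty. The data is not streamed into the new table.

So the error seems to occur when I "delete the old table and create the new table" while the streaming buffer is not empty.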
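The "wait until the streaming buffer is empty" check could be automated roughly as follows. This is a sketch, not the script from the question: it assumes `client.get_table` returns an object whose `streaming_buffer` attribute is `None` when no buffer is reported, as `google.cloud.bigquery.Table` does. The function name, timeout, and poll interval are arbitrary choices for illustration.

```python
import time


def wait_for_empty_streaming_buffer(client, table_id, timeout_s=600,
                                    poll_s=10, sleep=time.sleep):
    """Poll table metadata until no streaming buffer is reported.

    Returns True if the buffer was reported empty before the timeout,
    False otherwise. `sleep` is injectable so the loop can be tested.
    """
    waited = 0
    while True:
        table = client.get_table(table_id)
        # streaming_buffer is None when the metadata lists no buffer.
        if table.streaming_buffer is None:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

This only mirrors the observation above that runs started with an empty buffer succeeded; the table metadata may lag behind the actual buffer state, so treat the result as a hint rather than a guarantee.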

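But according to the answer in this thread, the old table and the new table (even though they have the same name and the same schema) have two different "object ids". They are really two different tables. After I delete the old table, the old records in the streaming buffer are deleted as well. Whether or not the streaming buffer is empty, it should not affect my next steps: creating a new table and streaming new data into it.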

On the other hand, if I try to "truncate old table" instead of "delete old table and create a new table" while there may still be data in the streaming buffer, I get "DML statement cannot modify data still in stream buffer", so "truncate old table" fails.

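In short, in this use case:

1. I cannot truncate the old table, because the streaming buffer may not be empty.
2. I am supposed to "delete old table and create new table, then stream data to new table". But that seems to be the root of my current problem: my new data cannot be streamed into the new table (even though the new table has a new object id and should not be affected by the fact that I just deleted an old table).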

Avoid truncating and recreating tables while streaming.

From the official docs:

https://cloud.google.com/bigquery/troubleshooting-errors#streaming

Table Creation/Deletion - Streaming to a nonexistent table will return a variation of a notFound response. Creating the table in response may not immediately be recognized by subsequent streaming inserts. Similarly, deleting and/or recreating a table may create a period of time where streaming inserts are effectively delivered to the old table and will not be present in the newly created table.

Table Truncation - Truncating a table's data (e.g. via a query job that uses writeDisposition of WRITE_TRUNCATE) may similarly cause subsequent inserts during the consistency period to be dropped.

To avoid losing data: create a new table with a different name.
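One way to get a different name on every run is to append a timestamp to a base table ID. The helper below is a minimal sketch under that assumption; the function name and the suffix format are illustrative choices, not something prescribed by BigQuery or by this answer.

```python
from datetime import datetime, timezone


def versioned_table_id(base_table_id, now=None):
    """Return base_table_id with a UTC timestamp suffix appended.

    `now` can be passed explicitly to make the result reproducible.
    """
    now = now or datetime.now(timezone.utc)
    # Only digits and underscores, which are valid in table names.
    return f"{base_table_id}_{now.strftime('%Y%m%d_%H%M%S')}"


# Example with a fixed time:
#   versioned_table_id("my-project.my_dataset.events",
#                      datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
#   returns "my-project.my_dataset.events_20240102_030405"
```

Each run then creates and streams into a table that has never existed under that name, which is the case the question reports as always working.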

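I posted about streaming to BigQuery in another thread of mine. Now, as a rule, I try to avoid streaming if I can.

1. Load the data into Cloud Storage.
2. Then load the data from Cloud Storage into BigQuery.

This resolves many streaming-related issues.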
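The second step (Cloud Storage to BigQuery) might look like this in Python. It is a sketch that assumes `client` behaves like `google.cloud.bigquery.Client`, whose `load_table_from_uri` starts a load job; the function name, bucket, and table names are placeholders.

```python
def load_from_gcs(client, uri, table_id, job_config=None):
    """Load a file from Cloud Storage into a BigQuery table via a load job.

    Assumption: `client` exposes load_table_from_uri like
    google.cloud.bigquery.Client. Returns the finished job.
    """
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    # Block until the load job completes; with the real client this
    # raises if the job failed.
    job.result()
    return job


# Hypothetical usage with the real client library:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   config = bigquery.LoadJobConfig(
#       source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
#   )
#   load_from_gcs(client, "gs://my-bucket/data.json",
#                 "my-project.my_dataset.my_table", job_config=config)
```

Because a load job does not go through the streaming buffer, the delete-and-recreate timing issue described in the question should not come into play here.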

streaming

google-bigquery