Bigrquery forcefully coerces strings to integers (schema is a string)

Question

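I'm working with ZIP codes, which of course have leading zeros. I load my data frame correctly so that R keeps the leading zeros, but the upload step seems to fail. Here's what I mean: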

Here is my minimal.csv file:

zip,val
07030,10
10001,100
90210,1000
60602,10000

Here is the R code:

require("bigrquery")
filename <- "minimal.csv"
tablename <- "as_STRING"
ds <- bq_dataset(project='myproject', dataset="zips")

I also set the types correctly in my schema, so it expects the ZIP codes to be strings.

# first pass
df <- read.csv(filename, stringsAsFactors=F)
# > df
#     zip   val
# 1  7030    10
# 2 10001   100
# 3 90210  1000
# 4 60602 10000

# uh oh!  Let's fix it!

cols <- unlist(lapply(df, class))
cols[[1]] <- "character" # make zipcode a character

# then reload
df2 <- read.csv(filename, stringsAsFactors=F, colClasses=cols)
# > df2
#     zip   val
# 1 07030    10
# 2 10001   100
# 3 90210  1000
# 4 60602 10000

# much better!  You can see my zips are now strings.
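The two-pass approach above works. A shorter variant, shown here as a sketch, names only the zip column in colClasses so the file is read once; columns left out of the named vector keep read.csv's normal type detection. To keep the example self-contained it inlines the CSV through the text= argument instead of reading minimal.csv from disk (that substitution is an assumption for illustration only).

```r
# Inline copy of minimal.csv so the example runs without the file on disk.
csv_text <- "zip,val
07030,10
10001,100
90210,1000
60602,10000"

# Fix the class of zip only; val is not named, so read.csv still detects it as integer.
df2 <- read.csv(text = csv_text, colClasses = c(zip = "character"))

str(df2)
```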

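However, when I try to upload the strings, the bigrquery interface complains that I'm uploading integers, which I'm not. Here is the schema, which expects strings: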

# create schema
bq_table_create(bq_table(ds, tablename), fields=df2) # using df2, which has strings

# now prove it got the strings right:
    > bq_table_meta(bq_table(ds, tablename))$schema$fields
    [[1]]
    [[1]]$name
    [1] "zip"

    [[1]]$type
    [1] "STRING"                # GOOD, ZIP IS A STRING!

    [[1]]$mode
    [1] "NULLABLE"


    [[2]]
    [[2]]$name
    [1] "val"

    [[2]]$type
    [1] "INTEGER"

    [[2]]$mode
    [1] "NULLABLE"

Now it's time to upload...

bq_table_upload(bq_table(ds, tablename), df2) # using df2, with STRINGS
Error: Invalid schema update. Field zip has changed type from STRING to INTEGER [invalid]

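Huh? What is this invalid schema update, and how do I stop it from trying to change my strings (which both my data and my schema contain) into integers (which neither my data nor my schema contains)?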

Is there some JavaScript serialization going on that turns my strings back into integers?

Answer 1

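This happens because BigQuery auto-detects the schema when none is specified. You can fix it by passing the fields argument explicitly, as shown below (for details, see):

bq_table_upload(bq_table(ds, tablename), df2, fields = list(bq_field("zip", "string"), bq_field("val", "integer")))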
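A slightly fuller sketch of the same idea: build the schema once with bq_fields()/bq_field() and pass it on every upload, so BigQuery never has to guess a type for zip. The helper name upload_zips() is hypothetical and exists only for illustration; defining it does not contact BigQuery, so the schema object can be inspected offline, while actually calling the helper requires valid credentials and the ds/tablename objects from the question.

```r
library(bigrquery)

# Explicit schema: zip must stay STRING (leading zeros), val is INTEGER.
zip_fields <- bq_fields(list(
  bq_field("zip", "STRING"),
  bq_field("val", "INTEGER")
))

# Hypothetical helper: always hand the explicit schema to the upload call
# instead of leaving fields = NULL.
upload_zips <- function(tbl, df, fields = zip_fields) {
  bq_table_upload(tbl, df, fields = fields)
}
```

Usage would then be upload_zips(bq_table(ds, tablename), df2). Answer 2 below relies on a related shortcut, passing the data frame itself as fields so that bigrquery derives the schema from the column classes.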

Update:

Looking at the code, bq_table_upload calls bq_perform_upload, which takes the fields argument as the schema. Finally, it converts the data frame into a JSON file in order to upload it to BigQuery.
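To check the question's suspicion about serialization, here is a sketch of what a character zip column looks like once a data frame is turned into JSON. It assumes the conversion is done with jsonlite (a bigrquery dependency); bigrquery's exact call may differ, and it writes newline-delimited rows rather than the single array that toJSON() produces here.

```r
library(jsonlite)

# Same shape as df2 in the question: zip is character, val is integer.
df2 <- data.frame(
  zip = c("07030", "10001", "90210", "60602"),
  val = c(10L, 100L, 1000L, 10000L),
  stringsAsFactors = FALSE
)

# Row-oriented JSON: one object per row, character values quoted.
json <- toJSON(df2)
cat(json, "\n")
```

The zip values come out quoted (for example "07030"), so the client side is sending strings. That fits Answer 1's explanation: the INTEGER type comes from BigQuery's schema auto-detection, not from R or from the JSON encoding.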

Answer 2

Simply changing:

bq_table_upload(tab, df)

to

bq_table_upload(tab, df, fields=df)

works.


r

google-bigquery

bigrquery