Is there a way to update ONLY field.type data onto SQL using dbWriteTable? Without having to pass the whole table's value

Question

I am pulling almost the entire table from my database using:

ss_data<-dbConnect(MySQL(), user = ,password=, host= ,dbname= )
Combined_Query<-gsub("[\r\n]"," ",paste0("select * FROM ",table_name))
Q_temp<-dbSendQuery(ss_data,Combined_Query)
temp<-fetch(Q_temp, n=-1)
report_pull<-rbind(report_pull,temp) 
dbDisconnect(ss_data)

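This pulls my data as needed. Afterwards I run some tests and check the size of each column, then reassign the field types based on those tests. After that, I send the data back to the database using: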

fieldtype<- setNames(newcoltypes, colnames(report_pull))
ss_data<-dbConnect(MySQL(), user =  ,password= , host= ,dbname= )
dbWriteTable(ss_data, value= report_pull, name="datatest1" ,overwrite=TRUE,row.names=FALSE, field.types = fieldtype)
dbDisconnect(ss_data)

Since I am not manipulating any of the data and am simply sending it back, I was wondering whether there is a way to send only the field.type data.

I will be doing this with roughly 500 GB of tables, so not having to push the row values back every time would greatly reduce the time it takes.

Edit: After reading the comments, I started updating the columns in a loop.

for (x in 1:ncol(report_pull)){
  ss_data<-dbConnect(MySQL(), user = ,password=, host=,dbname=)
  a <- unique(report_pull[x])
  a[a==""] <- NA
  a <- na.omit(a)
  if (length(a[,]) != 0){
    if ((grepl("^[0-9.]+$", a[,] )) == TRUE){
      if ((grepl("^[0-9]+$", a[,] )) == TRUE){
        if (typeof(report_pull[,x]) == "double"){
          dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` double;"))
          listofcol[x] <- "double"}
        else { # For int, we need to make it tinytext; int isn't working on our DB
          dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` tinytext;"))
          listofcol[x] <- "int - tinytext"}
      }
      else {
        dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` double;"))
        listofcol[x] <- "double"
      }
    }
    else {
      if (between(max(nchar(a[,])), 50, 255)) {
        dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` varchar(255);"))
        listofcol[x] <- paste0("varchar(255)")}
      else if (max(nchar(a[,])) > 255) { 
        dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` tinytext;"))
        listofcol[x] <- paste0("tinytext") }
      else {
        dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` varchar(", max(nchar(a[,])),");"))
        listofcol[x] <- paste0("varchar(", max(nchar(a[,])),")")}
    }
    
    if (grepl("url|URL", (colnames(report_pull[x] ))) == TRUE){
      dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` tinytext;"))
      listofcol[x] <- "tinytext"
    }
  }
  else { 
    print(paste0("Element ",x[1], " has 0 entries."))
    dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` varchar(255);"))
    listofcol[x] <- "varchar(255)"
  }
  dbDisconnect(ss_data)
}

The problem I am facing is in this line of code:

dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` tinytext;"))

The error I get, both on the DB and in R, is: Incorrect integer value: '' for column 'id' at row 1

I think the problem is that when this table was created, all field types were set to TEXT. I am just having a hard time setting it to int.

Answer 1

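I am not sure this will solve all of your problems, but here is an in-database method (with some help from R) for redefining the column types. I am assuming the data is always uploaded as TEXT regardless of what it actually contains, so this process attempts to remedy that.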

Docker setup

Not needed for your setup, but useful if anyone else needs to test this.

$ docker pull mysql:8.0.26
$ docker run -p 3306:3306 --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:8.0.26

Confirm the connection in R:

my <- DBI::dbConnect(odbc::odbc(), Driver="MySQL ODBC 8.0 ANSI Driver", pwd="my-secret-pw", uid="root")
DBI::dbGetQuery(my, "SHOW VARIABLES LIKE 'version';")
#   Variable_name  Value
# 1       version 8.0.26
DBI::dbExecute(my, "create schema Whosebug")
DBI::dbExecute(my, "use Whosebug")

When the demos/tests are done (after running the rest of the code in this answer),

DBI::dbExecute(my, "drop table sometable")
DBI::dbExecute(my, "drop schema Whosebug")

Fake data

DBI::dbWriteTable(my, "sometable", data.frame(
  longstring = c("a very long string here that is going to trigger perhaps a varchar(255)",
                 "234", "3.1415"),
  somefloat = c("234", "3.1415", "2.718271828"),
  someint = c("234", "3", "2"),
  longerstring = c(strrep("A", 260), "A", "A")
))

Retrieve table statistics

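This is where most of the 'pain' is felt: I have not tested this on gargantuan tables, so please provide some feedback. Since it uses a lot of REGEXP, I don't expect it to return in milliseconds, but perhaps it is faster than downloading the data and loading it into R.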

tests <- do.call(rbind, lapply(DBI::dbListFields(my, "sometable"), function(nm) {
  DBI::dbGetQuery(my, sprintf("
    select 
      %s as column_name,
      count(*) as n,
      sum(%s regexp '^[-+]?.*[^0-9.].*$') as nonnumber,
      sum(%s regexp '^[-+]?.*[^0-9].*$') as nonint,
      max(length(%s)) as maxlen
    from sometable where %s is not null and %s <> ''", sQuote(nm, q = FALSE), nm, nm, nm, nm, nm))
}))
tests
#    column_name n nonnumber nonint maxlen
# 1   longstring 3         1      2     71
# 2    somefloat 3         0      2     11
# 3      someint 3         0      0      3
# 4 longerstring 3         3      3    260
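
To get a feel for what the two patterns flag, here is a minimal sketch that evaluates them locally with base R's grepl() on a few sample values. It assumes MySQL's REGEXP treats these simple patterns the same way R does; the values and the stricter is_int/is_num patterns are my own illustration, not part of the query above.

```r
# Sample values, including a signed number that the demo table does not contain.
vals <- c("234", "3.1415", "abc", "-5")

# Same patterns as in the SQL query, evaluated locally.
nonnumber <- grepl("^[-+]?.*[^0-9.].*$", vals)
nonint    <- grepl("^[-+]?.*[^0-9].*$", vals)

# Hypothetical stricter alternatives that accept a leading sign.
is_int <- grepl("^[-+]?[0-9]+$", vals)
is_num <- grepl("^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)$", vals)

print(data.frame(value = vals, nonnumber, nonint, is_int, is_num))
```

Note that "-5" is flagged as both nonnumber and nonint by the original patterns, because the optional sign can match nothing and the sign character then counts as a non-digit. If your columns can contain signed values, patterns along the lines of is_int/is_num may be a better starting point.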

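From here, we need to use these "rules" to determine what the new column type should be. I am using dplyr::case_when because I find it much better than nested ifelse and the like. If you use dplyr in general, this will fit your current dialect and toolset; if you use data.table, translating it to data.table::fcase should be fairly straightforward.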

tests$new_type <- with(tests, dplyr::case_when(
  n == 0                   ~ "varchar(255)",
  grepl("url", column_name, ignore.case = TRUE) ~ "tinytext",
  nonint == 0              ~ "tinytext", # For int, we need to make it tinytext; int isn't working on our DB
  nonnumber == 0           ~ "double",
  maxlen < 50              ~ "tinytext",
  dplyr::between(maxlen, 50, 255) ~ "varchar(255)",
  TRUE                     ~ sprintf("varchar(%s)", maxlen)))
tests
#    column_name n nonnumber nonint maxlen     new_type
# 1   longstring 3         1      2     71 varchar(255)
# 2    somefloat 3         0      2     11       double
# 3      someint 3         0      0      3     tinytext
# 4 longerstring 3         3      3    260 varchar(260)

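You will want to go over these rules carefully to make sure they match all of your intent. However, I think the premise of simple rules like these (and the four "meta" columns I stored in tests) is a good start.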
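
If you would rather not depend on dplyr, the same rule order can be written as a plain function. This is only a sketch: choose_type is a hypothetical name, and the stats data frame simply re-types the numbers from the tests output above so the example runs without a database.

```r
# Hypothetical helper: same rule order as the case_when() call, in base R.
choose_type <- function(column_name, n, nonnumber, nonint, maxlen) {
  if (n == 0) return("varchar(255)")                      # no non-empty values
  if (grepl("url", column_name, ignore.case = TRUE)) return("tinytext")
  if (nonint == 0) return("tinytext")                     # integer-like, kept as tinytext per the question
  if (nonnumber == 0) return("double")
  if (maxlen < 50) return("tinytext")
  if (maxlen <= 255) return("varchar(255)")
  sprintf("varchar(%d)", as.integer(maxlen))
}

# Statistics copied from the tests output above.
stats <- data.frame(
  column_name = c("longstring", "somefloat", "someint", "longerstring"),
  n           = c(3, 3, 3, 3),
  nonnumber   = c(1, 0, 0, 3),
  nonint      = c(2, 2, 0, 3),
  maxlen      = c(71, 11, 3, 260),
  stringsAsFactors = FALSE
)

stats$new_type <- mapply(choose_type, stats$column_name, stats$n,
                         stats$nonnumber, stats$nonint, stats$maxlen,
                         USE.NAMES = FALSE)
print(stats)
```

Because the rules are evaluated top to bottom, their order matters: a column whose name contains "url" becomes tinytext even if its values look numeric.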

Update the columns

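Following the approach from https://dba.stackexchange.com/a/198635/156305, I will iterate over the rows of tests: create a new column with the new type, copy/CAST from the old column, drop the old column, and then rename the new column to the original name.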
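
As a rough sketch of that four-step sequence, the statements for one column can be built as plain strings before anything is sent to the server. The helper names quote_ident and build_retype_sql are hypothetical, and the backtick quoting is my own addition for column names that contain spaces or reserved words.

```r
# Hypothetical helper: backtick-quote a MySQL identifier, doubling embedded backticks.
quote_ident <- function(x) paste0("`", gsub("`", "``", x, fixed = TRUE), "`")

# Hypothetical helper: the add / copy-with-CAST / drop / rename statements for one column.
build_retype_sql <- function(table, column, new_type, cast_type = new_type,
                             tmp = "newcolumn") {
  tbl  <- quote_ident(table)
  col  <- quote_ident(column)
  tmpq <- quote_ident(tmp)
  c(
    sprintf("ALTER TABLE %s ADD COLUMN %s %s", tbl, tmpq, new_type),
    sprintf("UPDATE %s SET %s = CAST(%s AS %s)", tbl, tmpq, col, cast_type),
    sprintf("ALTER TABLE %s DROP COLUMN %s", tbl, col),
    sprintf("ALTER TABLE %s RENAME COLUMN %s TO %s", tbl, tmpq, col)
  )
}

stmts <- build_retype_sql("sometable", "longstring", "varchar(255)", "char(255)")
print(stmts)

# With an open DBI connection `con`, the statements would be run in order:
# for (s in stmts) DBI::dbExecute(con, s)
```

Building the strings first makes it easy to inspect or log exactly what will run before touching a large table.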

I will show the column data types before and after the conversion.

DBI::dbGetQuery(my, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='sometable'")
#    COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH
# 1 longerstring      text                    65535
# 2   longstring      text                    65535
# 3    somefloat      text                    65535
# 4      someint      text                    65535

for (rownum in seq_len(nrow(tests))) {
  nm <- tests$column_name[rownum]
  typ <- tests$new_type[rownum]
  typ2 <- 
    if (grepl("varchar", typ)) {
      gsub("var", "", typ)
    } else if (typ == "tinytext") {
      "char(50)"
    } else typ
  message(sprintf("Converting column '%s' to '%s' ('%s')", nm, typ, typ2))
  DBI::dbExecute(my, sprintf("ALTER TABLE sometable ADD COLUMN newcolumn %s", typ))
  DBI::dbExecute(my, sprintf("UPDATE sometable SET newcolumn = cast(%s as %s)", nm, typ2))
  DBI::dbExecute(my, sprintf("ALTER TABLE sometable DROP COLUMN %s", nm))
  DBI::dbExecute(my, sprintf("ALTER TABLE sometable RENAME COLUMN newcolumn to %s", nm))
}
# Converting column 'longstring' to 'varchar(255)' ('char(255)')
# Converting column 'somefloat' to 'double' ('double')
# Converting column 'someint' to 'tinytext' ('char(50)')
# Converting column 'longerstring' to 'varchar(260)' ('char(260)')

DBI::dbGetQuery(my, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='sometable'")
#    COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH
# 1 longerstring   varchar                      260
# 2   longstring   varchar                      255
# 3    somefloat    double                     <NA>
# 4      someint  tinytext                      255

And to prove the data is still present and correct,

str(DBI::dbGetQuery(my, "select * from sometable"))
# 'data.frame': 3 obs. of  4 variables:
#  $ longstring  : chr  "a very long string here that is going to trigger perhaps a varchar(255)" "234" "3.1415"
#  $ somefloat   : num  234 3.14 2.72
#  $ someint     : chr  "234" "3" "2"
#  $ longerstring: chr  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "A" "A"

Caveats

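I do not have much experience with the performance of REGEXP; I may be abusing a "heavy tool" here.
Copying data from one column to another may not be cheap; I have not tested this with "big data".
I use char(.) and varchar(.); you may prefer nchar(.) and nvarchar(.), depending on your data.
The reason I create typ2 is that MySQL does not accept cast(.. as varchar(n)); it also does not seem to like cast(.. as tinytext). Hence the interim translation during casting.
There may be more MySQL-canonical ways to do many of these steps. I do not claim MySQL-guru status by any stretch.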
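
Related to the casting caveats above and to the Incorrect integer value: '' error in the question: an empty string is not a valid number, so converting a column that contains '' to a numeric type can fail when the server runs in a strict SQL mode. One possible workaround, which I have not tested against a live server, is to turn empty strings into NULL inside the cast with NULLIF. The helper name cast_expr is hypothetical.

```r
# Hypothetical helper: build a CAST expression that maps '' to NULL first.
cast_expr <- function(column, cast_type) {
  # Backtick-quote the column name, doubling any embedded backticks.
  col <- paste0("`", gsub("`", "``", column, fixed = TRUE), "`")
  sprintf("CAST(NULLIF(%s, '') AS %s)", col, cast_type)
}

expr <- cast_expr("somefloat", "double")
print(expr)

# This expression could replace the plain cast(...) in the UPDATE statement, e.g.:
# sprintf("UPDATE sometable SET newcolumn = %s", expr)
```

Whether NULL is an acceptable stand-in for an empty value depends on your data, so treat this as a starting point only.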
