Best approach to combine multiple MySQL tables in R
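What is the best approach to combine multiple MySQL tables in R? For example, I need to rbind 14 large MySQL tables (each >100k rows by 100 columns). I tried the approach below, which consumed most of my memory and then timed out on the MySQL side. Is there an alternative solution? I do not need to fetch the whole table; I only need to group the combined data by a few variables and calculate some metrics.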
station_tbl_t <- dbSendQuery(my_db, "select * from tbl_r3_300ft
union all
select * from tbl_r4_350ft
union all
select * from tbl_r5_400ft
union all
select * from tbl_r6_500ft
union all
select * from tbl_r7_600ft
union all
select * from tbl_r8_700ft
union all
select * from tbl_r9_800ft
union all
select * from tbl_r10_900ft
union all
select * from tbl_r11_1000ft
union all
select * from tbl_r12_1200ft
union all
select * from tbl_r13_1400ft
union all
select * from tbl_r14_1600ft
union all
select * from tbl_r15_1800ft
union all
select * from tbl_r16_2000ft
")
Consider importing the MySQL table data iteratively and then row-binding in R. Be sure to select only the columns you need, to save overhead:
tbls <- c("tbl_r3_300ft", "tbl_r4_350ft", "tbl_r5_400ft",
          "tbl_r6_500ft", "tbl_r7_600ft", "tbl_r8_700ft",
          "tbl_r9_800ft", "tbl_r10_900ft", "tbl_r11_1000ft",
          "tbl_r12_1200ft", "tbl_r13_1400ft", "tbl_r14_1600ft",
          "tbl_r15_1800ft", "tbl_r16_2000ft")
sql <- "SELECT Col1, Col2, Col3 FROM"
# Query each table separately; a failed query warns and yields NULL,
# which the row-bind functions below skip
dfList <- lapply(paste(sql, tbls), function(s) {
  tryCatch(dbGetQuery(my_db, s),
           error = function(e) { warning(conditionMessage(e)); NULL })
})
# ROW BIND VERSIONS ACROSS PACKAGES (alternatives; use any one)
master_df <- base::do.call(rbind, dfList)
master_df <- plyr::rbind.fill(dfList)
master_df <- dplyr::bind_rows(dfList)
master_df <- data.table::rbindlist(dfList)
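Since you only need grouped metrics rather than the raw rows, another option is to let MySQL do the grouping per table and combine the small summaries in R. Below is a minimal sketch, not a tested solution for your schema. The column names (`Col1`, `Col2` as grouping columns, `Col3` as the metric) and the helpers `build_agg_sql`, `combine_summaries` and `aggregate_tables` are hypothetical, and `con` is assumed to be an open DBI connection such as your `my_db`.

```r
# Hypothetical columns: replace with your real grouping and metric columns.
group_cols <- c("Col1", "Col2")
metric_col <- "Col3"

# Build one aggregate query per table, so MySQL returns only group-level rows.
# Table and column names are pasted into the SQL, so use trusted values only.
build_agg_sql <- function(tbl, group_cols, metric_col) {
  groups <- paste(group_cols, collapse = ", ")
  sprintf("SELECT %s, COUNT(*) AS n, SUM(%s) AS total FROM %s GROUP BY %s",
          groups, metric_col, tbl, groups)
}

# Combine per-table summaries. Counts and sums are additive across tables,
# so the overall mean is total / n rather than a mean of per-table means.
combine_summaries <- function(parts, group_cols) {
  stacked <- do.call(rbind, parts)
  # Some drivers return 64-bit integers or decimals; coerce to plain numeric.
  stacked$n <- as.numeric(stacked$n)
  stacked$total <- as.numeric(stacked$total)
  out <- aggregate(stacked[c("n", "total")], by = stacked[group_cols], FUN = sum)
  out$mean <- out$total / out$n
  out
}

# Run the per-table queries over an open DBI connection and combine them.
aggregate_tables <- function(con, tbls, group_cols, metric_col) {
  parts <- lapply(tbls, function(tbl) {
    DBI::dbGetQuery(con, build_agg_sql(tbl, group_cols, metric_col))
  })
  combine_summaries(parts, group_cols)
}

# Usage, assuming my_db and tbls as defined above:
# summary_df <- aggregate_tables(my_db, tbls, group_cols, metric_col)
```

This keeps each query small and transfers only one row per group per table. It works directly for additive metrics such as counts, sums and means; metrics such as medians cannot be rebuilt from per-table summaries and would still need the raw values.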