How to access a list of tibbles to check for "UTF-8" and decide whether to run the import in R
Goal:
Before importing and rbind-ing, check whether all files in the list have the same encoding, and stop running if they do not.
# files list & check encoding
FL_PATH <- list.files(path, pattern = "\\.csv$", full.names = TRUE)  # `path` is the folder holding the CSV files
library(readr)
lapply(FL_PATH,guess_encoding)
# if any file is "UTF-8", STOP; if "Shift_JIS", run the script below:
# import
library(rio)
library(data.table)  # rbindlist() and := come from data.table
DT <- rbindlist(lapply(FL_PATH, import, sep = ",", setclass = "data.table"))
# 500+ more lines follow; they should run only if all files share the same encoding
DT[, "NEW_COL" := "A"]
DT[, "NEW_COL_2" := "B"]
# .....
# result of lapply(FL_PATH, guess_encoding):
> lapply(FL_PATH,guess_encoding)
[[1]]
# A tibble: 3 x 2
encoding confidence
<chr> <dbl>
1 Shift_JIS 0.8
2 GB18030 0.76
3 Big5 0.46
[[2]]
# A tibble: 3 x 2
encoding confidence
<chr> <dbl>
1 GB18030 0.82
2 UTF-8 0.8
3 Big5 0.44
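- Question 1: How do I access the result of `lapply()` with readr's `guess_encoding()` so I can detect UTF-8 and stop? (If UTF-8 is present, the encoding has to be changed outside R.)
- Question 2: How do I hook the large amount of normal processing script up to this "if & STOP run" check?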
First, get the most likely encoding of each file:
enc <- sapply(FL_PATH,function(x) guess_encoding(x)$encoding[1])
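Each element of the list returned by `lapply()` is one file's tibble, so `[[i]]$encoding[1]` is that file's top guess (the rows are printed in descending confidence). A minimal base-R sketch, using plain data frames that copy the values printed in the question instead of real `guess_encoding()` output. Note that for the second file the top guess is GB18030 (0.82) and UTF-8 is only second (0.80), so a check on the top guess alone would not flag that file; checking every candidate is one alternative, if you are willing to act on lower-confidence guesses.

```r
# Mock of what lapply(FL_PATH, guess_encoding) returned in the question:
# one data frame per file (guess_encoding() returns tibbles; plain
# data.frames are used here so the sketch needs no packages).
res <- list(
  data.frame(encoding = c("Shift_JIS", "GB18030", "Big5"),
             confidence = c(0.80, 0.76, 0.46), stringsAsFactors = FALSE),
  data.frame(encoding = c("GB18030", "UTF-8", "Big5"),
             confidence = c(0.82, 0.80, 0.44), stringsAsFactors = FALSE)
)

# res[[i]] is the result for file i; $encoding[1] is its top guess.
top_enc <- vapply(res, function(x) x$encoding[1], character(1))
top_enc
# [1] "Shift_JIS" "GB18030"

# Does "UTF-8" appear anywhere among each file's candidates?
has_utf8 <- vapply(res, function(x) "UTF-8" %in% x$encoding, logical(1))
has_utf8
# [1] FALSE  TRUE
```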
Then, stop execution if any file is UTF-8:
if(any(grepl('UTF-8',enc)))
stop('UTF-8 present') # This will stop with an error if true
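The goal stated in the question is slightly broader than a UTF-8 check: all files should share one encoding. A minimal sketch of that variant, assuming each file's top guess is reliable enough to compare; `check_same_encoding()` is a hypothetical helper, not part of readr, and the file names are made up.

```r
# Stop unless every file has the same (top-guess) encoding.
# `enc` is a named character vector like the one built with sapply() above.
check_same_encoding <- function(enc) {
  encs <- unique(unname(enc))
  if (length(encs) != 1) {
    stop("Files do not share one encoding: ",
         paste(names(enc), enc, sep = " = ", collapse = ", "))
  }
  invisible(encs)
}

enc_ok  <- c(a.csv = "Shift_JIS", b.csv = "Shift_JIS")
enc_bad <- c(a.csv = "Shift_JIS", b.csv = "UTF-8")

check_same_encoding(enc_ok)     # passes silently
# check_same_encoding(enc_bad)  # stops with an error listing each file's encoding
```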
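# Now, read files and rbind
# read_csv() assumes UTF-8 by default, so pass the expected encoding explicitly
library(data.table)  # for rbindlist()
dlist <- lapply(FL_PATH, read_csv, locale = locale(encoding = "Shift_JIS"))
DT <- rbindlist(dlist)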
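On question 2: because `stop()` raises an error, a script run with `source()` or `Rscript` evaluates nothing after it, so the 500+ processing lines can simply follow the check. If an explicit branch is preferred, one option is to keep the processing in a function and call it only when the check passes. A minimal sketch; `process_files()` and `run_if_consistent()` are hypothetical names, and base R's `do.call(rbind, ...)` stands in for `data.table::rbindlist()` so the example needs no packages.

```r
# Hypothetical wrapper for the long processing script.
process_files <- function(dlist) {
  DT <- do.call(rbind, dlist)   # stand-in for rbindlist(dlist)
  DT$NEW_COL   <- "A"
  DT$NEW_COL_2 <- "B"
  # ... the remaining processing steps would go here ...
  DT
}

# Run the processing only when no file is UTF-8.
run_if_consistent <- function(enc, dlist) {
  if (any(enc == "UTF-8")) {
    stop("UTF-8 present: fix the encoding outside R first")
  }
  process_files(dlist)
}

# Made-up inputs standing in for the real files.
enc_demo   <- c("Shift_JIS", "Shift_JIS")
dlist_demo <- list(data.frame(x = 1:2), data.frame(x = 3))
out <- run_if_consistent(enc_demo, dlist_demo)
```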