How to replace cells containing only a space (" ") in R
I am trying to replace the cells that contain only a space (" ") in R, but for some reason it is not working. My vector looks like this:
[1] "SICREDI N/NE" "SICOOB CREDIMINAS" "UNICRED SC/PR"
[4] " " " " "CRESOL SC/RS"
I tried CENTRAL <- gsub("\\b \\b", NA, CENTRAL)
but it returns:
[1] NA NA NA NA NA
[6] "CRESOL SC/RS" NA NA NA NA
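Your words contain spaces, so gsub finds a match inside them as well. Because the replacement is NA, every element with a match becomes NA as a whole. You can do this instead: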
vec <- c("words with spaces", "word with spaces", " ", " ", "not", "here")
vec
[1] "words with spaces"
[2] "word with spaces"
[3] " "
[4] " "
[5] "not"
[6] "here"
vec[vec==" "]
[1] " " " "
vec[vec==" "] <- NA
vec
[1] "words with spaces"
[2] "word with spaces"
[3] NA
[4] NA
[5] "not"
[6] "here"
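The behaviour described above can be reproduced on a tiny vector. This is only a sketch with made-up values (demo and res are hypothetical names, not from the question): when the replacement passed to gsub is NA, any element in which the pattern matches comes back as NA, rather than having only the matched part replaced.

```r
# Hypothetical two-element vector: one entry with an inner space, one without.
demo <- c("two words", "single")

# The space in "two words" sits between two word characters, so the
# pattern matches there; "single" contains no space and is left alone.
res <- gsub("\\b \\b", NA, demo)

# The first element is now NA as a whole, the second is unchanged.
print(res)
```

This is why a pattern that can match inside ordinary entries is risky here, while an exact comparison such as vec == " " only touches the entries that are a single space and nothing else.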
A faster way might be (Gabriel beat me to it):
x <- c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
" ", " ", "CRESOL SC/RS")
x[x == " "] <- NA
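Since the question talks about cells, the same comparison can also be applied to a whole data frame at once. A minimal sketch, assuming a small made-up data frame df with character columns central and region (hypothetical names and values, not from the question):

```r
# Hypothetical data frame with single-space "cells" in both columns.
df <- data.frame(
  central = c("SICREDI N/NE", " ", "CRESOL SC/RS"),
  region  = c(" ", "south", "south"),
  stringsAsFactors = FALSE
)

# df == " " yields a logical matrix with one entry per cell;
# indexing with it replaces only the cells that are exactly one space.
df[df == " "] <- NA

print(df)
```

Cells that contain a space as part of a longer string, such as "SICREDI N/NE", are not affected because the comparison is on the whole cell value.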
What you were doing with a regex can do the same job, but it is much slower (timings in milliseconds, measured on vectors of 60,000 elements):
x <- rep(c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
" ", " ", "CRESOL SC/RS"), 10000)
y <- rep(c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
" ", " ", "CRESOL SC/RS"), 10000)
z <- rep(c("SICREDI N/NE", "SICOOB CREDIMINAS", "UNICRED SC/PR",
" ", " ", "CRESOL SC/RS"), 10000)
library(microbenchmark)
microbenchmark(
first = {x[x == " "] <- NA},
second = {y[grepl("^\\b \\b$", y)] <- NA},
sub = gsub("^\\b \\b$", NA, z)
)
Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval cld
  first  1.223415  1.231626  1.367973  1.235438  1.247461  2.896081   100 a
 second  5.633810  5.681902  5.929447  5.697737  5.742457  8.063632   100  b
    sub 16.960371 17.223557 17.345403 17.271795 17.308452 18.919242   100   c
As a matter of personal opinion, I find x[x == " "] <- NA easier to read than either of the regex approaches.
If you want to squeeze out a little more speed, you can use x[x %in% " "] <- NA, which is more efficient than ==, but only barely.
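Apart from speed, the two operators differ in how they treat elements that are already NA, which can be checked on a small made-up vector (v, eq, inn, v1 and v2 are hypothetical names): == propagates NA, while %in% always returns TRUE or FALSE. As a sketch:

```r
# Hypothetical vector that already contains a missing value.
v <- c("a", NA, " ")

eq  <- v == " "     # comparison with NA yields NA in the index
inn <- v %in% " "   # %in% never yields NA

print(eq)
print(inn)

# With a single replacement value, both indices give the same result here.
v1 <- v
v1[eq] <- NA
v2 <- v
v2[inn] <- NA

print(identical(v1, v2))
```

The difference matters mostly when the logical index is reused elsewhere, for example to count matches with sum(), where the NA from == would propagate.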
(I have now officially spent too much time exploring this :))