Check whether an element in a character vector can be converted to numeric in R

Question

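How can I check whether the elements of a character vector can be converted to numeric? More precisely, when an element is a float or an integer, it converts to numeric without any problem, but when it is an arbitrary string, a warning appears: "NAs introduced by coercion". I am able to check indirectly by indexing the NA values. However, it would be much cleaner to do this without triggering the warning.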

cat1 <- c("1.12354","1.4548","1.9856","some_string")
cat2 <- c("1.45678","1.1478","1.9565","1.32315")
target <- c(0,1,1,0)
df <- data.frame(cat1, cat2, target)
catCols <- c("cat1", "cat2")

for (col in catCols) {
  a <- as.numeric(unique(df[[col]]))
  if (length(which(is.na(a))) != 0) {
    print(col)
    print(which(is.na(a)))
  }
}

Answer 1

Perhaps you can use a regular expression to check whether all the values in a column are integers or floats.

can_convert_to_numeric <- function(x) {
  all(grepl('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE))
}

sapply(df[catCols], can_convert_to_numeric)
# cat1  cat2 
#FALSE  TRUE

Alternatively, to get the values that cannot be converted to numeric, we can use grep:

values_which_cannot_be_numeric <- function(x) {
  grep('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE, invert = TRUE, value = TRUE)
}

lapply(df[catCols], values_which_cannot_be_numeric)

#$cat1
#[1] "some_string"

#$cat2
#character(0)

The regular expression is taken from here.
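One caveat worth checking before relying on the pattern: it does not describe exactly the set of strings that as.numeric accepts. The sketch below compares the two on a few hand-picked inputs. The vector samples and the helper matches_pattern are illustrative names, not part of the original answer.

```r
# The pattern from the answer; note the doubled backslash required
# inside an R string literal.
pattern <- '^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$'

# Hypothetical helper: TRUE where the pattern matches.
matches_pattern <- function(x) grepl(pattern, x, perl = TRUE)

# Illustrative inputs: scientific notation, surrounding whitespace,
# a trailing decimal point, and a lone sign.
samples <- c("1e5", " 1.5 ", "1.", "+")

by_regex  <- matches_pattern(samples)
by_coerce <- !is.na(suppressWarnings(as.numeric(samples)))

# Side-by-side comparison of the two checks.
data.frame(samples, by_regex, by_coerce)
```

With these inputs, the first three strings are accepted by as.numeric but rejected by the pattern, while a lone "+" matches the pattern but converts to NA. Whether that matters depends on what the data can contain.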

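If you use type.convert, you don't have to worry about this at all: it converts every column whose values can all be read as numbers and, with as.is = TRUE, leaves the other columns as character.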

df <- type.convert(df, as.is = TRUE)
str(df)

#'data.frame':  4 obs. of  3 variables:
# $ cat1  : chr  "1.12354" "1.4548" "1.9856" "some_string"
# $ cat2  : num  1.46 1.15 1.96 1.32
# $ target: int  0 1 1 0
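type.convert also accepts a single character vector, so it can serve as a yes/no check without modifying the data frame. This is a minimal sketch, assuming a vector counts as convertible only when every element parses as a number; is_numeric_like is a hypothetical helper name.

```r
# Hypothetical helper: TRUE when type.convert() turns the whole vector
# into an integer or double vector.
is_numeric_like <- function(x) {
  is.numeric(type.convert(as.character(x), as.is = TRUE))
}

cat1 <- c("1.12354", "1.4548", "1.9856", "some_string")
cat2 <- c("1.45678", "1.1478", "1.9565", "1.32315")

res <- sapply(list(cat1 = cat1, cat2 = cat2), is_numeric_like)
res
# cat1  cat2
#FALSE  TRUE
```

This is an all-or-nothing check per vector: it does not say which elements prevented the conversion.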

Answer 2

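One solution is to write a function that returns the indices of the values that become NA on conversion, and to apply it to the columns of interest.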

check_num <- function(x){
  y <- suppressWarnings(as.numeric(x))
  if(anyNA(y)){
    which(is.na(y))
  } else invisible(NULL)
}
lapply(df[catCols], check_num)
#$cat1
#[1] 4
#
#$cat2
#NULL

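The function above returns NULL if all values can be converted to numeric. The next function follows the same approach to find the vector elements that cannot be converted, but returns integer(0) if all elements can be converted.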

check_num2 <- function(x){
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y))
}
lapply(df[catCols], check_num2)
#$cat1
#[1] 4
#
#$cat2
#integer(0)
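Both functions report every position where the converted value is NA, which includes elements that were already missing before the conversion. If that distinction matters, a variant can flag only the elements that became NA through coercion. A sketch, with failed_conversion as a hypothetical name:

```r
# Hypothetical variant: indices of elements that are not NA in the input
# but become NA after conversion.
failed_conversion <- function(x) {
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y) & !is.na(x))
}

# Illustrative input with one genuinely missing value.
x <- c("1.5", NA, "some_string", "2")
idx <- failed_conversion(x)
idx
#[1] 3
```

Here position 2 is skipped because it was already NA, and only "some_string" is reported.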

