How to coerce multiple columns for multiple data.frames as character in R?

Question

I want to coerce all columns of several data.frames to character so that I can rbind them later. The problem is that I can't write a proper function to use with lapply.

# Fake dataset
set.seed(123)
A = as.data.frame(matrix(sample(c('NA',1:10),10*10,T),10))
B = as.data.frame(matrix(sample(c('NA',LETTERS[1:10]),10*10,T),10))
C = as.data.frame(matrix(sample(c('NA',letters[1:10]),10*10,T),10))

In principle, this should be a simple task:

target = list(A, B, C)
lapply(target, function(x) {
  x <- as.character(x)
}) -> df

However, when I run str(df), I get this:

List of 3
 $ : chr [1:10] "c(\"2\", \"2\", \"9\", \"1\", \"5\", \"10\", \"4\", \"3\", \"5\", \"8\")" "c(\"9\", \"10\", \"4\", \"2\", \"10\", \"8\", \"8\", \"8\", \"2\", \"7\")" "c(\"9\", \"6\", \"9\", \"8\", \"2\", \"3\", \"NA\", \"10\", \"6\", \"4\")" "c(\"9\", \"6\", \"8\", \"8\", \"9\", \"6\", \"10\", \"4\", \"6\", \"4\")" ...
 $ : chr [1:10] "c(\"G\", \"B\", \"G\", \"NA\", \"F\", \"J\", \"F\", \"F\", \"I\", \"E\")" "c(\"F\", \"J\", \"I\", \"D\", \"E\", \"G\", \"D\", \"F\", \"J\", \"C\")" "c(\"B\", \"H\", \"F\", \"E\", \"I\", \"H\", \"F\", \"A\", \"B\", \"G\")" "c(\"C\", \"F\", \"C\", \"NA\", \"G\", \"C\", \"H\", \"G\", \"E\", \"J\")" ...
 $ : chr [1:10] "c(\"h\", \"h\", \"d\", \"f\", \"e\", \"j\", \"NA\", \"i\", \"i\", \"NA\")" "c(\"i\", \"

My next attempt was:

lapply(target, function(x,i) {
    x[,i] <- as.character(x[,i])
return(x)}) -> df

This returns 3 data.frames as expected, but the str output is not what I want (partial output for the first data.frame):

 $ :'data.frame':       10 obs. of  10 variables:
  ..$ V1 : chr [1:10] "c(\"2\", \"2\", \"9\", \"1\", \"5\", \"10\", \"4\", \"3\", \"5\", \"8\")" "c(\"9\", \"10\", \"4\", \"2\", \"10\", \"8\", \"8\", \"8\", \"2\", \"7\")" "c(\"9\", \"6\", \"9\", \"8\", \"2\", \"3\", \"NA\", \"10\", \"6\", \"4\")" "c(\"9\", \"6\", \"8\", \"8\", \"9\", \"6\", \"10\", \"4\", \"6\", \"4\")" ...
  ..$ V2 : chr [1:10] "c(\"2\", \"2\", \"9\", \"1\", \"5\", \"10\", \"4\", \"3\", \"5\", \"8\")" "c(\"9\", \"10\", \"4\", \"2\", \"10\", \"8\", \"8\", \"8\", \"2\", \"7\")" "c(\"9\", \"6\", \"9\", \"8\", \"2\", \"3\", \"NA\", \"10\", \"6\", \"4\")" "c(\"9\", \"6\", \"8\", \"8\", \"9\", \"6\", \"10\", \"4\", \"6\", \"4\")" ...
  ..$ V3 : chr [1:10] "c(\"2\", \"2\", \"9\", \"1\", \"5\", \"10\", \"4\", \"3\", \"5\", \"8\")" "c(\"9\", \"10\", \"4\", \"2\", \"10\", \"8\", \"8\", \"8\", \"2\", \"7\")" "c(\"9\", \"6\", \"9\", \"8\", \"2\", \"3\", \"NA\", \"10\", \"6\", \"4\")" "c(\"9\", \"6\", \"8\", \"8\", \"9\", \"6\", \"10\", \"4\", \"6\", \"4\")" ...
  ..$ V4 : chr [1:10] "c(\"2\", \"2\", \"9\", \"1\", \"5\", \"10\", \"4\", \"3\", \"5\

So basically I'm stuck and don't know what else to try, so any suggestions would be greatly appreciated.

Answer 1

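You have to convert the columns of each data.frame to character one by one. Calling as.character() on a whole data.frame treats it as a list and deparses each column into a single string, which is what your str output shows.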

lapply(target, function(x) {
  x[] <- lapply(x, as.character)
  x
}) -> target

df <- do.call(rbind, target)
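
As a minimal sketch of the difference between converting a whole data.frame and converting it column by column, here is a toy example. The data.frame `d` is hypothetical and not part of the question's data:

```r
# Hypothetical toy data.frame, for illustration only
d <- data.frame(x = c(1, 2.5), y = c("a", "b"), stringsAsFactors = TRUE)

# as.character() on the whole data.frame returns one string per column,
# not one string per cell
whole <- as.character(d)
length(whole)

# Converting column by column keeps the data.frame shape;
# d[] <- ... assigns into the existing columns instead of
# replacing d with a plain list
d[] <- lapply(d, as.character)
str(d)
```

The empty brackets in `d[] <- ...` are what preserve the data.frame class, so the result can still be passed to rbind.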

Answer 2

With the tidyverse you can do

library(purrr)
library(dplyr)

target %>%
   map(~ mutate(., across(everything(), as.character))) %>%
   bind_rows()

to get your final data.frame:

   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   2  9  9  9 10  8  5  2  2   9
2   2 10  6  6  5  5  6  8  6   1
3   9  4  9  8  8  4 NA  3  2   9
4   1  2  8  8  1  8  5  5  6   1
5   5 10  2  9  4  9  1  8  5   9
6  10  8  3  6  7  3 NA  8  9   5
7   4  8 NA 10  1  5  1  6  4   3
8   3  8 10  4 NA 10  3  2  4  NA
9   5  2  6  6  8  7  4  7  7   5
10  8  7  4  4 10  5  5  8  2   2
11  G  F  B  C  C  J  I  A  I   H
12  B  J  H  F  G  H  C  D  J   C
13  G  I  F  C  B  F  D  H NA   I
14 NA  D  E NA  C  G  F  F  H   F
15  F  E  I  G  C  D NA  F  B   H
16  J  G  H  C  E  A  G  I  I   F
17  F  D  F  H NA  J  G  I  D   I
18  F  F  A  G  I  E  I  G  E   C
19  I  J  B  E  J  H  H  E  J   G
20  E  C  G  J  C  G  G  F  J   H
21  h  i  e  c  a  f  b  b  h   d
22  h NA  a  i  i  c  a  c  f   j
23  d  i NA NA  e  a  a  j  h  NA
24  f  j  d  d  f NA  d  j  c   b
25  e  d  h  d  h NA  h  b  a   a
26  j  f  c  h NA  a  i  f  e   d
27 NA  d  b  g  d NA  e  b  i  NA
28  i  i  h  f  d  a  i  a  h   b
29  i  h NA  h  g  g  i  f  j   h
30 NA  c  a  d  d NA  e  b  f   a
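
Whichever approach is used, it may be worth checking the column types of the combined result. A rough sketch in base R, assuming two small hypothetical inputs `a` and `b` with mixed column types and a hypothetical helper named `to_chr`:

```r
# Hypothetical inputs with mixed column types, for illustration only
a <- data.frame(V1 = 1:2, V2 = c(TRUE, FALSE))
b <- data.frame(V1 = c("x", "y"), V2 = c("p", "q"))

# Hypothetical helper: convert every column of a data.frame to character
to_chr <- function(d) {
  d[] <- lapply(d, as.character)
  d
}

res <- do.call(rbind, lapply(list(a, b), to_chr))

# One logical per column: TRUE if that column is character
vapply(res, is.character, logical(1))

# One row per input row
nrow(res)
```

vapply is used here rather than sapply because it fixes the return type to a logical vector.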
