Split the strings in each row into separate columns in R

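I have a column of comma-separated strings of varying length. I want to split each row of this column into separate columns, fill the missing values with NA, and count the frequency of each string. Here is an example: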

M <- data.frame(name = c("A", "B", "C"), mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"))
  name     mapped
1    A X1, X3, X4
2    B     X2, X4
3    C  X2,X3, X4

I would like to get a result data frame like this:

df <- data.frame(name = c("A","B", "C"), V1 = c("X1","NA", "NA"), V2 = c("NA", "X2","X2"), V3 = c("X3","NA", "X3"), V4 = c("X4","X4", "X4"))

  name V1 V2 V3 V4
1    A X1 NA X3 X4
2    B NA X2 NA X4
3    C NA X2 X3 X4

Then I want to count the number of X1, X2, X3 and X4 values in each column of the new data frame.

Thanks!

You can use separate_rows and pivot_wider:

library(tidyverse)

M %>% 
  separate_rows(mapped) %>% 
  pivot_wider(names_from = mapped, values_from = mapped) %>% 
  relocate(order(colnames(.)))

# A tibble: 3 x 5
  name  X1    X2    X3    X4   
  <chr> <chr> <chr> <chr> <chr>
1 A     X1    NA    X3    X4   
2 B     NA    X2    NA    X4   
3 C     NA    X2    X3    X4   

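Then, to count the non-missing values in each column, use colSums (here M is the reshaped result of the pipeline above, assigned back to M, not the original data frame):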

colSums(!is.na(M[,-1]))
# X1 X2 X3 X4 
#  1  2  2  3
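The two steps above can be combined into one self-contained sketch. It makes a few assumptions that are not in the original answer: the reshaped result is stored under a separate, hypothetical name `wide` so the original M is left intact, an explicit `sep = ",\\s*"` is passed to separate_rows so the split rule is visible, and the value columns are sorted by name directly rather than through `order(colnames(.))`.

```r
library(tidyr)
library(dplyr)

# Example data from the question
M <- data.frame(name = c("A", "B", "C"),
                mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"),
                stringsAsFactors = FALSE)

# One row per value, then one column per distinct value
wide <- M %>%
  separate_rows(mapped, sep = ",\\s*") %>%
  pivot_wider(names_from = mapped, values_from = mapped)

# Keep `name` first and sort the remaining columns by name
wide <- wide[, c("name", sort(setdiff(names(wide), "name")))]

# Count the non-missing entries in each value column
counts <- colSums(!is.na(wide[, -1]))
counts
# X1 X2 X3 X4 
#  1  2  2  3
```

Keeping the wide table in its own object means M$mapped is still available afterwards for other approaches.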

Split on commas (dropping any spaces that follow them), unlist, then count:

table(unlist(strsplit(M$mapped, ",\\s*")))
# X1 X2 X3 X4 
#  1  2  2  3
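If the wide layout from the question is also needed without extra packages, the same strsplit result can be reused. The following is only a base R sketch under stated assumptions: the names `parts`, `lev`, `wide` and `counts` are made up for illustration, and `mapped` is assumed to be a character column (hence stringsAsFactors = FALSE).

```r
# Example data from the question
M <- data.frame(name = c("A", "B", "C"),
                mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"),
                stringsAsFactors = FALSE)

# Split each row on commas, dropping any spaces after the comma
parts <- strsplit(M$mapped, ",\\s*")

# The sorted distinct values become the column names
lev <- sort(unique(unlist(parts)))

# For each row, keep a value where it occurs and put NA where it does not
wide <- t(vapply(parts,
                 function(p) ifelse(lev %in% p, lev, NA_character_),
                 character(length(lev))))
colnames(wide) <- lev
wide <- data.frame(name = M$name, wide, stringsAsFactors = FALSE)

# Per-column counts of non-missing values
counts <- colSums(!is.na(wide[, -1]))
counts
# X1 X2 X3 X4 
#  1  2  2  3
```

Unlike the data frame in the question, this sketch stores real missing values (NA) rather than the string "NA", which is what lets is.na() do the counting.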