Split the strings in each row into separate columns in R
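I have a column of comma-separated strings of varying lengths. I want to split each row of this column into separate columns, fill the missing values with NA, and count how often each string occurs.
Here is an example: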
M <- data.frame(name = c("A", "B", "C"), mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"))
name mapped
1 A X1, X3, X4
2 B X2, X4
3 C X2,X3, X4
I want to get a resulting data frame like this:
df <- data.frame(name = c("A", "B", "C"), V1 = c("X1", NA, NA), V2 = c(NA, "X2", "X2"), V3 = c("X3", NA, "X3"), V4 = c("X4", "X4", "X4"))
name V1 V2 V3 V4
1 A X1 NA X3 X4
2 B NA X2 NA X4
3 C NA X2 X3 X4
Then I want to count the number of X1, X2, X3, and X4 in each column of the new data frame.
Thanks!
You can use separate_rows and pivot_wider:
library(tidyverse)
M %>%
separate_rows(mapped) %>%
pivot_wider(names_from = mapped, values_from = mapped) %>%
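  # name first, then the token columns in sorted order (independent of locale)
  select(name, all_of(sort(setdiff(names(.), "name"))))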
# A tibble: 3 x 5
name X1 X2 X3 X4
<chr> <chr> <chr> <chr> <chr>
1 A X1 NA X3 X4
2 B NA X2 NA X4
3 C NA X2 X3 X4
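For readers who prefer not to load the tidyverse, here is a minimal base R sketch of the same reshaping. It assumes the strings are separated by a comma followed by optional spaces; the names `tokens`, `levels_all`, `wide`, and `res` are illustrative and not part of the original answer. As in the tidyverse output, the columns are named after the tokens (X1 to X4) rather than V1 to V4.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the strings
# as characters on R versions older than 4.0
M <- data.frame(name = c("A", "B", "C"),
                mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"),
                stringsAsFactors = FALSE)

# Split each string on a comma followed by optional whitespace
tokens <- strsplit(M$mapped, ",\\s*")

# Every distinct token, sorted so the columns come out as X1, X2, X3, X4
levels_all <- sort(unique(unlist(tokens)))

# One row per input row: the token where it occurs, NA where it is missing
wide <- t(vapply(tokens,
                 function(x) ifelse(levels_all %in% x, levels_all, NA_character_),
                 character(length(levels_all))))
colnames(wide) <- levels_all

res <- data.frame(name = M$name, wide, stringsAsFactors = FALSE)
res
```

Comparing each row against the full sorted token set with `ifelse()` fills the gaps with NA directly, so no separate fill step is needed.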
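Then, to count the filled (non-NA) values in each column, store the wide result and use colSums:
res <- M %>% separate_rows(mapped) %>% pivot_wider(names_from = mapped, values_from = mapped)
colSums(!is.na(res[, sort(names(res)[-1])]))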
# X1 X2 X3 X4
# 1 2 2 3
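The counting step depends on genuine missing values. As a small sketch on the questioner's target layout (columns V1 to V4), note that the data frame has to hold NA rather than the string "NA": `is.na("NA")` is FALSE, so string placeholders would be counted as filled cells.

```r
# Target layout from the question, with genuine NA values instead of "NA"
df <- data.frame(name = c("A", "B", "C"),
                 V1 = c("X1", NA, NA),
                 V2 = c(NA, "X2", "X2"),
                 V3 = c("X3", NA, "X3"),
                 V4 = c("X4", "X4", "X4"))

# is.na() on a data frame returns a logical matrix; negating it marks the
# filled cells, and colSums() counts the TRUE values in each column
counts <- colSums(!is.na(df[, -1]))
counts
# V1 V2 V3 V4
#  1  2  2  3

# A literal "NA" string is not a missing value
is.na("NA")
# FALSE
```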
Alternatively, split on commas (plus any following spaces), unlist, and count:
table(unlist(strsplit(as.character(M$mapped), ",\\s*")))
# X1 X2 X3 X4
# 1 2 2 3
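If the per-row picture is also wanted, the same split can feed a two-way table. This is a base R sketch, assuming the comma-plus-optional-space separator used above; `tokens` and `presence` are illustrative names. Each cell counts how often a token appears in that row, and the column sums reproduce the totals above.

```r
M <- data.frame(name = c("A", "B", "C"),
                mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"),
                stringsAsFactors = FALSE)

# Split on a comma plus optional whitespace so " X3" and "X3" are not
# counted as different strings
tokens <- strsplit(M$mapped, ",\\s*")

# Repeat each name once per token so the two vectors line up element by element
presence <- table(name = rep(M$name, lengths(tokens)), token = unlist(tokens))
presence

# Column totals: total occurrences of each token across all rows
colSums(presence)
```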