Split dataframe based on one column in r, with a non-fixed width column

Question

I have a question that extends a well-covered question on SE, namely:

Split a column of a data frame to multiple columns

My data has a column of comma-separated strings, but the number of values per row is not fixed.

data = data.frame(id = c(1,2,3), treatments = c("1,2,3", "2,3", "8,9,1,2,4"))

So I would like my data frame to end up in proper tidy/long form:

id    treatments
1     1
1     2
1     3
...
3     1
3     2
3     4

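Something like separate or strsplit does not seem to be the solution. separate fails with a warning that there are too many values for the columns (NB id 3 has more values than id 1).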

Thanks

Answer 1

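Using the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

data %>%
  # split into up to 5 columns; rows with fewer values are padded with NA
  separate(treatments, paste0("v", 1:5), fill = "right") %>%
  # stack v1..v5 into one column, then drop the NA padding
  gather(var, treatments, -id) %>%
  na.exclude %>%
  select(id, treatments) %>%
  arrange(id)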


   id treatments
1   1          1
2   1          2
3   1          3
4   2          2
5   2          3
6   3          8
7   3          9
8   3          1
9   3          2
10  3          4
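
The 5 in paste0("v", 1:5) is hard-coded to the longest row in the example data. A minimal sketch of deriving that width from the data instead, assuming the delimiter is always a plain comma (n_max and long are names made up for this illustration):

```r
library(dplyr)
library(tidyr)

data <- data.frame(id = c(1, 2, 3),
                   treatments = c("1,2,3", "2,3", "8,9,1,2,4"),
                   stringsAsFactors = FALSE)

# Largest number of comma-separated values found in any row
n_max <- max(lengths(strsplit(data$treatments, ",", fixed = TRUE)))

long <- data %>%
  # one column per possible position; shorter rows are padded with NA
  separate(treatments, paste0("v", seq_len(n_max)), sep = ",", fill = "right") %>%
  gather(var, treatments, -id) %>%
  na.exclude %>%
  select(id, treatments) %>%
  arrange(id)
```

With this change the pipeline should not need editing when a longer row turns up, at the cost of one extra pass over the column.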

Answer 2

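You can also use unnest:

library(tidyverse)
data %>% 
  # turn each string into a character vector, giving a list-column
  mutate(treatments = stringr::str_split(treatments, ",")) %>% 
  # name the list-column explicitly; a bare unnest() is deprecated in tidyr >= 1.0.0
  unnest(treatments)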

   id treatments
1   1          1
2   1          2
3   1          3
4   2          2
5   2          3
6   3          8
7   3          9
8   3          1
9   3          2
10  3          4
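
The same split-then-flatten idea can probably be done without any packages. A rough base R sketch, assuming every value is an integer and the delimiter is a plain comma (parts and long are illustrative names):

```r
data <- data.frame(id = c(1, 2, 3),
                   treatments = c("1,2,3", "2,3", "8,9,1,2,4"),
                   stringsAsFactors = FALSE)

# One character vector of values per row
parts <- strsplit(data$treatments, ",", fixed = TRUE)

long <- data.frame(
  # repeat each id once for every value found in its row
  id = rep(data$id, times = lengths(parts)),
  # flatten the list and convert the strings to integers
  treatments = as.integer(unlist(parts))
)
```

Here rep() with lengths() does what unnest does for the id column: each id is repeated to match the number of values it owns.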

Answer 3

You can use tidyr::separate_rows:

library(tidyr)
separate_rows(data, treatments)

#   id treatments
#1   1          1
#2   1          2
#3   1          3
#4   2          2
#5   2          3
#6   3          8
#7   3          9
#8   3          1
#9   3          2
#10  3          4
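
By default separate_rows splits on any run of non-alphanumeric characters and leaves the values as strings. If that is too loose, a small variation is to pass the delimiter explicitly and let tidyr convert the type; this is only a sketch, assuming the values are all integers (long is an illustrative name):

```r
library(tidyr)

data <- data.frame(id = c(1, 2, 3),
                   treatments = c("1,2,3", "2,3", "8,9,1,2,4"),
                   stringsAsFactors = FALSE)

# sep restricts the split to commas; convert = TRUE turns "1" into 1L
long <- separate_rows(data, treatments, sep = ",", convert = TRUE)
```

In tidyr 1.3.0 and later, separate_longer_delim(data, treatments, delim = ",") appears to be the suggested successor to separate_rows, though it does not convert types for you.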
