Transforming an R Dataframe with 2 columns and delimiter in rows

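I have a data frame with two columns, "id" and "detail" (df_current below). I need to group the data frame by id and spread it out so that the columns become "Interface1", "Interface2", and so on. The values under each interface column should be the values that immediately follow each occurrence of that interface name. In essence, "!" acts as a delimiter, but it is not needed in the output.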

The desired output is shown below as "df_needed_from_current".

I have tried several approaches (group_by, spread, reshape, dcast, etc.) but could not get any of them to work. Any help would be greatly appreciated!

Sample of the current data frame (with code to create it):

id detail
1 !
1 Interface1
1 a
1 b
1 !
1 Interface2
1 a
1 b
2 !
2 Interface1
2 a
2 b
2 c
2 !
2 Interface2
2 a
3 !
3 Interface1
3 a
3 b
3 c
3 d
df_current <- data.frame(
        id = c("1","1","1","1","1","1","1","1","2",
               "2","2","2","2","2","2","2","3","3",
               "3","3","3","3","4","4","4","4","4",
               "4","4","4","4","4","4","4","4","4",
               "5","5","5","5","5","5","5","5","5",
               "5","5","5","5"),
        detail = c("!", "Interface1","a","b","!",
                   "Interface2","a","b","!","Interface1",
                   "a","b","c","!","Interface2","a",
                   "!", "Interface1","a","b","c","d",
                   "!", "Interface1","a","b","!",
                   "Interface2","a","b","c","!","Interface3",
                   "a","b","c","!","Interface1","a","b","!",
                   "Interface2","a","b","c","!","Interface3",
                   "a","b"))

Needed data frame (with code to create it):

ID Interface1 Interface2 Interface3
1 a a NA
1 b b NA
2 a a NA
2 b NA NA
2 c NA NA
3 a NA NA
3 b NA NA
3 c NA NA
3 d NA NA
df_needed_from_current <- data.frame(
        id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
        Interface1 = c("a","b","a","b","c","a","b","c","d","a","b",NA,"a","b",NA),
        Interface2 = c("a","b","a",NA,NA,NA,NA,NA,NA,"a","b","c","a","b","c"),
        Interface3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,"a","b","c","a","b",NA)
        )
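The "!"-as-delimiter idea from the question can be made concrete with a minimal base-R sketch. This is an illustration, not the method used in the answer below. It assumes a hypothetical two-id subset of df_current (named df here), and the helper columns block and interface are illustrative names. A cumulative sum of the "!" marker within each id gives every row a block number, and the first remaining row of each block names its interface. The sketch stops at the long (id, interface, detail) format. The only step left is the pivot to one column per interface.

```r
# Hypothetical subset of df_current: ids 1 and 2 only
df <- data.frame(
  id     = c("1","1","1","1","1","1","1","1",
             "2","2","2","2","2","2","2","2"),
  detail = c("!","Interface1","a","b","!","Interface2","a","b",
             "!","Interface1","a","b","c","!","Interface2","a"),
  stringsAsFactors = FALSE
)

# Each "!" starts a new block; ave() applies cumsum() separately within each id
df$block <- ave(as.integer(df$detail == "!"), df$id, FUN = cumsum)

# Drop the delimiter rows
df <- df[df$detail != "!", ]

# The first remaining row of each id/block group is the interface name
df$interface <- ave(df$detail, df$id, df$block, FUN = function(x) x[1])

# Keep only the value rows (drop the interface header rows)
df <- df[df$detail != df$interface, ]
df[, c("id", "interface", "detail")]
```

Keying on the block number rather than on the interface text means the sketch does not depend on the header rows literally starting with "Interface". It only requires that a header follows each "!".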

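We remove the rows where 'detail' is "!" and then create a new column 'interface' that holds only the 'detail' values with the prefix 'Interface' (NA elsewhere). Grouped by 'id', fill from tidyr replaces each NA with the previous non-NA value. We then filter out the rows where 'detail' equals the 'interface' column (the header rows themselves), create a row sequence id with rowid (from data.table), and reshape to 'wide' format with pivot_wider.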
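library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
df_current %>%
   filter(detail != "!") %>%                       # drop the "!" delimiter rows
   # interface name on header rows, NA on value rows
   mutate(interface = case_when(str_detect(detail, 'Interface') ~ detail)) %>%
   group_by(id) %>%
   fill(interface) %>%                             # carry the last interface name down within each id
   ungroup %>%
   filter(detail != interface) %>%                 # drop the header rows themselves
   mutate(rn = rowid(id, interface)) %>%           # row number within each id/interface pair
   pivot_wider(names_from = interface, values_from = detail) %>%
   select(-rn)                                     # the helper row number is no longer needed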
# A tibble: 15 x 4
#   id    Interface1 Interface2 Interface3
#   <chr> <chr>      <chr>      <chr>     
# 1 1     a          a          <NA>      
# 2 1     b          b          <NA>      
# 3 2     a          a          <NA>      
# 4 2     b          <NA>       <NA>      
# 5 2     c          <NA>       <NA>      
# 6 3     a          <NA>       <NA>      
# 7 3     b          <NA>       <NA>      
# 8 3     c          <NA>       <NA>      
# 9 3     d          <NA>       <NA>      
#10 4     a          a          a         
#11 4     b          b          b         
#12 4     <NA>       c          c         
#13 5     a          a          a         
#14 5     b          b          b         
#15 5     <NA>       c          <NA>
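For comparison, the same reshape can be sketched with data.table alone. This version uses dcast, which the question mentions trying, in place of pivot_wider. It is an alternative, not part of the answer above. Instead of fill(), it numbers the "!"-delimited blocks with cumsum() and takes the first row of each block as the interface name. The helper columns block and rn are illustrative names.

```r
library(data.table)

df_current <- data.frame(
        id = c("1","1","1","1","1","1","1","1","2",
               "2","2","2","2","2","2","2","3","3",
               "3","3","3","3","4","4","4","4","4",
               "4","4","4","4","4","4","4","4","4",
               "5","5","5","5","5","5","5","5","5",
               "5","5","5","5"),
        detail = c("!", "Interface1","a","b","!",
                   "Interface2","a","b","!","Interface1",
                   "a","b","c","!","Interface2","a",
                   "!", "Interface1","a","b","c","d",
                   "!", "Interface1","a","b","!",
                   "Interface2","a","b","c","!","Interface3",
                   "a","b","c","!","Interface1","a","b","!",
                   "Interface2","a","b","c","!","Interface3",
                   "a","b"),
        stringsAsFactors = FALSE)

dt <- as.data.table(df_current)

# Number the "!"-delimited blocks within each id
dt[, block := cumsum(detail == "!"), by = id]

# Drop the delimiter rows
dt <- dt[detail != "!"]

# The first row of each id/block group holds the interface name
dt[, interface := detail[1L], by = .(id, block)]

# Keep only the value rows and number them within each id/interface pair
dt <- dt[detail != interface]
dt[, rn := rowid(id, interface)]

# Reshape to wide: one column per interface, NA where a block is shorter
out <- dcast(dt, id + rn ~ interface, value.var = "detail")
out[, rn := NULL]
out
```

Because every (id, rn, interface) combination is unique, dcast needs no aggregation function. Missing cells are filled with NA, which gives the same 15 rows as the tibble above.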