Transforming an R data frame with 2 columns and a delimiter in rows
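I have a data frame with two columns, "id" and "detail" (df_current below). I need to group the data frame by id and spread it so that the columns become "Interface1", "Interface2", etc., and the contents under each interface column are the values that directly follow each occurrence of that interface value. Essentially, "!" is used as a delimiter, but it is not needed in the output.
The desired output is shown below as "df_needed_from_current".
I have tried a variety of approaches (group_by, spread, reshape, dcast, etc.) but can't get any of them to work. Any help would be greatly appreciated!
Sample of the current data frame (creation code follows):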
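id | detail |
---|---|
1 | ! |
1 | Interface1 |
1 | a |
1 | b |
1 | ! |
1 | Interface2 |
1 | a |
1 | b |
2 | ! |
2 | Interface1 |
2 | a |
2 | b |
2 | c |
2 | ! |
2 | Interface2 |
2 | a |
3 | ! |
3 | Interface1 |
3 | a |
3 | b |
3 | c |
3 | d |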
df_current <- data.frame(
  id = c("1","1","1","1","1","1","1","1","2",
         "2","2","2","2","2","2","2","3","3",
         "3","3","3","3","4","4","4","4","4",
         "4","4","4","4","4","4","4","4","4",
         "5","5","5","5","5","5","5","5","5",
         "5","5","5","5"),
  detail = c("!", "Interface1","a","b","!",
             "Interface2","a","b","!","Interface1",
             "a","b","c","!","Interface2","a",
             "!", "Interface1","a","b","c","d",
             "!", "Interface1","a","b","!",
             "Interface2","a","b","c","!","Interface3",
             "a","b","c","!","Interface1","a","b","!",
             "Interface2","a","b","c","!","Interface3",
             "a","b"))
Needed data frame (creation code follows):
ID | Interface1 | Interface2 | Interface3 |
---|---|---|---|
1 | a | a | NA |
1 | b | b | NA |
2 | a | a | NA |
2 | b | NA | NA |
2 | c | NA | NA |
3 | a | NA | NA |
3 | b | NA | NA |
3 | c | NA | NA |
3 | d | NA | NA |
df_needed_from_current <- data.frame(
  id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
  Interface1 = c("a","b","a","b","c","a","b","c","d","a","b","NA","a","b","NA"),
  Interface2 = c("a","b","a","NA","NA","NA","NA","NA","NA","a","b","c","a","b","c"),
  Interface3 = c("NA","NA","NA","NA","NA","NA","NA","NA","NA","a","b","c","a","b","NA")
)
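One detail worth noting about the creation code above: it stores the literal string "NA" rather than a missing value. If real missing values are what is intended (an assumption, since the question does not say), a minimal sketch for converting them could look like this; `needed_na` is a hypothetical name used only for illustration.

```r
# Desired output exactly as written in the question ("NA" is a string here).
df_needed_from_current <- data.frame(
  id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
  Interface1 = c("a","b","a","b","c","a","b","c","d","a","b","NA","a","b","NA"),
  Interface2 = c("a","b","a","NA","NA","NA","NA","NA","NA","a","b","c","a","b","c"),
  Interface3 = c("NA","NA","NA","NA","NA","NA","NA","NA","NA","a","b","c","a","b","NA")
)

# Copy the frame and turn every literal "NA" string into a real missing value.
needed_na <- df_needed_from_current
needed_na[needed_na == "NA"] <- NA

# Count the missing values per column after the conversion.
colSums(is.na(needed_na))
```

With real missing values in place, the target frame can be compared directly against the output of a reshaping pipeline.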
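We remove the rows where the 'detail' value is "!", then create a new column 'interface' that holds only the 'detail' values with the prefix 'Interface'. We use fill from tidyr to fill the NA elements with the preceding non-NA value, filter the rows where the 'detail' value differs from the 'interface' column, create a row sequence ID with rowid (from data.table), and reshape to 'wide' format with pivot_wider.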
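library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
df_current %>%
  # drop the delimiter rows
  filter(detail != "!") %>%
  # keep the interface name on its own row, NA elsewhere
  mutate(interface = case_when(str_detect(detail, 'Interface') ~ detail)) %>%
  group_by(id) %>%
  # carry the interface name down to the value rows below it
  fill(interface) %>%
  ungroup() %>%
  # drop the rows that only hold the interface name
  filter(detail != interface) %>%
  # sequence number within each id/interface pair
  mutate(rn = rowid(id, interface)) %>%
  pivot_wider(names_from = interface, values_from = detail) %>%
  select(-rn)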
# A tibble: 15 x 4
#   id    Interface1 Interface2 Interface3
#   <chr> <chr>      <chr>      <chr>
# 1 1     a          a          <NA>
# 2 1     b          b          <NA>
# 3 2     a          a          <NA>
# 4 2     b          <NA>       <NA>
# 5 2     c          <NA>       <NA>
# 6 3     a          <NA>       <NA>
# 7 3     b          <NA>       <NA>
# 8 3     c          <NA>       <NA>
# 9 3     d          <NA>       <NA>
#10 4     a          a          a
#11 4     b          b          b
#12 4     <NA>       c          c
#13 5     a          a          a
#14 5     b          b          b
#15 5     <NA>       c          <NA>
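As a possible variation on the same idea, the sketch below avoids data.table and stringr by using the "!" rows themselves to number the blocks. It is not the approach from the answer above, and it rests on one assumption: every block starts with a "!" row followed directly by the interface name. The names `block`, `rn` and `wide` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as df_current in the question, written compactly with rep().
df_current <- data.frame(
  id = rep(c("1", "2", "3", "4", "5"), times = c(8, 8, 6, 14, 13)),
  detail = c(
    "!", "Interface1", "a", "b", "!", "Interface2", "a", "b",
    "!", "Interface1", "a", "b", "c", "!", "Interface2", "a",
    "!", "Interface1", "a", "b", "c", "d",
    "!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c",
    "!", "Interface3", "a", "b", "c",
    "!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c",
    "!", "Interface3", "a", "b"
  )
)

wide <- df_current %>%
  group_by(id) %>%
  # every "!" opens a new block within its id
  mutate(block = cumsum(detail == "!")) %>%
  group_by(id, block) %>%
  # assumption: the second row of each block is the interface name
  mutate(interface = detail[2]) %>%
  # drop the "!" row and the interface-name row
  filter(row_number() > 2) %>%
  # position of each value inside its block
  mutate(rn = row_number()) %>%
  ungroup() %>%
  select(-block) %>%
  pivot_wider(names_from = interface, values_from = detail) %>%
  select(-rn)

wide
```

Numbering blocks with cumsum() means the interface names do not need to share a common prefix, at the cost of depending on the "!" rows being present and well placed. If the same interface name could appear in more than one block for a single id, the two approaches would number the rows differently.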