Reshaping a table in R while parsing information from column names and using it to collect information from specific columns
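I have been given this poorly organized data table with hundreds of columns (a subset is shown below).
The column names are dot-separated. The first field identifies the object (e.g. Item123, object_AB); it follows no naming convention, and the columns are in no particular order.
Columns that belong to the same object share that first field, followed by the name of one of the object's attributes (e.g. color, manufacturer).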
Item123.type.value Item123.mass.value Item123.color.value object_AB.type.value object_AB.mass.value object_AB.color.value
Desk 11.2 blue Chair 2.3 orange
Desk 14.2 red Sofa 22 grey
Armchair 23.3 black Monitor 2.2 white
Edit: added the dput() structure:
structure(list(Item123.type.value = structure(c(2L, 2L, 1L),
levels = c("Armchair", "Desk"), class = "factor"), Item123.mass.value = structure(1:3,
levels = c("11.2", "14.2", "23.3"), class = "factor"), Item123.color.value = structure(c(2L,
3L, 1L), levels = c("black", "blue", "red"), class = "factor"),
object_AB.type.value = structure(c(1L, 3L, 2L), levels = c("Chair",
"Monitor", "Sofa"), class = "factor"), object_AB.mass.value = structure(c(2L,
3L, 1L), levels = c("2.2", "2.3", "22"), class = "factor"),
object_AB.color.value = structure(c(2L, 1L, 3L), levels = c("grey",
"orange", "white"), class = "factor")), row.names = c(NA_integer_,
-3L), class = "data.frame")
I need to transform the table into this (row order does not matter):
type name mass color
Item123 Desk 11.2 blue
Item123 Desk 14.2 red
object_AB Chair 2.3 orange
object_AB Sofa 22 grey
Item123 Armchair 23.3 black
object_AB Monitor 2.2 white
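Any help would be much appreciated!
I would suggest this approach, using the data you added as df.
It is long and somewhat tedious: the code selects the columns whose names match a given pattern, reshapes each group to long format, and finally joins the pieces together: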
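library(tidyverse)
# Code
# For each attribute: add a row id, pivot to long format, and keep only the
# first field of the column name (the object) as V1.
# Note the doubled backslash: '\\.' is a literal dot in an R string.
df %>% select(contains('type')) %>%
  mutate(id = 1:n()) %>%
  pivot_longer(-id) %>%
  separate(name, into = paste0('V', 1:3), sep = '\\.') %>%
  select(-c(V2, V3)) %>%
  rename(Value1 = value) %>%
  # Join the mass values on row id and object
  left_join(df %>% select(contains('mass')) %>%
              mutate(id = 1:n()) %>%
              pivot_longer(-id) %>%
              separate(name, into = paste0('V', 1:3), sep = '\\.') %>%
              select(-c(V2, V3)) %>%
              rename(Value2 = value),
            by = c('id', 'V1')) %>%
  # Join the color values the same way
  left_join(df %>% select(contains('color')) %>%
              mutate(id = 1:n()) %>%
              pivot_longer(-id) %>%
              separate(name, into = paste0('V', 1:3), sep = '\\.') %>%
              select(-c(V2, V3)) %>%
              rename(Value3 = value),
            by = c('id', 'V1'))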
Output:
# A tibble: 6 x 5
id V1 Value1 Value2 Value3
<int> <chr> <chr> <dbl> <chr>
1 1 Item123 Desk 11.2 blue
2 1 object_AB Chair 2.3 orange
3 2 Item123 Desk 14.2 red
4 2 object_AB Sofa 22 grey
5 3 Item123 Armchair 23.3 black
6 3 object_AB Monitor 2.2 white
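The column-name parsing that separate() performs can be looked at in isolation. The following is only a minimal base R sketch of that one step, not part of the answer's code; the vector cols and the variables parts, object and attribute are names introduced here for illustration. It also shows why the separator has to be escaped: in a regular expression an unescaped dot matches any character, so the split must use either '\\.' or fixed = TRUE.

```r
# Two example column names taken from the question
cols <- c("Item123.type.value", "object_AB.mass.value")

# Split on a literal dot; fixed = TRUE avoids regex interpretation of "."
parts <- strsplit(cols, ".", fixed = TRUE)

# First field: the object; second field: the attribute
object    <- vapply(parts, `[`, character(1), 1)
attribute <- vapply(parts, `[`, character(1), 2)

print(data.frame(column = cols, object = object, attribute = attribute))
```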
You can use pivot_longer here and specify names_pattern to extract the data from the column names.
tidyr::pivot_longer(df,
                    cols = everything(),
                    names_to = c('name', '.value'),
                    names_pattern = '(\\w+)\\.(\\w+)\\.')
# A tibble: 6 x 4
# name type mass color
# <chr> <fct> <fct> <fct>
#1 Item123 Desk 11.2 blue
#2 object_AB Chair 2.3 orange
#3 Item123 Desk 14.2 red
#4 object_AB Sofa 22 grey
#5 Item123 Armchair 23.3 black
#6 object_AB Monitor 2.2 white
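The result above labels the object column name and the furniture column type, which is the reverse of the headers in the requested table, and mass is still a factor. A possible follow-up, sketched under a few assumptions: the data are rebuilt here as plain character columns rather than factors so the block runs on its own, and object is a temporary column name chosen for this example (it cannot be called type directly, because the .value columns already produce a type column).

```r
library(tidyr)

# Same values as the dput() in the question, as character columns (assumption)
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c("11.2", "14.2", "23.3"),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c("2.3", "22", "2.2"),
  object_AB.color.value = c("orange", "grey", "white"),
  stringsAsFactors = FALSE
)

# First field -> temporary "object" column, second field -> output column names
long <- pivot_longer(
  df,
  cols = everything(),
  names_to = c("object", ".value"),
  names_pattern = "(\\w+)\\.(\\w+)\\."
)

# Rename to the headers used in the question: object -> type, type -> name
names(long)[match(c("object", "type"), names(long))] <- c("type", "name")

# Store mass as a number instead of text
long$mass <- as.numeric(long$mass)

print(long)
```

If the real columns are factors, as in the dput() output, converting them first with df[] <- lapply(df, as.character) should give the same starting point.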