Reordering my reshape: long to wide with pivot_wider, different column order

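I need to reshape a long dataset (df below) to wide. Several variables are identical across all long rows for a given ID, while others change from row to row. Dummy data: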

ID = c("A", "A", "B", "B", "B", "C", "C")
Name = c("mary", "mary", "berry", "berry", "berry", "paul", "paul")
Set = c("set1", "set2", "set1", "set2", "set3", "set1", "set2")
Street = c("123 St", "234 St", "543 St", "492 st", "231 st", "492 st", "231 st")
State = c("al", "nc", "fl", "ca", "md", "tx", "vt")

df = data.frame(ID, Name, Set, Street, State)

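I reshaped it with pivot_wider, but the columns do not come out in the order I want. Since the real data has 20 sets per entry and 7 variables per set, is there a simple way to get the order right during the reshape?

The result looks like this: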

test <- pivot_wider(df, names_from = c("Set"), values_from = c("Street", "State"))
test
# A tibble: 3 x 8
  ID    Name  Street_set1 Street_set2 Street_set3 State_set1 State_set2 State_set3
  <chr> <chr> <chr>       <chr>       <chr>       <chr>      <chr>      <chr>     
1 A     mary  123 St      234 St      NA          al         nc         NA        
2 B     berry 543 St      492 st      231 st      fl         ca         md        
3 C     paul  492 st      231 st      NA          tx         vt         NA        

But what I want is for it to look like this:

  ID  Name Set1_Street Set1_State Set2_Street Set2_State Set3_Street Set3_State
1  A  mary      123 St         al      234 St         nc        <NA>       <NA>
2  B berry      543 St         fl      492 st         ca      231 st         md
3  C  paul      492 st         tx      231 st         vt        <NA>       <NA>

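If you have thoughts on this, I would also love your opinion on which option (reshape, spread) works better for large datasets!

Edit: I left out the pivot_wider command I used. Fixed now, oops.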

It may be easier to use names_glue in pivot_wider:

library(dplyr)
library(tidyr)
df %>% 
   pivot_wider(names_from = Set, values_from = c(Street, State), 
       names_glue = "{tools::toTitleCase(Set)}_{.value}") %>%   
   dplyr::select(ID, Name, order(readr::parse_number(names(.)[-(1:2)])) + 2)

Output:

# A tibble: 3 × 8
  ID    Name  Set1_Street Set1_State Set2_Street Set2_State Set3_Street Set3_State
  <chr> <chr> <chr>       <chr>      <chr>       <chr>      <chr>       <chr>     
1 A     mary  123 St      al         234 St      nc         <NA>        <NA>      
2 B     berry 543 St      fl         492 st      ca         231 st      md        
3 C     paul  492 st      tx         231 st      vt         <NA>        <NA>      
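
The index arithmetic in that select() call is compact, so here is a small sketch of what it does, step by step. It uses only base R, and the digit extraction with gsub() is a stand-in for readr::parse_number() chosen for illustration, not the answer's exact call.

```r
# Column names as pivot_wider() returns them before reordering
nms <- c("ID", "Name",
         "Set1_Street", "Set2_Street", "Set3_Street",
         "Set1_State",  "Set2_State",  "Set3_State")

# Drop the two ID columns and pull the set number out of each remaining name.
# gsub() here is an assumed base R substitute for readr::parse_number().
set_no <- as.integer(gsub("\\D", "", nms[-(1:2)]))

# order() gives the positions that sort by set number; ties keep their
# original relative order, so Street stays ahead of State within a set.
idx <- order(set_no)

# Shift by 2 so the positions point into the full name vector again
reordered <- nms[c(1, 2, idx + 2)]
reordered
```

Because ties keep their original order, the variables within each set follow the order given in values_from. If a variable name itself contains digits, the extraction step would need adjusting.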

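As suggested in GitHub issue #839 (FR: order of columns resulting from pivot_wider), you can solve this kind of problem by building a "spec" manually. That sounds harder than it actually is.

Here is what it looks like for your data. You first define the spec with build_wider_spec(), then put its rows in the order you want the columns to appear (using arrange() or something similar). In your case, you want to order by Set. You can see I passed names_glue to change the column names, but that step is not required.

Once that is done, call pivot_wider_spec() on your dataset with the spec object you created.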

library(tidyr)
library(dplyr)

spec <- build_wider_spec(df, names_from = "Set", values_from = c("Street", "State"),
                         names_glue = "{Set}_{.value}")
spec <- arrange(spec, Set)
spec
#> # A tibble: 6 x 3
#>   .name       .value Set  
#>   <chr>       <chr>  <chr>
#> 1 set1_Street Street set1 
#> 2 set1_State  State  set1 
#> 3 set2_Street Street set2 
#> 4 set2_State  State  set2 
#> 5 set3_Street Street set3 
#> 6 set3_State  State  set3

pivot_wider_spec(df, spec) 
#> # A tibble: 3 x 8
#>   ID    Name  set1_Street set1_State set2_Street set2_State set3_Street
#>   <chr> <chr> <chr>       <chr>      <chr>       <chr>      <chr>      
#> 1 A     mary  123 St      al         234 St      nc         <NA>       
#> 2 B     berry 543 St      fl         492 st      ca         231 st     
#> 3 C     paul  492 st      tx         231 st      vt         <NA>       
#> # ... with 1 more variable: set3_State <chr>

Created on 2021-12-17 by the reprex package (v2.0.0)
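
With 20 sets per entry, as in the real data described in the question, arranging the spec on Set as a plain string would place set10 before set2. Below is a minimal sketch of one way around that, assuming every set label ends in its number. The data frame df_many is hypothetical illustration data, not taken from the question.

```r
library(tidyr)
library(dplyr)

# Hypothetical data with more than nine sets for one ID
df_many <- data.frame(
  ID     = "A",
  Name   = "mary",
  Set    = paste0("set", 1:12),
  Street = paste(1:12, "St"),
  State  = "al"
)

spec <- build_wider_spec(df_many, names_from = Set,
                         values_from = c(Street, State),
                         names_glue = "{Set}_{.value}")

# Sort on the numeric part of Set instead of the string itself,
# so that set10 comes after set9 rather than after set1
spec <- arrange(spec, as.integer(gsub("\\D", "", Set)))

wide <- pivot_wider_spec(df_many, spec)
names(wide)
```

arrange() keeps tied rows in their existing order, so within each set the columns stay in the order given in values_from.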

Since issue #839 has finally been resolved in tidyr, you can now do this directly:

library(tidyr)

pivot_wider(df, names_from = c("Set"), values_from = c("Street", "State"), names_vary = 'slowest')

#> # A tibble: 3 x 8
#>   ID    Name  Street_set1 State_set1 Street_set2 State_set2 Street_set3
#>   <chr> <chr> <chr>       <chr>      <chr>       <chr>      <chr>      
#> 1 A     mary  123 St      al         234 St      nc         <NA>       
#> 2 B     berry 543 St      fl         492 st      ca         231 st     
#> 3 C     paul  492 st      tx         231 st      vt         <NA>       
#> # ... with 1 more variable: State_set3 <chr>

Data

ID = c("A", "A", "B", "B", "B", "C", "C")
Name = c("mary", "mary", "berry", "berry", "berry", "paul", "paul")
Set = c("set1", "set2", "set1", "set2", "set3", "set1", "set2")
Street = c("123 St", "234 St", "543 St", "492 st", "231 st", "492 st", "231 st")
State = c("al", "nc", "fl", "ca", "md", "tx", "vt")

df = data.frame(ID, Name, Set, Street, State)

Created on 2022-02-18 by the reprex package (v2.0.1)
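
To get the exact names asked for in the question (Set1_Street, Set1_State, and so on), names_vary can likely be combined with the names_glue pattern from the first answer. This is a sketch rather than a tested recommendation, and it assumes an installed tidyr version that supports the names_vary argument.

```r
library(tidyr)

df <- data.frame(
  ID     = c("A", "A", "B", "B", "B", "C", "C"),
  Name   = c("mary", "mary", "berry", "berry", "berry", "paul", "paul"),
  Set    = c("set1", "set2", "set1", "set2", "set3", "set1", "set2"),
  Street = c("123 St", "234 St", "543 St", "492 st", "231 st", "492 st", "231 st"),
  State  = c("al", "nc", "fl", "ca", "md", "tx", "vt")
)

wide <- pivot_wider(
  df,
  names_from  = Set,
  values_from = c(Street, State),
  # Build names as <Set>_<variable>, capitalising the set label
  names_glue  = "{tools::toTitleCase(Set)}_{.value}",
  # Keep all variables of one set together before moving to the next set
  names_vary  = "slowest"
)
names(wide)
```

This avoids the separate select() step, since the column order is decided during the pivot itself.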