How to pivot wider multiple columns on a dataset and maintain a specific column order?

My initial dataset:

df1 <- structure(list(id = c(1, 1, 2, 3, 3, 3),
                      name = c("james", "james", "peter", "anne", "anne", "anne"),
                      trip_id = c(10, 11, 10, 30, 11, 32),
                      date = c("2021/01/01", "2021/06/01", "2021/08/01", "2021/10/01", "2021/10/21", "2021/12/01"),
                      cost = c(100, 150, 3000, 1200, 1100, 5000)),
                 row.names = c(NA, -6L),
                 class = c("tbl_df", "tbl", "data.frame"))

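I need to widen date and cost for each trip so that they appear in pairs (date_10, cost_10, date_11, cost_11, ...). I think I'm getting closer, but would appreciate your feedback.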

My current code:

library(tidyr)
df2 <- df1 %>%
  pivot_wider(names_from = trip_id,
              values_from = c(date, cost))
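For reference, a minimal sketch of why this is not quite there yet: by default pivot_wider() varies the names_from values fastest, so every date_* column comes first and then every cost_* column. The snippet rebuilds df1 with tibble() (an assumption, purely so it runs on its own); the data are the same as above.

```r
library(tibble)
library(tidyr)

# Same data as df1 above, rebuilt with tibble() so the snippet is self-contained
df1 <- tibble(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

df2 <- df1 %>%
  pivot_wider(names_from = trip_id, values_from = c(date, cost))

# Default order: id, name, date_10, date_11, date_30, date_32,
# then cost_10, cost_11, cost_30, cost_32 (dates and costs are not paired)
names(df2)
```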

My desired result:

df2 <- structure(list(id = c(1, 2, 3),
                      name = c("james", "peter", "anne"),
                      date_10 = c("2021/01/01", "2021/08/01", NA),
                      cost_10 = c(100, 3000, NA),
                      date_11 = c("2021/06/01", NA, "2021/10/21"),
                      cost_11 = c(150, NA, 1100),
                      date_30 = c(NA, NA, "2021/10/01"),
                      cost_30 = c(NA, NA, 1200),
                      date_32 = c(NA, NA, "2021/12/01"),
                      cost_32 = c(NA, NA, 5000)),
                 row.names = c(NA, -3L),
                 class = c("tbl_df", "tbl", "data.frame"))

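Looks like you were close. We take the trip_id values before pivot_wider and use them to reorder the columns afterwards. Depending on your desired result, you may or may not need sort(); if you only want each date/cost pair kept together, sorting isn't necessary.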

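library(tidyverse)
# trip ids in the order the column pairs should appear
nums <- sort(unique(df1$trip_id))
nums <- as.character(nums)

df2 <- 
  df1 %>% 
    pivot_wider(names_from = trip_id, 
                values_from = c(date, cost)) %>%
    # ends_with() matches each suffix in turn, so date_<id> and cost_<id> stay adjacent
    select(id, name, ends_with(nums))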

df2
#> # A tibble: 3 x 10
#>      id name  date_10    cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
#>   <dbl> <chr> <chr>        <dbl> <chr>     <dbl> <chr>     <dbl> <chr>     <dbl>
#> 1     1 james 2021/01/01     100 2021/0~     150 <NA>         NA <NA>         NA
#> 2     2 peter 2021/08/01    3000 <NA>         NA <NA>         NA <NA>         NA
#> 3     3 anne  <NA>            NA 2021/1~    1100 2021/1~    1200 2021/1~    5000
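If you are on tidyr 1.2.0 or later, a possibly simpler sketch uses pivot_wider()'s names_vary argument: names_vary = "slowest" varies the trip_id values slowest, so each date_/cost_ pair stays together without a separate select(). names_sort = TRUE orders the trip ids; here it happens not to change anything because they first appear as 10, 11, 30, 32. The tibble() rebuild of df1 is only there to make the snippet self-contained.

```r
library(tibble)
library(tidyr)

# Same data as df1 above
df1 <- tibble(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

# names_vary = "slowest" (tidyr >= 1.2.0) keeps date_<id> and cost_<id> adjacent;
# names_sort = TRUE sorts the trip ids ascending
df2 <- df1 %>%
  pivot_wider(names_from = trip_id,
              values_from = c(date, cost),
              names_vary = "slowest",
              names_sort = TRUE)

# id, name, date_10, cost_10, date_11, cost_11, date_30, cost_30, date_32, cost_32
names(df2)
```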

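For this, you do need an extra pivot_longer step before reshaping the data into wide format. Note that I use the values_transform argument in pivot_longer to change the class of cost to character, so that I can combine it with date in the intermediate val variable: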

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(c(date, cost), names_to = "var", 
               values_to = "val", 
               values_transform = list(val = as.character)) %>%
  pivot_wider(names_from = c(var, trip_id), values_from = val) %>%
  mutate(across(starts_with("cost"), as.double))


# A tibble: 3 x 10
     id name  date_10    cost_10 date_11    cost_11 date_30    cost_30 date_32    cost_32
  <dbl> <chr> <chr>        <dbl> <chr>        <dbl> <chr>        <dbl> <chr>        <dbl>
1     1 james 2021/01/01     100 2021/06/01     150 NA              NA NA              NA
2     2 peter 2021/08/01    3000 NA              NA NA              NA NA              NA
3     3 anne  NA              NA 2021/10/21    1100 2021/10/01    1200 2021/12/01    5000
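To see why values_transform is needed, here is a small check (a sketch on the same data, rebuilt with tibble()): without it, pivot_longer() would have to put a character column (date) and a double column (cost) into one val column, and it refuses with an error. tryCatch() is used only to capture that error.

```r
library(tibble)
library(tidyr)

# Same data as df1 above
df1 <- tibble(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

# Without values_transform: date <chr> and cost <dbl> cannot share one column
res <- tryCatch(
  pivot_longer(df1, c(date, cost), names_to = "var", values_to = "val"),
  error = function(e) e
)
inherits(res, "error")

# With values_transform, val is converted to character, so both fit;
# each trip now has a "date" row followed by a "cost" row
long <- pivot_longer(df1, c(date, cost), names_to = "var", values_to = "val",
                     values_transform = list(val = as.character))
head(long, 4)
```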

Your approach is correct. Column order and row numbering shouldn't affect the data itself, except with time-series/panel data and a few other cases.

Alternatively, you can do the same thing with the reshape function, which gives you what you want. Note that this is base R:

reshape(data.frame(df1), timevar = "trip_id", idvar = c("id", "name"), direction = "wide", sep = "_")

  id  name    date_10 cost_10    date_11 cost_11    date_30 cost_30    date_32 cost_32
1  1 james 2021/01/01     100 2021/06/01     150       <NA>      NA       <NA>      NA
3  2 peter 2021/08/01    3000       <NA>      NA       <NA>      NA       <NA>      NA
4  3  anne       <NA>      NA 2021/10/21    1100 2021/10/01    1200 2021/12/01    5000
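If a specific order is required regardless of which reshaping approach you use, one option is to build the target column names explicitly and index by them. A minimal base R sketch, assuming the var_tripid naming shown above; ids and wanted are illustrative names, not from the answers. rbind() on two character vectors gives a 2-row matrix, and as.vector() reads it column by column, which interleaves the date and cost names.

```r
# Same data as df1, as a plain data.frame
df1 <- data.frame(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000),
  stringsAsFactors = FALSE
)

wide <- reshape(df1, timevar = "trip_id", idvar = c("id", "name"),
                direction = "wide", sep = "_")

ids <- sort(unique(df1$trip_id))
# 2-row matrix (date names on top, cost names below), read column-wise
wanted <- c("id", "name",
            as.vector(rbind(paste0("date_", ids), paste0("cost_", ids))))

# Reorder by name; this also works on the tidyr outputs above
wide <- wide[wanted]
names(wide)
```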