How to pivot wider multiple columns on a dataset and maintain a specific column order?
My initial dataset:
df1 <- structure(list(id = c(1, 1, 2, 3, 3, 3),
name = c("james", "james", "peter", "anne", "anne", "anne"),
trip_id = c(10,11,10,30,11,32),
date = c("2021/01/01", "2021/06/01","2021/08/01","2021/10/01","2021/10/21","2021/12/01"),
cost = c(100,150,3000,1200,1100,5000)
),
row.names = c(NA,-6L),
class = c("tbl_df", "tbl", "data.frame"))
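I need to pivot the date and cost of each trip wider so that they appear in pairs. I think I am close, but I would appreciate your feedback.
My current code: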
df2= df1 %>% pivot_wider(names_from = trip_id,
values_from = c(date, cost))
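As a quick check of what this call returns, the sketch below rebuilds the same data with `tibble::tibble()` (an assumption made only to keep the example self-contained) and prints the resulting column names. With tidyr's default naming, all the date columns come first and all the cost columns after them, rather than in pairs per trip.

```r
library(tidyr)

# Same data as df1 above, rebuilt so the example is self-contained
df1 <- tibble::tibble(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

wide_default <- pivot_wider(df1, names_from = trip_id,
                            values_from = c(date, cost))

# Columns are grouped by variable (date_*, then cost_*), not paired per trip
names(wide_default)
```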
My desired result:
df2 <- structure(list(id = c(1, 2, 3),
name = c("james", "peter", "anne"),
date_10 = c("2021/01/01","2021/08/01",NA),
cost_10 = c(100,3000,NA),
date_11 = c("2021/06/01",NA,"2021/10/21"),
cost_11 = c(150,NA,1100),
date_30 = c(NA,NA,"2021/10/01"),
cost_30 = c(NA,NA,1200),
date_32 = c(NA,NA,"2021/12/01"),
cost_32 = c(NA,NA,5000)
),
row.names = c(NA,-3L),
class = c("tbl_df", "tbl", "data.frame"))
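Looks like you are pretty close. We collect the trip_id values before pivot_wider and use them afterwards to reorder the columns. Depending on your desired result, you may or may not need sort. If you only want the columns paired, there is no need to sort.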
library(tidyverse)
nums <- sort(unique(df1$trip_id))
nums <- as.character(nums)
df2 <-
df1 %>%
pivot_wider(names_from = trip_id,
values_from = c(date, cost)) %>%
select(id, name, ends_with(nums))
df2
#> # A tibble: 3 x 10
#> id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 james 2021/01/01 100 2021/0~ 150 <NA> NA <NA> NA
#> 2 2 peter 2021/08/01 3000 <NA> NA <NA> NA <NA> NA
#> 3 3 anne <NA> NA 2021/1~ 1100 2021/1~ 1200 2021/1~ 5000
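If tidyr 1.2.0 or later is available (an assumption; the answer above does not depend on it), `pivot_wider()` has a `names_vary` argument that can produce the paired layout without a separate `select()` step. A minimal sketch, with `names_sort = TRUE` added to order the trips by `trip_id`:

```r
library(tidyr)

# Same data as df1 in the question, rebuilt so the example is self-contained
df1 <- tibble::tibble(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

paired <- pivot_wider(
  df1,
  names_from  = trip_id,
  values_from = c(date, cost),
  names_sort  = TRUE,       # order the trips by trip_id
  names_vary  = "slowest"   # keep date_* and cost_* of one trip adjacent
)

names(paired)
```

Dropping `names_sort = TRUE` keeps the trips in order of first appearance, which corresponds to skipping `sort()` in the approach above.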
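For this you do need an extra pivot_longer step before reshaping the data to wide format. Note that I use the values_transform argument of pivot_longer to convert cost to character, so that it can be combined with date in the single intermediate value column. The cost columns are converted back to double at the end: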
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(c(date, cost), names_to = "var",
values_to = "val",
values_transform = list(val = as.character)) %>%
pivot_wider(names_from = c(var, trip_id), values_from = val) %>%
mutate(across(starts_with("cost"), as.double))
# A tibble: 3 x 10
id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
<dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 1 james 2021/01/01 100 2021/06/01 150 NA NA NA NA
2 2 peter 2021/08/01 3000 NA NA NA NA NA NA
3 3 anne NA NA 2021/10/21 1100 2021/10/01 1200 2021/12/01 5000
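The reason for `values_transform` can be seen by leaving it out: `date` is character and `cost` is double, so `pivot_longer()` cannot stack them into one column. A small sketch that captures the error instead of stopping (the names `attempt` and `long` are only for illustration):

```r
library(tidyr)

# Same data as df1 in the question, rebuilt so the example is self-contained
df1 <- tibble::tibble(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

# Without values_transform: character and double cannot share one column
attempt <- tryCatch(
  pivot_longer(df1, c(date, cost), names_to = "var", values_to = "val"),
  error = function(e) e
)
inherits(attempt, "error")

# With values_transform: both variables are stored as character in `val`
long <- pivot_longer(df1, c(date, cost), names_to = "var", values_to = "val",
                     values_transform = list(val = as.character))
str(long$val)
```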
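Your approach is correct. Column or row ordering should not affect the data, unless you have time series/panel data or a few other exceptions.
Otherwise, you can do the same with the reshape function, which gives you the layout you want. Mind you, this is base R: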
reshape(data.frame(df1), timevar = 'trip_id', idvar = c('id', 'name'), direction = 'wide', sep = '_')
id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
1 1 james 2021/01/01 100 2021/06/01 150 <NA> NA <NA> NA
3 2 peter 2021/08/01 3000 <NA> NA <NA> NA <NA> NA
4 3 anne <NA> NA 2021/10/21 1100 2021/10/01 1200 2021/12/01 5000
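Note that `reshape()` keeps the row name of the first row of each id (1, 3, 4 in the output above) and attaches a `reshapeWide` attribute to the result. If that matters downstream, one possible cleanup, sketched here with the data rebuilt as a plain data.frame, is:

```r
# Same data as df1 in the question, as a base data.frame
df1 <- data.frame(
  id      = c(1, 1, 2, 3, 3, 3),
  name    = c("james", "james", "peter", "anne", "anne", "anne"),
  trip_id = c(10, 11, 10, 30, 11, 32),
  date    = c("2021/01/01", "2021/06/01", "2021/08/01",
              "2021/10/01", "2021/10/21", "2021/12/01"),
  cost    = c(100, 150, 3000, 1200, 1100, 5000)
)

wide <- reshape(df1, timevar = "trip_id", idvar = c("id", "name"),
                direction = "wide", sep = "_")

rownames(wide) <- NULL             # renumber the rows 1..3
attr(wide, "reshapeWide") <- NULL  # drop reshape()'s bookkeeping attribute

wide
```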