R dataset from long to wide - under a specific condition

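I want to convert a long, chronologically ordered dataset into a wide one, with one row per ID and the chronological order preserved. Let's look at an example: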

ID Product Date
1 Bike 1/1/2000
1 Tire 2/1/2000
2 Car 15/2/2000
2 Seat 17/2/2000
1 Chronometer 20/2/2000

into the following table:

ID 1st 2nd 3rd etc
1 Bike Tire Chronometer
2 Car Seat

The order in which the products were bought must not change.

Can you help me?

Thanks a lot!

arrange the data by ID and Date, give each row a unique row number within its ID, and cast the data to wide format.

library(dplyr)

df %>%
  mutate(Date = as.Date(Date, '%d/%m/%Y')) %>%
  arrange(ID, Date) %>%
  group_by(ID) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = row, values_from = c(Product, Date))

#     ID Product_1 Product_2 Product_3   Date_1     Date_2     Date_3    
#  <int> <chr>     <chr>     <chr>       <date>     <date>     <date>    
#1     1 Bike      Tire      Chronometer 2000-01-01 2000-01-02 2000-02-20
#2     2 Car       Seat      NA          2000-02-15 2000-02-17 NA        
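
The table in the question has only product columns. A minimal sketch of that variant, assuming the same `df` as in this answer: the dates are used only for sorting and then dropped before widening. The `Product_` prefix is just an illustrative naming choice, not something the question asks for.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 1L),
  Product = c("Bike", "Tire", "Car", "Seat", "Chronometer"),
  Date = c("1/1/2000", "2/1/2000", "15/2/2000", "17/2/2000", "20/2/2000"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  mutate(Date = as.Date(Date, "%d/%m/%Y")) %>%  # parse day/month/year
  arrange(ID, Date) %>%                          # chronological within ID
  group_by(ID) %>%
  mutate(row = row_number()) %>%                 # purchase position per ID
  ungroup() %>%
  select(-Date) %>%                              # Date is no longer needed
  pivot_wider(names_from = row, values_from = Product,
              names_prefix = "Product_")

wide
```

IDs with fewer purchases get NA in the trailing columns.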

data

df <- structure(list(ID = c(1L, 1L, 2L, 2L, 1L), Product = c("Bike", 
"Tire", "Car", "Seat", "Chronometer"), Date = c("1/1/2000", "2/1/2000", 
"15/2/2000", "17/2/2000", "20/2/2000")), class = "data.frame", row.names = c(NA, -5L))

A base R option using reshape

reshape(
  transform(
    df,
    q = ave(1:nrow(df), ID, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "ID",
  timevar = "q"
)

which gives

  ID Product.1    Date.1 Product.2    Date.2   Product.3    Date.3
1  1      Bike  1/1/2000      Tire  2/1/2000 Chronometer 20/2/2000
3  2       Car 15/2/2000      Seat 17/2/2000        <NA>      <NA>

If you don't want to keep Date, you can try this

reshape(
  transform(
    subset(df, select = -Date),
    q = ave(1:nrow(df), ID, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "ID",
  timevar = "q"
)

which gives

  ID Product.1 Product.2   Product.3
1  1      Bike      Tire Chronometer
3  2       Car      Seat        <NA>
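
Both reshape calls number the rows in the order they appear in the data frame, so they rely on the input already being chronological. If that is not guaranteed, one possible sketch is to sort by ID and parsed date first. The `shuffled` and `sorted` objects below are hypothetical names used only for illustration.

```r
# Sample data from the question, deliberately shuffled for the demonstration
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 1L),
  Product = c("Bike", "Tire", "Car", "Seat", "Chronometer"),
  Date = c("1/1/2000", "2/1/2000", "15/2/2000", "17/2/2000", "20/2/2000"),
  stringsAsFactors = FALSE
)
shuffled <- df[c(5, 2, 4, 1, 3), ]

# Order by ID, then by the date parsed as day/month/year
ord <- order(shuffled$ID, as.Date(shuffled$Date, "%d/%m/%Y"))
sorted <- shuffled[ord, ]

# Position of each purchase within its ID, in the sorted order
sorted$q <- ave(seq_len(nrow(sorted)), sorted$ID, FUN = seq_along)

wide <- reshape(
  sorted[, c("ID", "Product", "q")],
  direction = "wide",
  idvar = "ID",
  timevar = "q"
)
wide
```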

Data

> dput(df)
structure(list(ID = c(1L, 1L, 2L, 2L, 1L), Product = c("Bike", 
"Tire", "Car", "Seat", "Chronometer"), Date = c("1/1/2000", "2/1/2000",
"15/2/2000", "17/2/2000", "20/2/2000")), class = "data.frame", row.names = c(NA,
-5L))

We can use dcast from data.table

library(data.table)
dcast(setDT(df), ID ~ rowid(ID), value.var = c('Product', 'Date'))
#     ID Product_1 Product_2   Product_3    Date_1    Date_2    Date_3
#1:  1      Bike      Tire Chronometer  1/1/2000  2/1/2000 20/2/2000
#2:  2       Car      Seat        <NA> 15/2/2000 17/2/2000      <NA>
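
The call above numbers the purchases by row order and leaves Date as character. A sketch of a variant that parses the dates and sorts explicitly before casting, assuming the same sample data; `dt` is a name introduced here for illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ID = c(1L, 1L, 2L, 2L, 1L),
  Product = c("Bike", "Tire", "Car", "Seat", "Chronometer"),
  Date = c("1/1/2000", "2/1/2000", "15/2/2000", "17/2/2000", "20/2/2000")
)

dt[, Date := as.Date(Date, format = "%d/%m/%Y")]  # parse day/month/year
setorder(dt, ID, Date)                            # chronological within ID

# rowid(ID) gives the position of each purchase within its ID
wide <- dcast(dt, ID ~ rowid(ID), value.var = c("Product", "Date"))
wide
```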

data

df <- structure(list(ID = c(1L, 1L, 2L, 2L, 1L), Product = c("Bike", 
"Tire", "Car", "Seat", "Chronometer"), Date = c("1/1/2000", "2/1/2000",
"15/2/2000", "17/2/2000", "20/2/2000")), class = "data.frame",
row.names = c(NA,
-5L))