R Reshape data frame from long to wide format?

What is the best way to convert the data frame below from long to wide format? I tried using reshape but did not get the desired result.

2015    PROD A  test1
2015    PROD A  blue
2015    PROD A  50
2015    PROD A  66
2015    PROD A  66
2018    PROD B  test2
2018    PROD B  yellow
2018    PROD B  70
2018    PROD B  88.8
2018    PROD B  88.8
2018    PROD A  test3
2018    PROD A  red
2018    PROD A  55
2018    PROD A  88
2018    PROD A  90

One possible solution is this:

library(tidyverse)

df <- read.table(text = "
                year prod value
                2015    PRODA  test1
                2015    PRODA  blue
                2015    PRODA  50
                2015    PRODA  66
                2015    PRODA  66
                2018    PRODB  test2
                2018    PRODB  yellow
                2018    PRODB  70
                2018    PRODB  88.8
                2018    PRODB  88.8
                2018    PRODA  test3
                2018    PRODA  red
                2018    PRODA  55
                2018    PRODA  88
                2018    PRODA  90
                ", header = TRUE, stringsAsFactors = FALSE)

df %>%
  group_by(year, prod) %>%                           # for each year and prod combination
  mutate(id = paste0("new_col_", row_number())) %>%  # enumerate rows (this will be used as column names in the reshaped version)
  ungroup() %>%                                      # forget the grouping
  spread(id, value)                                  # reshape

# # A tibble: 3 x 7
#    year prod  new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
#   <int> <chr> <chr>     <chr>     <chr>     <chr>     <chr>    
# 1  2015 PRODA test1     blue      50        66        66       
# 2  2018 PRODA test3     red       55        88        90       
# 3  2018 PRODB test2     yellow    70        88.8      88.8 
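
In newer versions of tidyr, spread() is superseded by pivot_wider(), so the same idea can probably be written as below. This is only a sketch: it assumes tidyr 1.0.0 or later, it builds the same df by hand with data.frame() instead of read.table(), and the name wide is made up for illustration. The row order of the result may differ from the spread() output above.

```r
library(dplyr)
library(tidyr)

# Same data as above, built by hand (assumption: identical to the read.table() result)
df <- data.frame(
  year  = c(rep(2015L, 5), rep(2018L, 10)),
  prod  = c(rep("PRODA", 5), rep("PRODB", 5), rep("PRODA", 5)),
  value = c("test1", "blue", "50", "66", "66",
            "test2", "yellow", "70", "88.8", "88.8",
            "test3", "red", "55", "88", "90"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(year, prod) %>%                           # for each year and prod combination
  mutate(id = paste0("new_col_", row_number())) %>%  # enumerate rows within the group
  ungroup() %>%                                      # forget the grouping
  pivot_wider(names_from = id, values_from = value)  # one column per row position

print(wide)
```

As with spread(), all new columns stay character because value mixes text and numbers.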

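For the sake of completeness, here is a solution which uses data.table's convenient rowid() function.

The crucial point of the question is that the reshaping depends entirely on the row position of value within each (year, product) group. rowid(year, product) numbers the rows within each group. So, reshaping essentially becomes a one-liner: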

library(data.table)
dcast(setDT(df1), year + product ~ rowid(year, product, prefix = "col_"))
   year product col_1  col_2 col_3 col_4 col_5
1: 2015  PROD A test1   blue    50    66    66
2: 2018  PROD A test3    red    55    88    90
3: 2018  PROD B test2 yellow    70  88.8  88.8

Note that rowid() takes a prefix parameter to ensure that the resulting column names are syntactically correct.
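
To make the numbering more concrete, here is a small sketch of what rowid() returns with and without prefix. The vectors year and product are toy values made up for this illustration, not the OP's full data.

```r
library(data.table)

# Toy vectors (assumption: shortened stand-ins for the real columns)
year    <- c(2015, 2015, 2018, 2018, 2018)
product <- c("PROD A", "PROD A", "PROD B", "PROD B", "PROD A")

# Position of each row within its (year, product) group
ids <- rowid(year, product)
print(ids)
# [1] 1 2 1 2 1

# Same numbering, pasted onto a prefix so it can be used as a column name
labels <- rowid(year, product, prefix = "col_")
print(labels)
# [1] "col_1" "col_2" "col_1" "col_2" "col_1"
```

Without the prefix, dcast() would create columns named 1, 2, ..., which have to be quoted with backticks every time they are used.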

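Caveat: This solution assumes that year and product form a unique key for each group.

Data

The data are read as posted by the OP without any modifications to the data. However, this requires a few lines of post-processing: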
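
One way to check this assumption before reshaping might be to count the rows per (year, product) combination. If two separate blocks shared the same year and product, they would be merged into one group and show up here with more rows than the others. This is only a sketch: dt, group_sizes and all_same_size are names made up for illustration, and the data are typed in by hand.

```r
library(data.table)

# Hand-built copy of the OP's data (assumption: same content as df1)
dt <- data.table(
  year    = c(rep(2015L, 5), rep(2018L, 10)),
  product = c(rep("PROD A", 5), rep("PROD B", 5), rep("PROD A", 5)),
  value   = c("test1", "blue", "50", "66", "66",
              "test2", "yellow", "70", "88.8", "88.8",
              "test3", "red", "55", "88", "90")
)

# Number of rows in each (year, product) group
group_sizes <- dt[, .N, by = .(year, product)]
print(group_sizes)

# TRUE if every group has the same number of rows
all_same_size <- uniqueN(group_sizes$N) == 1L
print(all_same_size)
```

Equal group sizes do not prove the key is unique, but a group that is larger than the rest is a hint that something was merged.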

library(data.table)    
df1 <- fread("
2015    PROD A  test1
2015    PROD A  blue
2015    PROD A  50
2015    PROD A  66
2015    PROD A  66
2018    PROD B  test2
2018    PROD B  yellow
2018    PROD B  70
2018    PROD B  88.8
2018    PROD B  88.8
2018    PROD A  test3
2018    PROD A  red
2018    PROD A  55
2018    PROD A  88
2018    PROD A  90", 
      header = FALSE, col.names = c("year", "product", "value"), drop = 2L)[
        , product := paste("PROD", product)][]