R 将数据框从长格式重塑为宽格式?
R Reshape data frame from long to wide format?
将下面的数据框从长格式转换为宽格式的最佳方法是什么?我尝试使用重塑但没有得到想要的结果。
2015 PROD A test1
2015 PROD A blue
2015 PROD A 50
2015 PROD A 66
2015 PROD A 66
2018 PROD B test2
2018 PROD B yellow
2018 PROD B 70
2018 PROD B 88.8
2018 PROD B 88.8
2018 PROD A test3
2018 PROD A red
2018 PROD A 55
2018 PROD A 88
2018 PROD A 90
一个可能的解决方案是这样
library(tidyverse)
df = read.table(text = "
year prod value
2015 PRODA test1
2015 PRODA blue
2015 PRODA 50
2015 PRODA 66
2015 PRODA 66
2018 PRODB test2
2018 PRODB yellow
2018 PRODB 70
2018 PRODB 88.8
2018 PRODB 88.8
2018 PRODA test3
2018 PRODA red
2018 PRODA 55
2018 PRODA 88
2018 PRODA 90
", header=T, stringsAsFactors=F)
df %>%
group_by(year, prod) %>% # for each year and prod combination
mutate(id = paste0("new_col_", row_number())) %>% # enumerate rows (this will be used as column names in the reshaped version)
ungroup() %>% # forget the grouping
spread(id, value) # reshape
# # A tibble: 3 x 7
# year prod new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 2015 PRODA test1 blue 50 66 66
# 2 2018 PRODA test3 red 55 88 90
# 3 2018 PRODB test2 yellow 70 88.8 88.8
为了完整起见,这里有一个使用 data.table
方便的 rowid()
函数的解决方案。
这个问题的关键点是重塑完全取决于行位置 of value
in each (year
, product
) 团体。 rowid(year, product)
对每组中的行进行编号。所以,重塑本质上变成了 one-liner:
library(data.table)
dcast(setDT(df1), year + product ~ rowid(year, product, prefix = "col_"))
year product col_1 col_2 col_3 col_4 col_5
1: 2015 PROD A test1 blue 50 66 66
2: 2018 PROD A test3 red 55 88 90
3: 2018 PROD B test2 yellow 70 88.8 88.8
请注意,rowid()
采用 prefix
参数以确保生成的列名在语法上是正确的。
警告: 此解决方案假定 year
和 product
为每个组形成一个 唯一键 。
数据
数据按 OP 发布的方式读取,未对数据进行任何修改。但是,这需要几行 post-processing:
library(data.table)
df1 <- fread("
2015 PROD A test1
2015 PROD A blue
2015 PROD A 50
2015 PROD A 66
2015 PROD A 66
2018 PROD B test2
2018 PROD B yellow
2018 PROD B 70
2018 PROD B 88.8
2018 PROD B 88.8
2018 PROD A test3
2018 PROD A red
2018 PROD A 55
2018 PROD A 88
2018 PROD A 90",
header = FALSE, col.names = c("year", "product", "value"), drop = 2L)[
, product := paste("PROD", product)][]
将下面的数据框从长格式转换为宽格式的最佳方法是什么?我尝试使用重塑但没有得到想要的结果。
2015 PROD A test1
2015 PROD A blue
2015 PROD A 50
2015 PROD A 66
2015 PROD A 66
2018 PROD B test2
2018 PROD B yellow
2018 PROD B 70
2018 PROD B 88.8
2018 PROD B 88.8
2018 PROD A test3
2018 PROD A red
2018 PROD A 55
2018 PROD A 88
2018 PROD A 90
一个可能的解决方案是这样
library(tidyverse)
df = read.table(text = "
year prod value
2015 PRODA test1
2015 PRODA blue
2015 PRODA 50
2015 PRODA 66
2015 PRODA 66
2018 PRODB test2
2018 PRODB yellow
2018 PRODB 70
2018 PRODB 88.8
2018 PRODB 88.8
2018 PRODA test3
2018 PRODA red
2018 PRODA 55
2018 PRODA 88
2018 PRODA 90
", header=T, stringsAsFactors=F)
df %>%
group_by(year, prod) %>% # for each year and prod combination
mutate(id = paste0("new_col_", row_number())) %>% # enumerate rows (this will be used as column names in the reshaped version)
ungroup() %>% # forget the grouping
spread(id, value) # reshape
# # A tibble: 3 x 7
# year prod new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 2015 PRODA test1 blue 50 66 66
# 2 2018 PRODA test3 red 55 88 90
# 3 2018 PRODB test2 yellow 70 88.8 88.8
为了完整起见,这里有一个使用 data.table
方便的 rowid()
函数的解决方案。
这个问题的关键点是重塑完全取决于行位置 of value
in each (year
, product
) 团体。 rowid(year, product)
对每组中的行进行编号。所以,重塑本质上变成了 one-liner:
library(data.table)
dcast(setDT(df1), year + product ~ rowid(year, product, prefix = "col_"))
year product col_1 col_2 col_3 col_4 col_5 1: 2015 PROD A test1 blue 50 66 66 2: 2018 PROD A test3 red 55 88 90 3: 2018 PROD B test2 yellow 70 88.8 88.8
请注意,rowid()
采用 prefix
参数以确保生成的列名在语法上是正确的。
警告: 此解决方案假定 year
和 product
为每个组形成一个 唯一键 。
数据
数据按 OP 发布的方式读取,未对数据进行任何修改。但是,这需要几行 post-processing:
library(data.table)
df1 <- fread("
2015 PROD A test1
2015 PROD A blue
2015 PROD A 50
2015 PROD A 66
2015 PROD A 66
2018 PROD B test2
2018 PROD B yellow
2018 PROD B 70
2018 PROD B 88.8
2018 PROD B 88.8
2018 PROD A test3
2018 PROD A red
2018 PROD A 55
2018 PROD A 88
2018 PROD A 90",
header = FALSE, col.names = c("year", "product", "value"), drop = 2L)[
, product := paste("PROD", product)][]