Using base R to reformat wide data to long data more efficiently
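Other questions about reshape() helped me reformat my data (see below), but I would like to know whether there is a simpler way (less code) to achieve the same thing using base R, without packages.
I have sample data like this: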
period store_id product price1 price2 price3 quantity1 quantity2 quantity3
201801 1 1 11 5 6 100 200 300
201802 1 1 12 6 6 100 200 300
201803 1 1 13 7 6 100 200 300
201804 1 1 14 8 6 100 200 300
201805 1 1 15 9 6 100 200 300
201806 2 2 16 10 6 100 200 300
201807 2 2 17 11 6 100 200 300
201808 2 2 18 12 6 100 200 300
201809 2 2 19 13 6 100 200 300
201810 2 2 20 14 6 100 200 300
I would like it to look like this (3 rows per period, with one column for price and one for quantity):
period store_id product quantity type price
201801 1 1 100 price1 11
201801 1 1 200 price2 5
201801 1 1 300 price3 6
201802 1 1 100 price1 12
201802 1 1 200 price2 6
201802 1 1 300 price3 6
201803 1 1 100 price1 13
201803 1 1 200 price2 7
201803 1 1 300 price3 6
201804 1 1 100 price1 14
and so on...
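I was able to reformat my data by first splitting it into two data frames, one for prices and one for quantities, and then merging them back together. Is there a simpler way, with fewer steps and less code than what I did below, to use reshape() for my example?
Any help is appreciated, thank you.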
CODE TO PRODUCE INITIAL DATA SET
period <- 201801:201810
df <- data.frame(period)
df$store_id <- c(1,1,1,1,1,2,2,2,2,2)
df$product <- c(1,1,1,1,1,2,2,2,2,2)
df$price1 <- 11:20
df$price2 <- 5:14
df$price3 <- df$price1 - df$price2
df$quantity1 <- 100
df$quantity2 <- 200
df$quantity3 <- 300
df
MY INEFFICIENT CODE / SOLUTION
# prices
df_prices <- df[,!names(df)%in%c("quantity1","quantity2","quantity3")]
df_prices <- reshape(df_prices,
                     direction = "long",
                     idvar = c("period", "product", "store_id"),
                     timevar = "type",
                     varying = c("price1","price2","price3"),
                     v.names = "price",
                     times = c("price1","price2","price3"))
# quantities
df_quantities <- df[,!names(df)%in%c("price1","price2","price3")]
df_quantities <- reshape(df_quantities,
                         direction = "long",
                         idvar = c("period", "product", "store_id"),
                         timevar = "type",
                         varying = c("quantity1","quantity2","quantity3"),
                         v.names = "quantity",
                         times = c("quantity1","quantity2","quantity3"))
# make variables consistent to merge
df_quantities$type[df_quantities$type=="quantity1"] <- "price1"
df_quantities$type[df_quantities$type=="quantity2"] <- "price2"
df_quantities$type[df_quantities$type=="quantity3"] <- "price3"
# merge both
df <- merge(df_prices, df_quantities, by=c("period","product","store_id","type"))
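You can pass the varying columns as 4:9, i.e. columns 4 through 9, and set sep = "" because nothing separates the variable name from the time index (price1, quantity1). If the columns were named price_1 and so on, you would use sep = "_" instead: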
reshape(df, varying = 4:9, direction = "long", sep = "")
period store_id product time price quantity id
1.1 201801 1 1 1 11 100 1
2.1 201802 1 1 1 12 100 2
3.1 201803 1 1 1 13 100 3
4.1 201804 1 1 1 14 100 4
5.1 201805 1 1 1 15 100 5
6.1 201806 2 2 1 16 100 6
7.1 201807 2 2 1 17 100 7
8.1 201808 2 2 1 18 100 8
9.1 201809 2 2 1 19 100 9
10.1 201810 2 2 1 20 100 10
1.2 201801 1 1 2 5 200 1
2.2 201802 1 1 2 6 200 2
3.2 201803 1 1 2 7 200 3
4.2 201804 1 1 2 8 200 4
5.2 201805 1 1 2 9 200 5
6.2 201806 2 2 2 10 200 6
7.2 201807 2 2 2 11 200 7
8.2 201808 2 2 2 12 200 8
9.2 201809 2 2 2 13 200 9
10.2 201810 2 2 2 14 200 10
1.3 201801 1 1 3 6 300 1
2.3 201802 1 1 3 6 300 2
3.3 201803 1 1 3 6 300 3
4.3 201804 1 1 3 6 300 4
5.3 201805 1 1 3 6 300 5
6.3 201806 2 2 3 6 300 6
7.3 201807 2 2 3 6 300 7
8.3 201808 2 2 3 6 300 8
9.3 201809 2 2 3 6 300 9
10.3 201810 2 2 3 6 300 10
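The output above has the right shape but not quite the layout asked for in the question: it carries an extra id column, the time column holds 1/2/3 rather than price1/price2/price3, and the rows are grouped by time instead of by period. A minimal sketch of one way to close that gap, still in base R, follows. The object name long and the choice to relabel and sort after reshaping are assumptions made for illustration, not part of the original answer.

```r
# Rebuild the sample data from the question (wide format)
df <- data.frame(
  period   = 201801:201810,
  store_id = rep(1:2, each = 5),
  product  = rep(1:2, each = 5)
)
df$price1    <- 11:20
df$price2    <- 5:14
df$price3    <- df$price1 - df$price2
df$quantity1 <- 100
df$quantity2 <- 200
df$quantity3 <- 300

# One reshape() call handles both price* and quantity* columns.
# Naming existing columns in idvar avoids the extra "id" column,
# and timevar = "type" names the index column directly.
long <- reshape(
  df,
  direction = "long",
  varying   = 4:9,
  sep       = "",
  idvar     = c("period", "store_id", "product"),
  timevar   = "type"
)

# Turn the numeric index 1/2/3 into the labels used in the question
long$type <- paste0("price", long$type)

# Sort by period, drop the composite row names, and reorder the columns
long <- long[order(long$period, long$type), ]
rownames(long) <- NULL
long <- long[, c("period", "store_id", "product", "quantity", "type", "price")]

head(long, 4)
```

If the wide column names did not follow a name-plus-number pattern, reshape() could not guess the split, and varying, v.names and times would need to be spelled out explicitly, as in the code from the question.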