Reshaping a large dataset in R
I am trying to reshape a large dataset, but I cannot get the result in the layout I want.
The data look like this:
GeoFIPS GeoName IndustryID Description X2001 X2002 X2003 X2004 X2005
10180 Abilene, TX 21 Mining 96002 92407 127138 150449 202926
10180 Abilene, TX 22 Utilities 33588 34116 33105 33265 32452
...
The data frame is long and covers all US MSAs and selected industries.
I would like it to look like this:
GeoFIPS GeoName Year Mining Utilities (etc)
10180 Abilene, TX 2001 96002 33588
10180 Abilene, TX 2002 92407 34116
....
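I am new to R and would really appreciate your help.
I have looked at wide-to-long and long-to-wide reshaping, but this seems to be a more complicated case.
Thanks!
Edit:
Data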
df1 <- structure(list(GeoFIPS = c(10180L, 10180L), GeoName =
c("Abilene, TX",
"Abilene, TX"), IndustryID = 21:22, Description = c("Mining",
"Utilities"), X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L
), X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L), X2005 =
c(202926L,
32452L)), .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description",
"X2001", "X2002", "X2003", "X2004", "X2005"), class = "data.frame",
row.names = c(NA, -2L))
You can use melt/dcast from reshape2:
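library(reshape2)
# Wide to long: stack the X2001..X2005 columns into variable/value pairs
df2 <- melt(df1, id.vars=c('GeoFIPS', 'GeoName',
                 'IndustryID', 'Description'))
# Strip the leading 'X' to get Year, then drop IndustryID (col 3) and variable (col 5)
df2 <- transform(df2, Year=sub('^X', '', variable))[-c(3,5)]
# Long to wide: one column per Description
dcast(df2, ...~Description, value.var='value')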
# GeoFIPS GeoName Year Mining Utilities
#1 10180 Abilene, TX 2001 96002 33588
#2 10180 Abilene, TX 2002 92407 34116
#3 10180 Abilene, TX 2003 127138 33105
#4 10180 Abilene, TX 2004 150449 33265
#5 10180 Abilene, TX 2005 202926 32452
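The same two-step reshape (wide to long, then long to wide) can also be sketched with tidyr. This is not part of the original answer; it assumes tidyr 1.1.0 or later is installed, and the names `long` and `wide` are illustrative only.

```r
library(tidyr)

# Same sample data as df1 above, rebuilt here so the block runs on its own
df1 <- data.frame(
  GeoFIPS = c(10180L, 10180L),
  GeoName = c("Abilene, TX", "Abilene, TX"),
  IndustryID = 21:22,
  Description = c("Mining", "Utilities"),
  X2001 = c(96002L, 33588L),
  X2002 = c(92407L, 34116L),
  X2003 = c(127138L, 33105L),
  X2004 = c(150449L, 33265L),
  X2005 = c(202926L, 32452L)
)

# Step 1 (wide to long): one row per GeoFIPS / industry / year.
# names_prefix strips the leading "X"; names_transform makes Year an integer.
long <- pivot_longer(
  df1,
  cols = starts_with("X"),
  names_to = "Year",
  names_prefix = "X",
  names_transform = list(Year = as.integer),
  values_to = "value"
)

# Step 2 (long to wide): one column per industry.
# id_cols leaves IndustryID out, so the rows for each year collapse
# into a single row instead of one row per industry.
wide <- pivot_wider(
  long,
  id_cols = c(GeoFIPS, GeoName, Year),
  names_from = Description,
  values_from = value
)

print(wide)
```

Listing `id_cols` explicitly plays the role of the `[-c(3,5)]` column drop in the reshape2 version: any column that still distinguishes the industries, such as IndustryID, has to be left out of the row identifiers before spreading.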