使用多行将数据从宽变长
reshape data from wide to long with multiple rows
我有一个要重塑的数据集 dfs
dfs
# country.name indicator.name x1990 x1991 x1992
# 507 andorra GDP at market prices (current US$) 1.028989e+09 1.106891e+09 1.209993e+09
# 510 andorra GDP growth (annual %) 3.781393e+00 2.546001e+00 9.292154e-01
# 1347 albania GDP at market prices (current US$) 2.101625e+09 1.139167e+09 7.094526e+08
# 1350 albania GDP growth (annual %) -9.575640e+00 -2.958900e+01 -7.200000e+00
# 3587 austria GDP at market prices (current US$) 1.660624e+11 1.733755e+11 1.946082e+11
我希望指标名称是列,时间与指标在一列中。
# country time gdp_market gdp_growth
# 1 andorra 1990 1028989394 3.7813935
# 2 andorra 1990 1106891025 2.5460006
# 3 andorra 1990 1209992650 0.9292154
# 4 albania 1991 2101624963 3.7813935
# 5 albania 1991 1139166646 2.5460006
# 6 albania 1991 709452584 0.9292154
# 7 austria 1992 166062376740 NA
# 8 austria 1992 173375508073 NA
# 9 austria 1992 194608183696 NA
我可以将数据 melt reshape 成长格式,但不能将它分成两列
library(reshape2)
melt.dfs <- melt(dfs, id=1:2)
我可以进行拆分和绑定,但我更喜欢通过重塑来进行。谢谢
dfs = structure(list(country.name = c("andorra", "andorra", "albania",
"albania", "austria"), indicator.name = c("GDP at market prices (current US$)",
"GDP growth (annual %)", "GDP at market prices (current US$)",
"GDP growth (annual %)", "GDP at market prices (current US$)"
), x1990 = c(1028989393.70295, 3.78139347786568, 2101624962.5,
-9.57564018741695, 166062376739.683), x1991 = c(1106891024.78653,
2.54600064090229, 1139166645.83333, -29.5889976817695, 173375508073.07
), x1992 = c(1209992649.56688, 0.929215382801402, 709452583.880319,
-7.19999998650893, 194608183696.469)), .Names = c("country.name",
"indicator.name", "x1990", "x1991", "x1992"), row.names = c(507L,
510L, 1347L, 1350L, 3587L), class = "data.frame")
我们可以使用
library(dplyr)
library(tidyr)
gather(dfs, time, Val, x1990:x1992) %>%
spread(indicator.name, Val)
编辑:基于@docendo discimus 的评论
或使用recast
library(reshape2)
recast(dfs, measure = 3:5, ...~indicator.name, value.var='value')
我有一个要重塑的数据集 dfs
dfs
# country.name indicator.name x1990 x1991 x1992
# 507 andorra GDP at market prices (current US$) 1.028989e+09 1.106891e+09 1.209993e+09
# 510 andorra GDP growth (annual %) 3.781393e+00 2.546001e+00 9.292154e-01
# 1347 albania GDP at market prices (current US$) 2.101625e+09 1.139167e+09 7.094526e+08
# 1350 albania GDP growth (annual %) -9.575640e+00 -2.958900e+01 -7.200000e+00
# 3587 austria GDP at market prices (current US$) 1.660624e+11 1.733755e+11 1.946082e+11
我希望指标名称是列,时间与指标在一列中。
# country time gdp_market gdp_growth
# 1 andorra 1990 1028989394 3.7813935
# 2 andorra 1990 1106891025 2.5460006
# 3 andorra 1990 1209992650 0.9292154
# 4 albania 1991 2101624963 3.7813935
# 5 albania 1991 1139166646 2.5460006
# 6 albania 1991 709452584 0.9292154
# 7 austria 1992 166062376740 NA
# 8 austria 1992 173375508073 NA
# 9 austria 1992 194608183696 NA
我可以将数据 melt reshape 成长格式,但不能将它分成两列
library(reshape2)
melt.dfs <- melt(dfs, id=1:2)
我可以进行拆分和绑定,但我更喜欢通过重塑来进行。谢谢
dfs = structure(list(country.name = c("andorra", "andorra", "albania",
"albania", "austria"), indicator.name = c("GDP at market prices (current US$)",
"GDP growth (annual %)", "GDP at market prices (current US$)",
"GDP growth (annual %)", "GDP at market prices (current US$)"
), x1990 = c(1028989393.70295, 3.78139347786568, 2101624962.5,
-9.57564018741695, 166062376739.683), x1991 = c(1106891024.78653,
2.54600064090229, 1139166645.83333, -29.5889976817695, 173375508073.07
), x1992 = c(1209992649.56688, 0.929215382801402, 709452583.880319,
-7.19999998650893, 194608183696.469)), .Names = c("country.name",
"indicator.name", "x1990", "x1991", "x1992"), row.names = c(507L,
510L, 1347L, 1350L, 3587L), class = "data.frame")
我们可以使用
library(dplyr)
library(tidyr)
gather(dfs, time, Val, x1990:x1992) %>%
spread(indicator.name, Val)
编辑:基于@docendo discimus 的评论
或使用recast
library(reshape2)
recast(dfs, measure = 3:5, ...~indicator.name, value.var='value')