Multiple variables (in columns), multiple years (in columns): reshape to a flat file in R
I have data in the following format, with each variable's data laid out by year, where A, B, C, D are row IDs.
   Variable 1           (blank)  Variable 2
   2008 2009 2010 2011           2008 2009 2010 2011
A     1    5    9   13              5   10   15   20
B     2    6   10   14             25   30   35   40
C     3    7   11   15             45   50   55   60
D     4    8   12   16             65   70   75   80
I want to get it into this format:
   Variable  Year Data
A  Variable1 2008    1
A  Variable1 2009    5
.....
.....
D  Variable2 2010   75
D  Variable2 2011   80
I thought of using gather from library(tidyr), but I'm not sure how to go about it. Sorry for not having a reproducible example.
df <- structure(list(X1 = c(NA, "A", "B", "C", "D"), Variable1 = c(2008,
1, 2, 3, 4), X3 = c(2009, 5, 6, 7, 8), X4 = c(2010, 9, 10, 11,
12), X5 = c(2011, 13, 14, 15, 16), Variable2 = c(2008, 5, 25,
45, 65), X7 = c(2009, 10, 30, 50, 70), X8 = c(2010, 15, 35, 55,
75), X9 = c(2011, 20, 40, 60, 80)), .Names = c("X1", "Variable1",
"X3", "X4", "X5", "Variable2", "X7", "X8", "X9"), row.names = c(NA,
5L), class = "data.frame")
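library(tidyverse)

# Build names like "Variable1_2008": take the variable names in columns 2 and 6,
# repeat each one four times, and paste on the year stored in the first data row
names(df) <- c("row_name",
               paste(c(t(replicate(4, names(df)[1 + seq(1, length.out = floor(length(names(df)) / 4), by = 4)]))),
                     df[1, -1],
                     sep = "_"))

# Drop the year row, stack all Variable_Year columns, then split the key in two
df[-1, ] %>%
  gather(Variable_Year, Data, -row_name) %>%
  separate(Variable_Year, into = c("Variable", "Year"), sep = "_") %>%
  arrange(row_name, Variable, Year)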
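gather() still works, but in tidyr 1.0.0 and later it is superseded by pivot_longer(), which can split the Variable_Year names into two columns in the same call through names_sep, so the separate() step is no longer needed. A minimal sketch, assuming tidyr >= 1.0.0 and the renamed columns from above; the small df here is a hypothetical rebuild of that data with the year row already dropped:

```r
library(tidyr)
library(dplyr)

# Hypothetical rebuild of the renamed data, year row already removed
df <- data.frame(
  row_name       = c("A", "B", "C", "D"),
  Variable1_2008 = c(1, 2, 3, 4),
  Variable1_2009 = c(5, 6, 7, 8),
  Variable1_2010 = c(9, 10, 11, 12),
  Variable1_2011 = c(13, 14, 15, 16),
  Variable2_2008 = c(5, 25, 45, 65),
  Variable2_2009 = c(10, 30, 50, 70),
  Variable2_2010 = c(15, 35, 55, 75),
  Variable2_2011 = c(20, 40, 60, 80),
  stringsAsFactors = FALSE
)

long <- df %>%
  # Stack every column except row_name and split "Variable1_2008"
  # into Variable = "Variable1" and Year = "2008" in one step
  pivot_longer(
    cols      = -row_name,
    names_to  = c("Variable", "Year"),
    names_sep = "_",
    values_to = "Data"
  ) %>%
  arrange(row_name, Variable, Year)

print(long)
```

As with the gather() version, Year comes out as character; wrap it in as.integer() afterwards if you need it numeric.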
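Note that a data frame can't have non-unique values as its row names, so you may want to think of another way to handle the row_name column in the output below.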
The output is:
row_name Variable Year Data
1 A Variable1 2008 1
2 A Variable1 2009 5
...
31 D Variable2 2010 75
32 D Variable2 2011 80
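To illustrate the note about row names: in the long format each ID repeats, and assigning repeated IDs as row names fails, while an ordinary column accepts them. The ids and long objects below are hypothetical toy data, not the question's data:

```r
# Hypothetical toy data: in long format each ID appears more than once
ids  <- c("A", "A", "B", "B")
long <- data.frame(Data = c(1, 5, 2, 6))

# Try to use the repeated IDs as row names; R rejects duplicates with an error
# (it may also print a warning listing the non-unique values)
result <- tryCatch(
  {
    rownames(long) <- ids
    "accepted"
  },
  error = function(e) conditionMessage(e)
)
print(result)

# Keeping the IDs in an ordinary column works fine
long$row_name <- ids
print(long)
```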
Sample data (with the columns already renamed):
df <- structure(list(row_name = c(NA, "A", "B", "C", "D"), Variable1_2008 = c(2008,
1, 2, 3, 4), Variable1_2009 = c(2009, 5, 6, 7, 8), Variable1_2010 = c(2010,
9, 10, 11, 12), Variable1_2011 = c(2011, 13, 14, 15, 16), Variable2_2008 = c(2008,
5, 25, 45, 65), Variable2_2009 = c(2009, 10, 30, 50, 70), Variable2_2010 = c(2010,
15, 35, 55, 75), Variable2_2011 = c(2011, 20, 40, 60, 80)), .Names = c("row_name",
"Variable1_2008", "Variable1_2009", "Variable1_2010", "Variable1_2011",
"Variable2_2008", "Variable2_2009", "Variable2_2010", "Variable2_2011"
), row.names = c(NA, 5L), class = "data.frame")
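If you'd rather avoid the tidyverse, the same reshape can be sketched in base R with unlist() and strsplit(). This assumes the renamed sample data above (first row holds the years); the df below is a hypothetical rebuild of it so the snippet runs on its own:

```r
# Hypothetical rebuild of the renamed sample data (row 1 holds the years)
df <- data.frame(
  row_name       = c(NA, "A", "B", "C", "D"),
  Variable1_2008 = c(2008, 1, 2, 3, 4),
  Variable1_2009 = c(2009, 5, 6, 7, 8),
  Variable1_2010 = c(2010, 9, 10, 11, 12),
  Variable1_2011 = c(2011, 13, 14, 15, 16),
  Variable2_2008 = c(2008, 5, 25, 45, 65),
  Variable2_2009 = c(2009, 10, 30, 50, 70),
  Variable2_2010 = c(2010, 15, 35, 55, 75),
  Variable2_2011 = c(2011, 20, 40, 60, 80),
  stringsAsFactors = FALSE
)

vals <- df[-1, -1]  # drop the year row and the ID column
n    <- nrow(vals)

# unlist() stacks the columns one after another, so the IDs repeat once
# per column and each column name repeats once per row
long <- data.frame(
  row_name      = rep(df$row_name[-1], times = ncol(vals)),
  Variable_Year = rep(names(vals), each = n),
  Data          = unlist(vals, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split "Variable1_2008" into "Variable1" and "2008"
parts <- do.call(rbind, strsplit(long$Variable_Year, "_", fixed = TRUE))
long$Variable <- parts[, 1]
long$Year     <- parts[, 2]

# Sort like the tidyverse version and keep the final column order
long <- long[order(long$row_name, long$Variable, long$Year),
             c("row_name", "Variable", "Year", "Data")]
rownames(long) <- NULL
print(long)
```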