在 R 中重组面板数据 - 年作为列
Restructuring Panel Data in R - Years as Columns
简而言之:
我有什么:
ID Name wage2010 wage2011 wage2012 kids2010 kids2011 kids2012
1 Chris 20,000 18,000 21,000 2 2 2
2 Pat 40,000 45,000 45,000 1 1 2
3 Susan 10,000 18,000 15,000 0 1 1
我想要的:
ID Name year wage kids
1 Chris 2010 20,000 2
1 Chris 2011 18,000 2
1 Chris 2012 21,000 2
2 Pat 2010 40,000 1
2 Pat 2011 45,000 1
2 Pat 2012 45,000 2
3 Susan 2010 10,000 0
3 Susan 2011 18,000 1
3 Susan 2012 15,000 1
有什么帮助建议吗?在此先感谢。
我们可以使用 data.table
中的 melt
library(data.table)
melt(setDT(df1), measure = patterns('^wage', '^kids'),
variable.name = 'year', value.name = c('wage', 'kids'))
或 pivot_longer
来自 tidyr
library(tidyr)
pivot_longer(df1, cols = -c(ID, Name),
names_to = c( '.value', 'year'), names_sep = "(?<=[a-z])(?=[0-9])")
# A tibble: 9 x 5
# ID Name year wage kids
# <int> <chr> <chr> <chr> <int>
#1 1 Chris 2010 20,000 2
#2 1 Chris 2011 18,000 2
#3 1 Chris 2012 21,000 2
#4 2 Pat 2010 40,000 1
#5 2 Pat 2011 45,000 1
#6 2 Pat 2012 45,000 2
#7 3 Susan 2010 10,000 0
#8 3 Susan 2011 18,000 1
#9 3 Susan 2012 15,000 1
数据
df1 <- structure(list(ID = 1:3, Name = c("Chris", "Pat", "Susan"),
wage2010 = c("20,000",
"40,000", "10,000"), wage2011 = c("18,000", "45,000", "18,000"
), wage2012 = c("21,000", "45,000", "15,000"), kids2010 = 2:0,
kids2011 = c(2L, 1L, 1L), kids2012 = c(2L, 2L, 1L)),
class = "data.frame", row.names = c(NA,
-3L))
简而言之:
我有什么:
ID Name wage2010 wage2011 wage2012 kids2010 kids2011 kids2012
1 Chris 20,000 18,000 21,000 2 2 2
2 Pat 40,000 45,000 45,000 1 1 2
3 Susan 10,000 18,000 15,000 0 1 1
我想要的:
ID Name year wage kids
1 Chris 2010 20,000 2
1 Chris 2011 18,000 2
1 Chris 2012 21,000 2
2 Pat 2010 40,000 1
2 Pat 2011 45,000 1
2 Pat 2012 45,000 2
3 Susan 2010 10,000 0
3 Susan 2011 18,000 1
3 Susan 2012 15,000 1
有什么帮助建议吗?在此先感谢。
我们可以使用 data.table
melt
library(data.table)
melt(setDT(df1), measure = patterns('^wage', '^kids'),
variable.name = 'year', value.name = c('wage', 'kids'))
或 pivot_longer
来自 tidyr
library(tidyr)
pivot_longer(df1, cols = -c(ID, Name),
names_to = c( '.value', 'year'), names_sep = "(?<=[a-z])(?=[0-9])")
# A tibble: 9 x 5
# ID Name year wage kids
# <int> <chr> <chr> <chr> <int>
#1 1 Chris 2010 20,000 2
#2 1 Chris 2011 18,000 2
#3 1 Chris 2012 21,000 2
#4 2 Pat 2010 40,000 1
#5 2 Pat 2011 45,000 1
#6 2 Pat 2012 45,000 2
#7 3 Susan 2010 10,000 0
#8 3 Susan 2011 18,000 1
#9 3 Susan 2012 15,000 1
数据
df1 <- structure(list(ID = 1:3, Name = c("Chris", "Pat", "Susan"),
wage2010 = c("20,000",
"40,000", "10,000"), wage2011 = c("18,000", "45,000", "18,000"
), wage2012 = c("21,000", "45,000", "15,000"), kids2010 = 2:0,
kids2011 = c(2L, 1L, 1L), kids2012 = c(2L, 2L, 1L)),
class = "data.frame", row.names = c(NA,
-3L))