Restructuring data.frame to long format converts my numerics to NAs?
Question
I have the following table in R, with one row per salary:
> df
employee employment start_date end_date salary
1 Ian 1 28Jul2010 28Jul2011 20000
2 Rose 1 28Jul2011 28Jul2012 30000
3 Rose 2 28Jul2012 28Jul2013 31000
I would like to convert it to the following structure, with one row per employee:
> df2
employee start_date_employement_1 end_date_employment_1 salary_employement_1 start_date_employement_2 end_date_employment_2 salary_employement_2
1 Ian 28Jul2010 28Jul2011 20000 <NA> <NA> NA
2 Rose 28Jul2011 28Jul2012 30000 28Jul2012 28Jul2013 31000
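Unfortunately, I can't see how to do this and would appreciate some help.
Note: the R code that creates the tables above is at the end of this post.
My failed approach
This is a data-restructuring problem, so I assumed the reshape/reshape2 packages were the way forward.
I can run the basic melt and cast examples, but I can't see how to apply them to my specific problem. When I try, my salary values disappear and I'm not sure why (it seems to be interpreting my salary as a factor rather than a numeric?):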
library(reshape2)
library(dplyr)
> melt(df, id.vars = c("employee", "employment")) %>%
arrange(employee, employment)
employee employment variable value
1 Ian 1 start_date 28Jul2010
2 Ian 1 end_date 28Jul2011
3 Ian 1 salary <NA>
4 Rose 1 start_date 28Jul2011
5 Rose 1 end_date 28Jul2012
6 Rose 1 salary <NA>
7 Rose 2 start_date 28Jul2012
8 Rose 2 end_date 28Jul2013
9 Rose 2 salary <NA>
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = c(20000L, 30000L, 31000L)) :
invalid factor level, NA generated
However, if the above had worked, I would then have done this:
melt(df, id.vars = c("employee", "employment")) %>%
arrange(employee, employment) %>%
mutate(variable = paste(variable, employment, sep="_")) %>%
select(employee, variable, value) %>%
cast()
employee end_date_1 end_date_2 salary_1 salary_2 start_date_1 start_date_2
1 Ian 28Jul2011 <NA> <NA> <NA> 28Jul2010 <NA>
2 Rose 28Jul2012 28Jul2013 <NA> <NA> 28Jul2011 28Jul2012
This is almost what I want, apart from the NAs and the column order.
Data
df <-
structure(list(employee = c("Ian", "Rose", "Rose"),
employment = c(1L, 1L, 2L),
start_date = c("28Jul2010", "28Jul2011", "28Jul2012"),
end_date = c("28Jul2011", "28Jul2012", "28Jul2013"),
salary = c(20000.00, 30000.00, 31000.00)),
.Names = c("employee", "employment", "start_date", "end_date", "salary"),
sorted = c("employee", "employment"), class = c("data.frame"), row.names = c(NA, -3L))
df2 <-
structure(list(employee = c("Ian", "Rose"), start_date_employement_1 = c("28Jul2010", "28Jul2011"),
end_date_employment_1 = c("28Jul2011", "28Jul2012"),
salary_employement_1 = c(20000L, 30000L),
start_date_employement_2 = c(NA, "28Jul2012"),
end_date_employment_2 = c(NA, "28Jul2013"),
salary_employement_2 = c(NA, 31000L)),
.Names = c("employee", "start_date_employement_1", "end_date_employment_1", "salary_employement_1", "start_date_employement_2", "end_date_employment_2", "salary_employement_2"),
class = "data.frame", row.names = c(NA, -2L))
Answer
You want dcast to reshape a data frame from long to wide, but reshape2::dcast does not appear to support multiple value.var columns. You can use reshape from base R instead:
reshape(df, direction = "wide", idvar = "employee", timevar = "employment")
# employee start_date.1 end_date.1 salary.1 start_date.2 end_date.2 salary.2
#1 Ian 28Jul2010 28Jul2011 20000 <NA> <NA> NA
#2 Rose 28Jul2011 28Jul2012 30000 28Jul2012 28Jul2013 31000
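The `.1` and `.2` suffixes in the output above come from the default `sep = "."` of reshape. If underscore names closer to the requested layout are preferred, passing `sep = "_"` should do it. This is a minimal sketch: df is rebuilt here with data.frame() instead of the structure() call from the question, purely to keep the block self-contained.

```r
# Same data as in the question, rebuilt with data.frame() for a self-contained example
df <- data.frame(
  employee   = c("Ian", "Rose", "Rose"),
  employment = c(1L, 1L, 2L),
  start_date = c("28Jul2010", "28Jul2011", "28Jul2012"),
  end_date   = c("28Jul2011", "28Jul2012", "28Jul2013"),
  salary     = c(20000, 30000, 31000),
  stringsAsFactors = FALSE
)

# sep = "_" controls how the value column names and the timevar values are joined
wide <- reshape(df, direction = "wide", idvar = "employee",
                timevar = "employment", sep = "_")

names(wide)          # columns are grouped by employment number
class(wide$salary_1) # salary keeps its numeric type
```

Because each value column is widened on its own, salary is never stacked together with the date columns, so it is not coerced to another type.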
Or use data.table::dcast:
library(data.table)
dcast(setDT(df), employee ~ employment, value.var = c("start_date", "end_date", "salary"))
# employee start_date_1 start_date_2 end_date_1 end_date_2 salary_1 salary_2
#1: Ian 28Jul2010 NA 28Jul2011 NA 20000 NA
#2: Rose 28Jul2011 28Jul2012 28Jul2012 28Jul2013 30000 31000
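As for the NAs in the melt attempt from the question: the `[<-.factor` in the warning suggests that the date columns were factors when melt ran, even though the posted data uses character vectors, so treat that as an assumption. Melting stacks every measure column into a single value column, and a number written into a factor that lacks a matching level becomes NA. A base-R sketch of that mechanism, using stand-in vectors rather than the actual melt internals:

```r
# Assumption: the date columns were factors at the time of the melt
dates  <- factor(c("28Jul2010", "28Jul2011", "28Jul2012"))
salary <- c(20000, 30000, 31000)

# Writing numbers into a factor: none of them is an existing level, so each
# becomes NA. The warning is "invalid factor level, NA generated".
stacked_factor <- dates
suppressWarnings(stacked_factor[4:6] <- salary)
is.na(stacked_factor)

# Stacking as character keeps the values, but only as strings that have to be
# converted back to numbers afterwards.
stacked_chr <- c(as.character(dates), as.character(salary))
salary_back <- as.numeric(stacked_chr[4:6])
```

Even when the stacking succeeds, the salaries come back as text, which is one more reason to prefer a wide reshape that handles each value column separately.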