Restructuring data.frame to long format converts my numerics to NA's?

Question

I have the following table in R, with one row per salary:

> df
  employee employment start_date  end_date salary
1      Ian          1  28Jul2010 28Jul2011  20000
2     Rose          1  28Jul2011 28Jul2012  30000
3     Rose          2  28Jul2012 28Jul2013  31000

I'd like to convert it to the following structure, with one row per employee:

> df2
  employee start_date_employement_1 end_date_employment_1 salary_employement_1 start_date_employement_2 end_date_employment_2 salary_employement_2
1      Ian                28Jul2010             28Jul2011                20000                     <NA>                  <NA>                   NA
2     Rose                28Jul2011             28Jul2012                30000                28Jul2012             28Jul2013                31000

Unfortunately I can't see how to do this, and would appreciate some help.

Note: the R code to create the tables above is at the end of this post.


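My failed approach

This is a data restructuring problem, so I figured the reshape/reshape2 packages were the way forward.

I can run the basic melt and cast examples, but can't see how to apply them to my specific problem. When I try, my salary values disappear and I'm not sure why (it seems to be interpreting my salary as a factor rather than a numeric?):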

library(reshape2)
library(dplyr)
> melt(df, id.vars = c("employee", "employment")) %>% 
    arrange(employee, employment)
  employee employment   variable     value
1      Ian          1 start_date 28Jul2010
2      Ian          1   end_date 28Jul2011
3      Ian          1     salary      <NA>
4     Rose          1 start_date 28Jul2011
5     Rose          1   end_date 28Jul2012
6     Rose          1     salary      <NA>
7     Rose          2 start_date 28Jul2012
8     Rose          2   end_date 28Jul2013
9     Rose          2     salary      <NA>
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = c(20000L, 30000L, 31000L)) :
  invalid factor level, NA generated
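
The warning points at `[<-.factor`, which suggests the stacked value column was built as a factor from the date columns and the numeric salaries were then assigned into it. Below is a minimal base-R sketch of that mechanism. It assumes the date columns were factors in the data frame that produced the warning (the dput at the end of the post shows character columns, so this is a guess), and the names `value` and `value_chr` are illustrative only.

```r
# Assumption: the date column was a factor, so the stacked column starts as one.
value <- factor(c("28Jul2010", "28Jul2011"))

# Assigning a value that is not an existing level gives NA, with the
# warning "invalid factor level, NA generated".
value[3] <- 20000L
is.na(value[3])

# With a character vector the salary survives, as text.
value_chr <- c("28Jul2010", "28Jul2011")
value_chr[3] <- as.character(20000L)
value_chr
```

If this is what happened, converting the factor columns to character before melting would likely keep the salaries, although they would end up as text in the shared value column.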

However, if the above had worked, I would then have done this:

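melt(df, id.vars = c("employee", "employment")) %>% 
  arrange(employee, employment) %>%
  mutate(variable = paste(variable, employment, sep="_")) %>%
  select(employee, variable, value) %>%
  # reshape2 provides dcast(); cast() belongs to the older reshape package
  dcast(employee ~ variable, value.var = "value")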

  employee end_date_1 end_date_2 salary_1 salary_2 start_date_1 start_date_2
1      Ian  28Jul2011       <NA>     <NA>     <NA>    28Jul2010         <NA>
2     Rose  28Jul2012  28Jul2013     <NA>     <NA>    28Jul2011    28Jul2012

This is almost what I want, except for the NAs and the order of the columns.


Data

df <- 
  structure(list(employee = c("Ian", "Rose", "Rose"), 
               employment = c(1L, 1L, 2L), 
               start_date = c("28Jul2010", "28Jul2011", "28Jul2012"), 
               end_date = c("28Jul2011", "28Jul2012", "28Jul2013"), 
               salary = c(20000.00, 30000.00, 31000.00)), 
          .Names = c("employee", "employment", "start_date", "end_date", "salary"), 
          sorted = c("employee", "employment"), class = c("data.frame"), row.names = c(NA, -3L))


df2 <- 
  structure(list(employee = c("Ian", "Rose"), start_date_employement_1 = c("28Jul2010", "28Jul2011"), 
                 end_date_employment_1 = c("28Jul2011", "28Jul2012"), 
                 salary_employement_1 = c(20000L, 30000L), 
                 start_date_employement_2 = c(NA, "28Jul2012"), 
                 end_date_employment_2 = c(NA, "28Jul2013"), 
                 salary_employement_2 = c(NA, 31000L)), 
            .Names = c("employee", "start_date_employement_1", "end_date_employment_1", "salary_employement_1", "start_date_employement_2", "end_date_employment_2", "salary_employement_2"), 
            class = "data.frame", row.names = c(NA, -2L))

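Answer

You are looking to use dcast to reshape the data frame from long to wide, but reshape2::dcast does not seem to support multiple value.var columns. You can use reshape from base R: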

reshape(df, direction = "wide", idvar = "employee", timevar = "employment")

#  employee start_date.1 end_date.1 salary.1 start_date.2 end_date.2 salary.2
#1      Ian    28Jul2010  28Jul2011    20000         <NA>       <NA>       NA
#2     Rose    28Jul2011  28Jul2012    30000    28Jul2012  28Jul2013    31000
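
The question asks for underscore-separated column names, while the call above yields `start_date.1` and so on. As a small sketch, the `sep` argument of base `reshape` can be set to `"_"`. The data frame is rebuilt here with `data.frame()` only to keep the snippet self-contained, and `wide` is an illustrative name.

```r
# Same data as in the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  employee   = c("Ian", "Rose", "Rose"),
  employment = c(1L, 1L, 2L),
  start_date = c("28Jul2010", "28Jul2011", "28Jul2012"),
  end_date   = c("28Jul2011", "28Jul2012", "28Jul2013"),
  salary     = c(20000, 30000, 31000),
  stringsAsFactors = FALSE
)

# sep = "_" joins each variable name and the employment number with "_".
wide <- reshape(df, direction = "wide", idvar = "employee",
                timevar = "employment", sep = "_")

names(wide)
is.numeric(wide$salary_1)  # each column keeps its own type
```

Since every column is reshaped separately rather than stacked into one value column, the salaries stay numeric and only Ian's missing second employment is NA.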

Or use data.table::dcast:

library(data.table)
dcast(setDT(df), employee ~ employment, value.var = c("start_date", "end_date", "salary"))
#   employee start_date_1 start_date_2 end_date_1 end_date_2 salary_1 salary_2
#1:      Ian    28Jul2010           NA  28Jul2011         NA    20000       NA
#2:     Rose    28Jul2011    28Jul2012  28Jul2012  28Jul2013    30000    31000
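
The data.table result groups columns by variable, whereas the question wants them grouped by employment. One possible way to reorder them is `setcolorder`, sketched below. It assumes there are exactly two employments per employee at most, and the names `vars`, `wanted` and `wide` are illustrative.

```r
library(data.table)

# Same data as in the question, built directly as a data.table.
dt <- data.table(
  employee   = c("Ian", "Rose", "Rose"),
  employment = c(1L, 1L, 2L),
  start_date = c("28Jul2010", "28Jul2011", "28Jul2012"),
  end_date   = c("28Jul2011", "28Jul2012", "28Jul2013"),
  salary     = c(20000, 30000, 31000)
)

vars <- c("start_date", "end_date", "salary")
wide <- dcast(dt, employee ~ employment, value.var = vars)

# Target order: all variables for employment 1, then all for employment 2.
# Assumption: employments are numbered 1 and 2 only.
wanted <- c("employee",
            paste(rep(vars, times = 2), rep(1:2, each = length(vars)), sep = "_"))

setcolorder(wide, wanted)  # reorders by reference, no copy
names(wide)
```

For data with more employments, the `1:2` would need to be replaced by the actual set of employment numbers, for example `sort(unique(dt$employment))`.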