R: Repeat rows in a data frame with a new set of values for certain columns

Question

employee <- c("John", "Adi", "Sam")
salary <- c(21000, 22000, 23000)
startdate <- as.Date(c("2014-11-01","2014-01-01","2014-10-01"))
enddate <- as.Date(c("2015-10-31","2014-12-31","2015-10-31"))
N<- c(2,1,2)
df<- data.frame(employee,salary, startdate, enddate, N)

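I want to repeat each row "n" times, where "n" is given in column N. In the original row, enddate should be changed to a fixed date, for example "31/12/2014", and in the repeated row that fixed date should become the startdate. Run the code below to see an example of the expected output in df2: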

employee <- c(rep("John",2), "Adi", rep("Sam",2))
salary <- c(21000,21000, 22000, 23000,23000)
startdate <- as.Date(c("2014-11-01","2014-12-31", "2014-01-01","2014-10-01","2014-12-31"))
enddate <- as.Date(c("2014-12-31","2015-10-31","2014-12-31","2014-12-31","2015-10-31"))
N<- c(2,2,1,2,2)
df2<- data.frame(employee,salary, startdate, enddate, N)

Answer 1

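We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and expand the rows by repeating each row index as many times as 'N' says. We then get the numeric row index ('i1') of the first observation (.I[1L]) grouped by 'employee' and use it to assign (:=) '2014-12-31' to 'enddate'. Similarly, we get the row indices ('i2') of every row after the first (.I[seq_len(.N)>1L]) for each 'employee' and set 'startdate' to '2014-12-31'. Note that 'i1' also covers employees with N = 1. That is harmless here because Adi's 'enddate' is already '2014-12-31'.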

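library(data.table)

# Convert to data.table by reference and repeat each row N times
DT <- setDT(df)[rep(seq_len(.N), N)]
# i1: index of the first row of each employee; that row ends on the fixed date
i1 <- DT[, .I[1L], by = employee]$V1
DT[i1, enddate := as.Date('2014-12-31')]
# i2: indices of all later rows of each employee; they start on the fixed date
i2 <- DT[, .I[seq_len(.N) > 1L], by = employee]$V1
DT[i2, startdate := as.Date('2014-12-31')]
identical(as.data.table(df2), DT)
#[1] TRUE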
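For comparison, the same index logic can be sketched in base R, without data.table. This is only an illustration, not part of the answer above. It assumes that 'employee' is unique in the original data, so the first occurrence of a name marks the original row. The names 'cutoff', 'out' and 'is_first' are made up for the example.

```r
# Rebuild the question's input so the sketch is self-contained.
df <- data.frame(
  employee  = c("John", "Adi", "Sam"),
  salary    = c(21000, 22000, 23000),
  startdate = as.Date(c("2014-11-01", "2014-01-01", "2014-10-01")),
  enddate   = as.Date(c("2015-10-31", "2014-12-31", "2015-10-31")),
  N         = c(2, 1, 2)
)
cutoff <- as.Date("2014-12-31")  # hypothetical name for the fixed date

# Repeat each row index N times, then subset the data frame with it.
out <- df[rep(seq_len(nrow(df)), df$N), ]
rownames(out) <- NULL

# First copy of each employee vs. the later copies
# (assumes employee names are unique in the original df).
is_first <- !duplicated(out$employee)
out$enddate[is_first]    <- cutoff  # original row ends on the fixed date
out$startdate[!is_first] <- cutoff  # repeated row starts on the fixed date
```

Here duplicated() plays the role of .I[1L] versus .I[seq_len(.N)>1L] in the data.table code.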

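Or we can use if, grouped by 'employee', to concatenate the first 'startdate' and the last 'enddate' with '2014-12-31', and assign the output back to the 'startdate' and 'enddate' columns of the expanded 'DT'. As written, it builds two-element vectors, so it fits the case where each employee has at most two rows after expansion.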

# For employees with more than one row, rewrite both date columns in one step
DT[, c('startdate', 'enddate') := if (.N > 1L)
            list(c(startdate[1L], as.Date('2014-12-31')),
                 c(as.Date('2014-12-31'), enddate[.N])), by = employee]
identical(DT, as.data.table(df2))
#[1] TRUE
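As far as the code reads, the if (.N>1L) guard means only employees with more than one row are rewritten, so a single-row employee keeps its own dates. A rough base R sketch of the same guard follows, using made-up, already expanded data in which Adi's 'enddate' is not the fixed date. The names 'cutoff', 'grp_size', 'pos' and 'multi' are hypothetical.

```r
# Made-up, already expanded data: John appears twice, Adi once.
out <- data.frame(
  employee  = c("John", "John", "Adi"),
  startdate = as.Date(c("2014-11-01", "2014-11-01", "2014-01-01")),
  enddate   = as.Date(c("2015-10-31", "2015-10-31", "2014-06-30"))
)
cutoff <- as.Date("2014-12-31")  # hypothetical name for the fixed date

idx      <- seq_along(out$employee)
grp_size <- ave(idx, out$employee, FUN = length)     # rows per employee
pos      <- ave(idx, out$employee, FUN = seq_along)  # position within employee
multi    <- grp_size > 1                             # same role as .N > 1L

# Only touch employees that have more than one row.
out$enddate[multi & pos == 1]  <- cutoff
out$startdate[multi & pos > 1] <- cutoff
```

In this sketch Adi's row is left unchanged, while John's first row ends on the fixed date and his second row starts on it.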
