Reshape messy longitudinal survey data containing multiple different variables, wide to long

I hope I'm not reinventing the wheel here, but I don't think the following can be answered using reshape alone.

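I have messy longitudinal survey data that I want to convert from wide to long format. By messy I mean that the variables are of mixed types (logical, numeric, text) and that not every variable was recorded at every time point.

For example: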

data <- read.table(header = TRUE, text = '
  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
  1      TRUE    FALSE 87717.76 82281.25  happy  happy filler
  2      TRUE     TRUE 70795.53 54995.19  so-so  happy filler
  3     FALSE    FALSE 48012.77 47650.47    sad  so-so filler
 ')

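I can't figure out how to reshape the data with reshape and keep getting the error message 'times' is wrong length. I think this is because not every variable is recorded at every time point. I also don't think melt/cast from reshape2 will work, because they require all measured variables to be of the same type.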
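To make the gaps concrete, here is a minimal sketch that tabulates which variable/time combinations exist in the example data. It assumes time-varying columns are named `<name>.<digits>`; the helper names `measure_cols`, `parts`, and `coverage` are only illustrative.

```r
# Same example data as above.
data <- read.table(header = TRUE, text = "
  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
  1      TRUE    FALSE 87717.76 82281.25  happy  happy filler
  2      TRUE     TRUE 70795.53 54995.19  so-so  happy filler
  3     FALSE    FALSE 48012.77 47650.47    sad  so-so filler
")

# Assumption: time-varying columns end in a dot followed by digits.
measure_cols <- grep("\\.[0-9]+$", names(data), value = TRUE)

# Split each name into its stem and its time suffix.
parts <- do.call(rbind, strsplit(measure_cols, ".", fixed = TRUE))

# Rows are variables, columns are time points; a 0 marks a combination
# that was never recorded.
coverage <- table(variable = parts[, 1], time = parts[, 2])
print(coverage)

# The column types differ from one variable to the next.
print(sapply(data[measure_cols], class))
```

Each variable is missing at exactly one of the three time points, which is the kind of imbalance that reshape cannot guess its way around.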

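I came up with the following solution, which may help others. It selects the variables for each time point, renames them, and then stacks the pieces with rbind.fill from plyr. But I am wondering whether I am missing something about reshape, or whether this can be done more easily with tidyr or another package?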

reshapeLong2 <- function(data, varying = NULL, timevar = "time", idvar = "id", sep = ".", patterns = NULL) {

  require(plyr)
  substrRight <- function(x, n){
    substr(x, nchar(x)-n+1, nchar(x))
  }

  if (is.null(varying))
    varying <- names(data)[! names(data) %in% idvar]

  # If no patterns are given, guess them from the last character of each
  # variable name (so only single-digit time points are detected).
  # Names that do not end in a digit produce NA, hence the coercion warning.
  if (is.null(patterns)) {
    times <- unique(na.omit(as.numeric(substrRight(varying, 1))))
    times <- sort(times)
    patterns <- paste0(sep, times)
  }

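  # Create list of datasets by study time.
  # Note: each pattern is used as a regular expression, so "." matches any character.
  ls.df <- lapply(patterns, function(pattern) {
    var.old <- grep(pattern, x = varying, value = TRUE)
    var.new <- gsub(pattern, "", x = var.old)
    df <- data[, c(idvar, var.old), drop = FALSE]  # stay a data frame even with one column
    names(df) <- c(idvar, var.new)
    df[, timevar] <- match(pattern, patterns)  # time is the pattern's position in patterns
    return(df)
  })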

  # Concatenate datasets together
  dfs <- rbind.fill(ls.df)
  return(dfs)
}

> reshapeLong2(df.test)  # df.test: test data with the same column layout as data above (values differ)
  id inlove  mood time   income
1  1  FALSE   sad    1       NA
2  2   TRUE so-so    1       NA
3  3   TRUE   sad    1       NA
4  1   TRUE  <NA>    2 27766.13
5  2  FALSE  <NA>    2 74395.30
6  3   TRUE  <NA>    2 89004.95
7  1     NA   sad    3 27270.07
8  2     NA so-so    3 36971.64
9  3     NA so-so    3 85986.96
Warning message:
In na.omit(as.numeric(substrRight(varying, 1))) :
  NAs introduced by coercion

Note that the warning message signals that some variables were dropped (here, "random"). No warning is shown if every variable is listed in either idvar or varying.
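The warning comes from the time-guessing step, which converts the last character of each name to a number. A minimal sketch of that step in isolation, using the column names from the example (`last_char`, `time_guess`, and `dropped` are illustrative names, not part of the function):

```r
# Column names that reshapeLong2 would treat as varying by default.
varying <- c("inlove.1", "inlove.2", "income.2", "income.3",
             "mood.1", "mood.3", "random")

# Take the last character of each name, as substrRight(varying, 1) does.
last_char <- substr(varying, nchar(varying), nchar(varying))

# Non-digit endings become NA; the warning is silenced here only to keep
# the output clean.
time_guess <- suppressWarnings(as.numeric(last_char))

# Names with no usable time suffix match no pattern and are left out.
dropped <- varying[is.na(time_guess)]
print(dropped)
```

Assuming the function is used as written, passing `idvar = c("id", "random")` should keep that column in the result and avoid the warning, since `random` is then no longer part of `varying`.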

If you add columns of NA for every missing varname.TIME combination, you can then use reshape like this:

uniqnames <- c("inlove","income","mood")
allnames  <- make.unique(rep(uniqnames,4))[-(seq_along(uniqnames))]
#[1] "inlove.1" "income.1" "mood.1"   "inlove.2" "income.2" "mood.2" ...
data[setdiff(allnames, names(data)[-1])] <- NA
#  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random income.1 mood.2 inlove.3
#1  1     TRUE    FALSE 87717.76 82281.25  happy  happy filler       NA     NA       NA
#2  2     TRUE     TRUE 70795.53 54995.19  so-so  happy filler       NA     NA       NA
#3  3    FALSE    FALSE 48012.77 47650.47    sad  so-so filler       NA     NA       NA

reshape(data, idvar="id", direction="long", sep=".", varying=allnames)

#    id random time inlove   income  mood
#1.1  1 filler    1   TRUE       NA happy
#2.1  2 filler    1   TRUE       NA so-so
#3.1  3 filler    1  FALSE       NA   sad
#1.2  1 filler    2  FALSE 87717.76  <NA>
#2.2  2 filler    2   TRUE 70795.53  <NA>
#3.2  3 filler    2  FALSE 48012.77  <NA>
#1.3  1 filler    3     NA 82281.25 happy
#2.3  2 filler    3     NA 54995.19 happy
#3.3  3 filler    3     NA 47650.47 so-so
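The variable names and the number of time points are typed by hand above. As a rough sketch, they can also be derived from the column names, assuming every time-varying column is named `<name>.<digits>`; the names `suffix`, `stems`, `times`, and `long` are illustrative.

```r
# Same example data as in the question.
data <- read.table(header = TRUE, text = "
  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
  1      TRUE    FALSE 87717.76 82281.25  happy  happy filler
  2      TRUE     TRUE 70795.53 54995.19  so-so  happy filler
  3     FALSE    FALSE 48012.77 47650.47    sad  so-so filler
")

# Assumption: time-varying columns end in a dot followed by digits.
suffix <- "\\.[0-9]+$"
measure_cols <- grep(suffix, names(data), value = TRUE)

# Unique variable stems and the sorted set of observed time points.
stems <- unique(sub(suffix, "", measure_cols))
times <- sort(unique(as.integer(sub("^.*\\.", "", measure_cols))))

# Every stem/time combination, ordered time by time.
allnames <- as.vector(outer(stems, times, paste, sep = "."))

# Add an all-NA column for each combination that was never recorded.
data[setdiff(allnames, names(data))] <- NA

# With a complete grid of columns, reshape can infer names and times.
long <- reshape(data, idvar = "id", direction = "long", sep = ".",
                varying = allnames)
print(long)
```

This keeps the padding step in sync with the data if more variables or waves are added later, at the cost of relying on the naming convention.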