R: Impute NAs by Linear Increase Depending on Time Interval

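Question

I need to impute NAs in my data frame, which comes from a repeated-measures study. For this particular outcome, I need to impute each NA with the most recently observed non-NA value, plus 1 for every full 52 weeks that have elapsed since that last observation.

Example

An example data frame; the goal column contains the target imputed values.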

df <- data.frame(
  subject = rep(1:3, each = 12),
  week = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104),3),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
            89, 86, 94, 96, 88,107, 110, 102, 107, NA, NA, NA,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, NA, NA),
  goal = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
            89, 86, 94, 96, 88,107, 110, 102, 107, 107,107, 108,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, 106, 106)
)
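As a quick check of the rule behind goal, the arithmetic for subject 1 can be sketched in base R. The variable names here are illustrative and do not appear in the question; the last observed value for subject 1 is 104, recorded at week 16.

```r
# Subject 1: last observed value is 104, recorded at week 16
last_value <- 104
last_week  <- 16
weeks      <- c(20, 26, 32, 44, 52, 64, 78, 104)  # weeks where value is NA

# Add 1 for every full 52 weeks elapsed since the last observation
imputed <- last_value + (weeks - last_week) %/% 52
imputed
# [1] 104 104 104 104 104 104 105 105
```

Only weeks 78 and 104 are at least 52 weeks past week 16, so only those two rows are incremented, which matches rows 5 to 12 of goal.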

I left the intermediate columns in to make it more obvious what is happening, but you can remove them with a simple select.
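library(dplyr)  # the zoo package must also be installed for zoo::na.locf

df <- df %>%
  group_by(subject) %>%
  mutate(
    # last week with an observed value for this subject
    last_obs_week = max(week[!is.na(value)]),
    # weeks elapsed since then (0 for rows at or before it)
    since_last_week = pmax(0, week - last_obs_week),
    # one increment per full 52 weeks elapsed
    inc_52 = since_last_week %/% 52,
    # carry the last observation forward, then add the increment
    result = zoo::na.locf(value, na.rm = FALSE) + inc_52
  )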

all(df$goal == df$result)
# [1] TRUE

print.data.frame(df)
#    subject week value goal last_obs_week since_last_week inc_52 result
# 1        1    8   112  112            16               0      0    112
# 2        1   10    97   97            16               0      0     97
# 3        1   12   130  130            16               0      0    130
# 4        1   16   104  104            16               0      0    104
# 5        1   20    NA  104            16               4      0    104
# 6        1   26    NA  104            16              10      0    104
# 7        1   32    NA  104            16              16      0    104
# 8        1   44    NA  104            16              28      0    104
# 9        1   52    NA  104            16              36      0    104
# 10       1   64    NA  104            16              48      0    104
# 11       1   78    NA  105            16              62      1    105
# 12       1  104    NA  105            16              88      1    105
# 13       2    8    89   89            52               0      0     89
# ...

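You can use dplyr and tidyr::fill to get the desired result. The logic is to add a column that tracks the week of each non-NA value. Use tidyr::fill to carry the last non-NA value and its week forward, then add 1 to the value for every full 52 weeks between the current week and the week of the last non-NA value.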

library(dplyr)
library(tidyr)

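df %>%
  group_by(subject) %>%
  # record the week of each observed (non-NA) value
  mutate(weekWithLastNonNaValue = ifelse(is.na(value), NA, week)) %>%
  # carry the last observed value and its week forward within each subject
  fill(value, weekWithLastNonNaValue) %>%
  # add 1 for every full 52 weeks since the last observation
  mutate(value = value + (week - weekWithLastNonNaValue) %/% 52) %>%
  select(-weekWithLastNonNaValue) %>%
  as.data.frame()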

#    subject week value goal
# 1        1    8   112  112
# 2        1   10    97   97
# 3        1   12   130  130
# 4        1   16   104  104
# 5        1   20   104  104
# 6        1   26   104  104
# 7        1   32   104  104
# 8        1   44   104  104
# 9        1   52   104  104
# 10       1   64   104  104
# 11       1   78   105  105
# 12       1  104   105  105
# 13       2    8    89   89
# 14       2   10    86   86
# 15       2   12    94   94
# 16       2   16    96   96
# 17       2   20    88   88
# 18       2   26   107  107
# 19       2   32   110  110
# 20       2   44   102  102
# ...
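If adding package dependencies is not an option, the same logic can be sketched in base R. This is only one possible formulation: the helper name impute_linear and its period argument are hypothetical, and it assumes rows are sorted by week within each subject. It measures elapsed time from the most recent observation before each row, so it also covers NAs that fall between two observed values.

```r
# Hypothetical helper (name and signature are illustrative, not from a package):
# fill each NA with the last observed value plus 1 per full `period` weeks elapsed.
impute_linear <- function(value, week, period = 52) {
  observed <- !is.na(value)
  # Index of the most recent observed row at or before each row (0 if none yet)
  idx <- cummax(seq_along(value) * observed)
  idx[idx == 0] <- NA  # leading NAs have nothing to carry forward
  value[idx] + (week - week[idx]) %/% period
}

# Example data from the question
df <- data.frame(
  subject = rep(1:3, each = 12),
  week = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104), 3),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
            89, 86, 94, 96, 88, 107, 110, 102, 107, NA, NA, NA,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, NA, NA),
  goal = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
           89, 86, 94, 96, 88, 107, 110, 102, 107, 107, 107, 108,
           107, 110, 102, 130, 104, 88, 82, 79, 92, 106, 106, 106)
)

# Apply the helper within each subject and put the pieces back in row order
pieces <- lapply(split(df, df$subject),
                 function(d) impute_linear(d$value, d$week))
df$result <- unsplit(pieces, df$subject)

all(df$result == df$goal)
# [1] TRUE
```

Values that appear before a subject's first observation stay NA, since there is nothing to carry forward.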