Using approx in dplyr

Question

I am trying to linearly interpolate x between years for each id in a data frame. dplyr seems like a suitable choice, but I cannot get it to work because of this error:

Error: incompatible size (9), expecting 3 (the group size) or 1

Example code:

library(dplyr)
dat <- data.frame(id = c(1,1,1,2,2,2,3,3,3), year = c(1,2,3,1,2,3,1,2,3), x = c(1,NA,2, 3, NA, 4, 5, NA, 6))

# Linear Interpolation
dat %>% 
  group_by(id) %>% 
  mutate(x2 = as.numeric(unlist(approx(x = dat$year, y = dat$x, xout = dat$x)[2])))

Sample data:

  id year  x
1  1    1  1
2  1    2 NA
3  1    3  2
4  2    1  3
5  2    2 NA
6  2    3  4
7  3    1  5
8  3    2 NA
9  3    3  6

Answer 1

You can do this in base R:

dat <- dat[order(dat$id, dat$year),]
dat$x2 <- unlist(by(dat, dat$id, function(df) approx(df$year, df$x, xout = df$year)[2]))
dat
  id year  x  x2
1  1    1  1 1.0
2  1    2 NA 1.5
3  1    3  2 2.0
4  2    1  3 3.0
5  2    2 NA 3.5
6  2    3  4 4.0
7  3    1  5 5.0
8  3    2 NA 5.5
9  3    3  6 6.0
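
For comparison, here is a minimal sketch of the same approx call inside a grouped mutate. In the question's code, dat$year and dat$x pull the full 9-row columns instead of the current group's 3 rows, which is consistent with the "incompatible size (9)" message, and xout was set to x rather than year. Referring to the columns by bare name should avoid both problems; the name res is only used here for illustration.

```r
library(dplyr)

dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  x    = c(1, NA, 2, 3, NA, 4, 5, NA, 6)
)

res <- dat %>%
  group_by(id) %>%
  # Bare column names are evaluated per group; approx() ignores the NA
  # points when fitting and returns interpolated values at each year.
  mutate(x2 = approx(x = year, y = x, xout = year)$y) %>%
  ungroup()

res
```

This assumes every id has at least two non-missing x values, since approx needs two points to interpolate.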

Answer 2

Here are a few approaches (moved from the comments):

1) na.approx/ave

library(zoo)

transform(dat, x2 = ave(x, id, FUN = na.approx))

Since the years are 1, 2, 3, we do not need to specify them, but if that were needed:

nr <- nrow(dat)
transform(dat, x2 = ave(1:nr, id, FUN = function(i) with(dat[i, ], na.approx(x, year))))
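
To illustrate when the year argument matters, the following sketch uses a small made-up data frame (d, with years 1, 2 and 5; not part of the question's data). Without year, na.approx treats the observations as equally spaced; with it, the interpolation follows the year axis.

```r
library(zoo)

# Hypothetical single-id data with unevenly spaced years
d <- data.frame(year = c(1, 2, 5), x = c(1, NA, 2))

by_position <- na.approx(d$x)          # 1.00 1.50 2.00
by_year     <- na.approx(d$x, d$year)  # 1.00 1.25 2.00
```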

2) na.approx/dplyr

library(dplyr)
library(zoo)

dat %>%
  group_by(id) %>%
  mutate(x2 = na.approx(x, year)) %>%
  ungroup()

If year is not needed, omit the second argument of na.approx.

Note: zoo also has other NA-filling functions, notably na.spline and na.locf.
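
As a rough sketch of how those two functions could be dropped into the same grouped pipeline (the column names x_locf and x_spline and the name res are made up for this example):

```r
library(dplyr)
library(zoo)

dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  x    = c(1, NA, 2, 3, NA, 4, 5, NA, 6)
)

res <- dat %>%
  group_by(id) %>%
  mutate(
    # Carry the last observed value forward; na.rm = FALSE keeps the
    # result the same length as the group even with leading NAs.
    x_locf   = na.locf(x, na.rm = FALSE),
    # Spline interpolation along year instead of linear interpolation.
    x_spline = na.spline(x, year)
  ) %>%
  ungroup()

res
```

With only two known points per id, as in this data, a spline has little to work with, so na.spline is more interesting on longer series.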
