How to use dplyr lag() to smooth minor changes in a variable

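I have grouped data, and I want to smooth a variable within each group. If the absolute change from one observation to the next is small (e.g. less than 5), I treat it as measurement error and want to copy (roll forward) the old value. Within each group, I initialize with the first measurement as the default, so I am assuming that the first observation in each group is always correct (which is debatable).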

library(dplyr)

set.seed(5)
# year has 7 values and is recycled across the 14 rows (7 per group)
mydata = data.frame(group = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                    year = seq(from = 2003, to = 2009, by = 1),
                    variable = round(runif(14, min = -5, max = 15), 0))
mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(smooth5 = ifelse( abs( lag(variable, n = 1, default = first(variable)) - variable ) <= 5 , variable, 5)) %>%       
  select(group, year, variable, smooth5) %>%
  arrange(group)

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13      13  # <- this change is |4|, thus it should use the old value 9
 3     1  2006        1       5  # <- here 13 changes to 1 is a reasonable change, should keep 1
 4     1  2008        9       5
 5     1  2009        6       6
 6     2  2003       11      11
 7     2  2004       14      14
 8     2  2007        5       5
 9     2  2008        1       1
10     2  2009        6       6

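You're close, but the arguments in your ifelse() call are wrong: it returns variable when the change is small and the constant 5 otherwise. Below, for clarity, I've added a new variable, previous. If abs(previous - variable) <= 5 you want previous; otherwise you want variable.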

mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(previous = lag(variable, n = 1, default = first(variable)),
         smooth5 = ifelse(abs(previous - variable) <= 5, previous, variable)) %>%       
  select(group, year, variable, smooth5) %>%
  arrange(group)

This gives:

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13       9
 3     1  2006        1       1
 4     1  2008        9       9
 5     1  2009        6       9
 6     2  2003       11      11
 7     2  2004       14      11
 8     2  2007        5       5
 9     2  2008        1       5
10     2  2009        6       1
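
One caveat, offered as a sketch rather than as part of the answer above: lag() compares each value with the previous raw observation, not with the previous smoothed value. If the goal is to roll the last accepted value forward, one possible approach is base R's Reduce() with accumulate = TRUE. The helper name roll_forward and its tol argument are made up for illustration, and the data frame simply re-types the filtered rows from the output above so the example does not depend on the random seed.

```r
library(dplyr)

# Rows re-typed from the filtered output shown above (variable > 0).
filtered <- data.frame(
  group    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  year     = c(2004, 2005, 2006, 2008, 2009, 2003, 2004, 2007, 2008, 2009),
  variable = c(9, 13, 1, 9, 6, 11, 14, 5, 1, 6)
)

# Hypothetical helper: keep the last accepted value unless the new
# observation differs from it by more than `tol`.
# Assumes x has at least one element.
roll_forward <- function(x, tol = 5) {
  Reduce(function(kept, current) {
    if (abs(current - kept) <= tol) kept else current
  }, x, accumulate = TRUE)
}

result <- filtered %>%
  group_by(group) %>%
  mutate(smooth5_rolled = roll_forward(variable)) %>%
  ungroup()

print(result)
```

On this data the two approaches should agree everywhere except the last row of group 2: the lag() version returns 1 there (the raw 2008 value), while the rolled version keeps 5, because 6 is compared with the value that was actually kept for 2008. Which behaviour is appropriate depends on whether a small change should be measured against the raw or the smoothed predecessor.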