How to use dplyr lag() to smooth minor changes in a variable
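I have grouped data and want to smooth the values within each group. If the absolute change from one observation to the next is small (e.g. less than 5), I treat it as measurement error and want to copy (roll forward) the old value. Within each group I use the first measurement as the default, so I am assuming that the first observation in each group is always correct (which is debatable).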
library(dplyr)

set.seed(5)
mydata <- data.frame(group = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                     year = seq(from = 2003, to = 2009, by = 1),  # 7 years, recycled for both groups
                     variable = round(runif(14, min = -5, max = 15), 0))

# My attempt (does not give the result I want):
mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(smooth5 = ifelse(abs(lag(variable, n = 1, default = first(variable)) - variable) <= 5, variable, 5)) %>%
  select(group, year, variable, smooth5) %>%
  arrange(group)
# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13      13  # <- this change is |4|, thus it should use the old value 9
 3     1  2006        1       5  # <- here 13 changes to 1 is a reasonable change, should keep 1
 4     1  2008        9       5
 5     1  2009        6       6
 6     2  2003       11      11
 7     2  2004       14      14
 8     2  2007        5       5
 9     2  2008        1       1
10     2  2009        6       6
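You're close, but your ifelse() call has a couple of mistakes: it returns variable when the change is small and the constant 5 otherwise. Below, I've added a new variable, previous, for clarity. If abs(previous - variable) <= 5 you want previous; otherwise you want variable: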
mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(previous = lag(variable, n = 1, default = first(variable)),
         smooth5 = ifelse(abs(previous - variable) <= 5, previous, variable)) %>%
  select(group, year, variable, smooth5) %>%
  arrange(group)
This gives
# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13       9
 3     1  2006        1       1
 4     1  2008        9       9
 5     1  2009        6       9
 6     2  2003       11      11
 7     2  2004       14      11
 8     2  2007        5       5
 9     2  2008        1       5
10     2  2009        6       1
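One thing to be aware of: lag() compares each value with the raw previous observation, not with the previously smoothed one. That is why the last row of group 2 becomes 1 (the raw 2008 value) even though the 2008 row itself was smoothed to 5. If what you actually want is a true roll-forward, where each value is compared with the last value that was kept, a sequential pass is needed. Here is a minimal sketch of that variant, assuming this is the intended behaviour. roll_forward() is a hypothetical helper written for this example (it is not part of dplyr), the column name smooth5_rolled is made up, and the data frame is typed in by hand from the filtered rows shown above.

```r
library(dplyr)

# Hypothetical helper, not a dplyr function: walk through x in order and
# keep the last accepted value whenever the new value is within `tol` of it.
roll_forward <- function(x, tol = 5) {
  Reduce(function(kept, new) if (abs(new - kept) <= tol) kept else new,
         x, accumulate = TRUE)
}

# The filtered rows (variable > 0) from the output above, entered by hand.
filtered <- data.frame(
  group    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  year     = c(2004, 2005, 2006, 2008, 2009, 2003, 2004, 2007, 2008, 2009),
  variable = c(9, 13, 1, 9, 6, 11, 14, 5, 1, 6)
)

result <- filtered %>%
  group_by(group) %>%
  mutate(smooth5_rolled = roll_forward(variable, tol = 5)) %>%
  ungroup()

result$smooth5_rolled
# 9 9 1 9 9 11 11 5 5 5
```

For group 1 this matches the lag() version. For group 2 only the last row differs: 6 is within 5 of the kept value 5, so 5 is carried forward instead of the raw previous value 1. Which of the two is right depends on how you define "old value".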