按组计算避免 NA 的百分比变化
Calculate % Change Avoiding NA's, By Group
我有以下示例数据:
df <- structure(list(person = c("a", "a", "a", "a", "a", "a", "a",
"b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", "d", "d",
"d", "e", "f"), data = c(4.51646448148296, 6.50396548317118,
NA, 9.84425670640439, NA, NA, 1.67850554309375, NA, 2.06950894895984,
4.69548965710482, 5.12795074939009, 0.209845957108408, 6.47290211779118,
6.26481009615426, NA, NA, 5.24543093106204, 4.75242871394851,
6.8182202925863, NA, 5.3162641705813, 6.05862341995543, NA)), row.names = c(NA,
-23L), class = "data.frame")
看起来像这样
person data
1 a 4.516464
2 a 6.503965
3 a NA
4 a 9.844257
5 a NA
6 a NA
7 a 1.678506
8 b NA
9 b 2.069509
10 b 4.695490
11 b 5.127951
12 c 0.209846
13 c 6.472902
14 c 6.264810
15 c NA
16 c NA
17 c 5.245431
18 c 4.752429
19 d 6.818220
20 d NA
21 d 5.316264
22 e 6.058623
23 f NA
我正在尝试能够根据 person
计算 data
列中最新和第二个最新条目的百分比变化。条目按人从第一到最后记录(即,第 1 行是 A 的第一个条目,第 7 行是他们的最新条目)。
我尝试了以下代码的各种组合,但没有成功:
df %>%
group_by(person) %>%
mutate(percChange = c(rep(NA, ((df %>% count(person))[2]) -
length(((na.omit(data) - lag(na.omit(data)))/na.omit(data)*100))),
(na.omit(data) - lag(na.omit(data)))/na.omit(data)*100)) %>%
summarise(max = max(na.omit(data)),
avg = mean(na.omit(data)),
lastPercChange = last(na.omit(percChange)))
所需的输出看起来像这样,以计算最后的百分比变化为例:
person max avg lastPercChange
1 a 9.84 5.64 33.93138 (1.678506-9.844257)/1.678506 * 100
2 b 5.13 3.96 8.433407 (5.127951-4.695490)/5.127951 * 100
3 c 6.47 4.59 -10.37368 (4.752429-5.245431)/4.752429 * 100
4 d 6.82 6.07 14.66658 (5.316264-6.818220)/5.316264 * 100
5 e 6.06 6.06 NA (since only 1 observation)
6 f -Inf NaN NA (since only 1 observation)
如有任何帮助,我们将不胜感激。
为简单起见,我会先删除所有 NA
行,然后很容易计算出您想要的值。函数 nth(..)
或 head
/tail
可以帮助我们提供最后一个和倒数第二个值。
如果 person
具有所有 NA
值,它将从输出中完全删除以保持其完整,我们对原始数据执行 right_join
。
library(dplyr)
df %>%
filter(!is.na(data)) %>%
group_by(person) %>%
summarise(max = max(data),
avg = mean(data),
lastPercChange = (nth(data,-1) - nth(data,-2))/nth(data, -1) *100)%>%
right_join(df %>% distinct(person), by = 'person')
# person max avg lastPercChange
# <chr> <dbl> <dbl> <dbl>
#1 a 9.84 5.64 -486.
#2 b 5.13 3.96 8.43
#3 c 6.47 4.59 -10.4
#4 d 6.82 6.07 -28.3
#5 e 6.06 6.06 NA
#6 f NA NA NA
我有以下示例数据:
df <- structure(list(person = c("a", "a", "a", "a", "a", "a", "a",
"b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", "d", "d",
"d", "e", "f"), data = c(4.51646448148296, 6.50396548317118,
NA, 9.84425670640439, NA, NA, 1.67850554309375, NA, 2.06950894895984,
4.69548965710482, 5.12795074939009, 0.209845957108408, 6.47290211779118,
6.26481009615426, NA, NA, 5.24543093106204, 4.75242871394851,
6.8182202925863, NA, 5.3162641705813, 6.05862341995543, NA)), row.names = c(NA,
-23L), class = "data.frame")
看起来像这样
person data
1 a 4.516464
2 a 6.503965
3 a NA
4 a 9.844257
5 a NA
6 a NA
7 a 1.678506
8 b NA
9 b 2.069509
10 b 4.695490
11 b 5.127951
12 c 0.209846
13 c 6.472902
14 c 6.264810
15 c NA
16 c NA
17 c 5.245431
18 c 4.752429
19 d 6.818220
20 d NA
21 d 5.316264
22 e 6.058623
23 f NA
我正在尝试能够根据 person
计算 data
列中最新和第二个最新条目的百分比变化。条目按人从第一到最后记录(即,第 1 行是 A 的第一个条目,第 7 行是他们的最新条目)。
我尝试了以下代码的各种组合,但没有成功:
df %>%
group_by(person) %>%
mutate(percChange = c(rep(NA, ((df %>% count(person))[2]) -
length(((na.omit(data) - lag(na.omit(data)))/na.omit(data)*100))),
(na.omit(data) - lag(na.omit(data)))/na.omit(data)*100)) %>%
summarise(max = max(na.omit(data)),
avg = mean(na.omit(data)),
lastPercChange = last(na.omit(percChange)))
所需的输出看起来像这样,以计算最后的百分比变化为例:
person max avg lastPercChange
1 a 9.84 5.64 33.93138 (1.678506-9.844257)/1.678506 * 100
2 b 5.13 3.96 8.433407 (5.127951-4.695490)/5.127951 * 100
3 c 6.47 4.59 -10.37368 (4.752429-5.245431)/4.752429 * 100
4 d 6.82 6.07 14.66658 (5.316264-6.818220)/5.316264 * 100
5 e 6.06 6.06 NA (since only 1 observation)
6 f -Inf NaN NA (since only 1 observation)
如有任何帮助,我们将不胜感激。
为简单起见,我会先删除所有 NA
行,然后很容易计算出您想要的值。函数 nth(..)
或 head
/tail
可以帮助我们提供最后一个和倒数第二个值。
如果 person
具有所有 NA
值,它将从输出中完全删除以保持其完整,我们对原始数据执行 right_join
。
library(dplyr)
df %>%
filter(!is.na(data)) %>%
group_by(person) %>%
summarise(max = max(data),
avg = mean(data),
lastPercChange = (nth(data,-1) - nth(data,-2))/nth(data, -1) *100)%>%
right_join(df %>% distinct(person), by = 'person')
# person max avg lastPercChange
# <chr> <dbl> <dbl> <dbl>
#1 a 9.84 5.64 -486.
#2 b 5.13 3.96 8.43
#3 c 6.47 4.59 -10.4
#4 d 6.82 6.07 -28.3
#5 e 6.06 6.06 NA
#6 f NA NA NA