Get difference between grouped values in tall dataset
I have a dataset like the example below:
Name df Value
A 1 .5
A 2 2
A 3 3
B 1 1
B 2 .5
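I want to take the difference between values until the Name column changes; at that point it should stop and start computing differences afresh for the new name. Like this: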
Name df Value Diff
A 1 .5 NA
A 2 2 1.5
A 3 3 2.5
B 1 1 NA
B 2 .5 -.5
Is there a way to do this? I tried reshaping the dataset to wide format, but I couldn't figure out how to make that work either.
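One option is to group by 'Name' and take the cumulative sum of the diff of 'Value', padding the first row of each group with NA: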
library(dplyr)
df1 %>%
group_by(Name) %>%
mutate(Diff = c(NA, cumsum(diff(Value))))
# A tibble: 5 x 4
# Groups: Name [2]
# Name df Value Diff
# <chr> <int> <dbl> <dbl>
#1 A 1 0.5 NA
#2 A 2 2 1.5
#3 A 3 3 2.5
#4 B 1 1 NA
#5 B 2 0.5 -0.5
data
df1 <- structure(list(Name = c("A", "A", "A", "B", "B"), df = c(1L,
2L, 3L, 1L, 2L), Value = c(0.5, 2, 3, 1, 0.5)),
class = "data.frame", row.names = c(NA,
-5L))
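Since the cumulative sum of successive differences telescopes, the expected Diff column appears to be each value minus the first value of its group. A minimal sketch of that reading, assuming dplyr is installed; the name `res` is only illustrative and this is a variation on the code above, not a replacement for it:

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  Name  = c("A", "A", "A", "B", "B"),
  df    = c(1L, 2L, 3L, 1L, 2L),
  Value = c(0.5, 2, 3, 1, 0.5)
)

res <- df1 %>%
  group_by(Name) %>%
  mutate(
    # distance of each value from the first value in its group
    Diff = Value - first(Value),
    # blank out the first row of each group to match the requested NA
    Diff = replace(Diff, row_number() == 1, NA)
  ) %>%
  ungroup()

res$Diff
```

Dropping the `replace()` step leaves 0 instead of NA in the first row of each group, if that is acceptable.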
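@akrun's answer is the way to go, but just as a riddle, this also works (note that it gives 0 rather than NA in the first row of each group):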
df1 %>%
group_by(Name) %>%
mutate(Diff = cumsum(Value - lag(Value, default = Value[1])))
# # A tibble: 5 x 4
# # Groups: Name [2]
# Name df Value Diff
# <chr> <int> <dbl> <dbl>
# 1 A 1 0.5 0
# 2 A 2 2 1.5
# 3 A 3 3 2.5
# 4 B 1 1 0
# 5 B 2 0.5 -0.5
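For anyone without dplyr, the same idea can be sketched in base R with `ave()`. The `Step` column is a hypothetical addition for comparison only: it holds the plain row-to-row difference within each group, which is what you would get without the `cumsum()`.

```r
# Same example data as in the question
df1 <- data.frame(
  Name  = c("A", "A", "A", "B", "B"),
  df    = c(1L, 2L, 3L, 1L, 2L),
  Value = c(0.5, 2, 3, 1, 0.5)
)

# Difference from the first value of each Name group (0 in the first row)
df1$Diff <- ave(df1$Value, df1$Name, FUN = function(v) v - v[1])

# Row-to-row difference within each group, shown only for contrast
df1$Step <- ave(df1$Value, df1$Name, FUN = function(v) c(NA, diff(v)))

df1
```

For group A, `Diff` is 0, 1.5, 2.5 while `Step` is NA, 1.5, 1, so the two only diverge from the third row of a group onward.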