Calculating average rle$lengths over grouped data
I want to use rle() to calculate state durations over grouped data. Here is a test data frame:
DF <- read.table(text="Time,x,y,sugar,state,ID
0,31,21,0.2,0,L0
1,31,21,0.65,0,L0
2,31,21,1.0,0,L0
3,31,21,1.5,1,L0
4,31,21,1.91,1,L0
5,31,21,2.3,1,L0
6,31,21,2.75,0,L0
7,31,21,3.14,0,L0
8,31,22,3.0,2,L0
9,31,22,3.47,1,L0
10,31,22,3.930,0,L0
0,37,1,0.2,0,L1
1,37,1,0.65,0,L1
2,37,1,1.089,0,L1
3,37,1,1.5198,0,L1
4,36,1,1.4197,2,L1
5,36,1,1.869,0,L1
6,36,1,2.3096,0,L1
7,36,1,2.738,0,L1
8,36,1,3.16,0,L1
9,36,1,3.5703,0,L1
10,36,1,3.970,0,L1
", header = TRUE, sep =",")
I want the average run length of state == 1, grouped by ID. I wrote a function, inspired by https://www.reddit.com/r/rstats/comments/brpzo9/tidyverse_groupby_and_rle/, that computes the rle mean part:
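rle_mean_lengths <- function(x, value) {
  r <- rle(x)                # run-length encoding of the input vector
  cond <- r$values == value  # TRUE for runs whose value matches the requested state
  # number of matching runs and their mean length
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}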
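For readers unfamiliar with rle(), the following base-R sketch shows what the helper works with. It runs rle() on the state values of ID L0, re-typed from the test data; the vector name state_L0 is only illustrative and not part of the original post.

```r
# State values for ID L0, copied from the test data above
state_L0 <- c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0)

r <- rle(state_L0)
r$values   # 0 1 0 2 1 0
r$lengths  # 3 3 2 1 1 1

# The runs with state == 1 have lengths 3 and 1, so their mean length is 2
mean(r$lengths[r$values == 1])
```

This agrees with the count of 2 and the average length of 2 expected for L0.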
Then I add the grouping:
DF %>% group_by(ID) %>% do(rle_mean_lengths(DF$state,1))
However, the resulting values are wrong:
  ID    count avg_length
1 L0        2          2
2 L1        2          2
L0 is correct, but L1 has no instances of state == 1, so its average should be zero or NA.
I narrowed the problem down to summarize:
DF %>% group_by(ID) %>% summarize_at(vars(state),list(name=mean)) # This works but if I use summarize it gives me weird values again.
How do I do the equivalent of summarize_at() with do()? Or is there another fix? Thanks
Because the function returns a data.frame, we can wrap it in list() inside summarise() and then unnest() afterwards:
library(dplyr)
library(tidyr)
DF %>%
  group_by(ID) %>%
  summarise(new = list(rle_mean_lengths(state, 1)), .groups = "drop") %>%
  unnest(new)
Or remove the list() and use unpack():
DF %>%
  group_by(ID) %>%
  summarise(new = rle_mean_lengths(state, 1), .groups = "drop") %>%
  unpack(new)
# A tibble: 2 × 3
  ID    count avg_length
  <chr> <int>      <dbl>
1 L0        2          2
2 L1        0        NaN
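The NaN for L1 appears because mean() of an empty vector is NaN. The question asked for zero or NA instead, so one option is a small variant of the helper that returns NA when no run matches. This is only a sketch: the name rle_mean_lengths_na is hypothetical and not from the original post, and it assumes the input vector has no missing values.

```r
# Hypothetical variant of the helper: returns NA_real_ instead of NaN
# when the requested value never occurs. Assumes x contains no NA values.
rle_mean_lengths_na <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  n <- sum(cond)
  data.frame(
    count = n,
    avg_length = if (n > 0) mean(r$lengths[cond]) else NA_real_
  )
}

# No run of 1s: count is 0 and avg_length is NA
rle_mean_lengths_na(c(0, 0, 2, 0), 1)

# Two runs of 1s with lengths 2 and 4: count is 2 and avg_length is 3
rle_mean_lengths_na(c(0, 1, 1, 0, 1, 1, 1, 1), 1)
```

It can be dropped into the summarise() pipelines above in place of rle_mean_lengths(). Replacing NA_real_ with 0 gives the zero variant.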
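In the OP's do() code, the column should be extracted from the data passed in from the lhs, i.e. the dot (.), rather than from the whole data frame. (Note that do() is superseded in dplyr, so it is better to use summarise() with unnest()/unpack().)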
DF %>%
  group_by(ID) %>%
  do(rle_mean_lengths(.$state, 1))
# A tibble: 2 × 3
# Groups:   ID [2]
  ID    count avg_length
  <chr> <int>      <dbl>
1 L0        2          2
2 L1        0        NaN
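To illustrate why the original call returned the same row for both IDs, here is a base-R sketch that needs no dplyr. The vectors state and ID are re-typed from the test data, and split()/lapply() stand in for the grouping. These are illustrative choices, not the method used in the answer.

```r
# Same helper as in the question
rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# state and ID columns re-typed from the test data (L0 first, then L1)
state <- c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0,
           0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
ID <- rep(c("L0", "L1"), each = 11)

# What DF$state amounts to inside do(): the whole column, whatever the group
rle_mean_lengths(state, 1)

# What .$state amounts to: only the rows of the current group
per_group <- lapply(split(state, ID), rle_mean_lengths, value = 1)
do.call(rbind, per_group)
```

The whole-column call finds both runs of 1s, which all belong to L0, so repeating it for each group reproduces the incorrect count of 2 and average of 2 for L1.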