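Create group names for consecutive values
This looks simple, but I can't think of an easier way. I have a vector x below and need to create group names for runs of consecutive identical values. My attempt uses rle. Any better ideas?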
# data
x <- c(1,1,1,2,2,2,3,2,2,1,1)
# make groups
rep(paste0("Group_", 1:length(rle(x)$lengths)), rle(x)$lengths)
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4"
# [9] "Group_4" "Group_5" "Group_5"
Using rleid from data.table:
library(data.table)
rleid(x, prefix = "Group_")
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
Using diff and cumsum:
paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
(If your values are floating-point numbers, you may need to avoid != and compare against a tolerance instead.)
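As a minimal sketch of the tolerance idea, assuming a hypothetical vector x_float and an arbitrarily chosen tolerance tol (neither is from the original answer), the != 0 test can be swapped for a comparison of the absolute difference against tol:

```r
# Hypothetical data: some neighbours are equal only up to rounding error
x_float <- c(0.1 + 0.2, 0.3, 0.3, 0.5, 0.5 + 1e-12, 1)

# Assumed tolerance; pick a value that suits the scale of your data
tol <- 1e-8

# A new group starts wherever the jump to the next value exceeds tol
grp_float <- paste0("Group_", cumsum(c(1, abs(diff(x_float)) > tol)))
grp_float
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_3"
```

Note that this compares each value only with its direct predecessor, so a slow drift in steps smaller than tol would stay in one group.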
Using cumsum, but without relying on numeric data:
paste0("Group_", 1 + c(0, cumsum(x[-length(x)] != x[-1])))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
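To illustrate the "not relying on numeric data" point, here is the same expression applied to a character vector. The vector y is a made-up example, not part of the original answer, and the rle-based line is only there as a cross-check:

```r
# Hypothetical non-numeric input
y <- c("a", "a", "b", "b", "a", "c", "c")

# Compare each element with its successor; every change starts a new group
grp_y <- paste0("Group_", 1 + c(0, cumsum(y[-length(y)] != y[-1])))
grp_y
# [1] "Group_1" "Group_1" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4"

# Cross-check against run lengths from rle, which also works on characters
r <- rle(y)
grp_rle <- paste0("Group_", rep(seq_along(r$lengths), r$lengths))
identical(grp_y, grp_rle)
```

This sketch assumes y contains no missing values; with NA present, != returns NA and cumsum would propagate it.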
group() from groupdata2 can create groups from a list of group starts using the l_starts method. Setting n to "auto" makes it find the group starts automatically:
x <- c(1,1,1,2,2,2,3,2,2,1,1)
groupdata2::group(x, n = "auto", method = "l_starts")
## # A tibble: 11 x 2
## # Groups: .groups [5]
## data .groups
## <dbl> <fct>
## 1 1 1
## 2 1 1
## 3 1 1
## 4 2 2
## 5 2 2
## 6 2 2
## 7 3 3
## 8 2 4
## 9 2 4
## 10 1 5
## 11 1 5
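There is also the differs_from_previous() function, which finds the values, or the indices of the values, that differ from the previous value by at least some threshold.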
library(groupdata2)
# The values to start groups at
differs_from_previous(x, threshold = 1,
direction = "both")
## [1] 2 3 2 1
# The indices to start groups at
differs_from_previous(x, threshold = 1,
direction = "both",
return_index = TRUE)
## [1] 4 7 8 10
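The start indices shown above can be turned back into group labels with base R alone. This is only a sketch: the indices are copied by hand from the output above, and the variable names are illustrative rather than part of groupdata2:

```r
x <- c(1, 1, 1, 2, 2, 2, 3, 2, 2, 1, 1)

# Start indices as reported above, entered by hand for this sketch
starts <- c(4, 7, 8, 10)

# Mark position 1 and every reported start, then count the marks cumulatively
is_start <- seq_along(x) %in% c(1, starts)
grp_from_starts <- paste0("Group_", cumsum(is_start))
grp_from_starts
#  [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3"
#  [8] "Group_4" "Group_4" "Group_5" "Group_5"
```

Position 1 is added explicitly because the first element has no previous value to differ from.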