Output a numeric value from 'cut()' in R

Question

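I read this question here: Group numeric values by the intervals

However, I would like to output a numeric (rather than a factor), specifically the numeric value of the lower and/or upper bounds (in separate columns).

In essence, the following is right, except that 'df$start' and 'df$end' are returned as factors: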

df$start <- cut(df$x, 
                breaks = c(0,25,75,125,175,225,299),
                labels = c(0,25,75,125,175,225),
                right = TRUE)

df$end <- cut(df$x, 
              breaks = c(0,25,75,125,175,225,299),
              labels = c(25,75,125,175,225,299),
              right = TRUE)

Using 'as.numeric()' returns the factor level codes (i.e. the values 1-6) rather than the original numbers.

Thanks!

Answer 1

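I'm guessing at what you want, since if you wanted the "original numbers" you could just use df$x. I presume you are after a number that reflects the group? On that guess, how about the following.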

## Generate some example data
x = runif(5, 0, 300)
## Specify the labels
labels = c(0,25,75,125,175,225)
## Use cut as before
y = cut(x, 
    breaks = c(0,25,75,125,175,225,300),
    labels = labels,
    right = TRUE)

When we convert y to a numeric, we get the index of the label. Hence,

labels[as.numeric(y)]

or more simply

labels[y]
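Applied to the factor columns from the question, the same idea can be sketched as below. The sample values in x are made up for illustration; the two conversions shown are the usual base R idioms for a factor whose labels are numbers.

```r
# Illustrative data (assumed, not from the question)
x <- c(10, 50, 100, 200)
breaks <- c(0, 25, 75, 125, 175, 225, 299)

# Factor whose labels are the lower bounds, as in the question
start <- cut(x, breaks = breaks, labels = head(breaks, -1), right = TRUE)

# as.numeric() on a factor gives the level codes, not the labels
codes <- as.numeric(start)

# Go through the labels to recover the numbers
start_num  <- as.numeric(as.character(start))
# Equivalent: convert the levels once, then index by the factor
start_num2 <- as.numeric(levels(start))[start]
```

The second form converts each level only once, which can matter for long vectors with few groups.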

Answer 2

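Most of the behaviour of cut is related to creating the labels, which you are not interested in. You would be better off using findInterval or .bincode.

You would start with the data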

set.seed(17)
df <- data.frame(x=300 * runif(100))

Then set up the breaks and find the intervals:

breaks <- c(0,25,75,125,175,225,299)
df$interval <- findInterval(df$x, breaks)
df$start <- breaks[df$interval]
df$end <- breaks[df$interval + 1]
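One difference worth keeping in mind: findInterval uses left-closed intervals by default, whereas cut with right = TRUE uses right-closed ones, and values at or beyond the last break get an index with no matching upper bound. A minimal sketch with .bincode, which follows the cut convention and returns NA outside the breaks, might look like this (the sample values are chosen only to show the edge cases):

```r
breaks <- c(0, 25, 75, 125, 175, 225, 299)
# Illustrative values: interior, on a break, on the last break, out of range
x <- c(10, 25, 100, 299, 299.5)

# Integer bin codes with right-closed intervals, like cut(right = TRUE)
idx <- .bincode(x, breaks, right = TRUE, include.lowest = FALSE)

# Look up the numeric bounds; NA codes propagate to NA bounds
start <- breaks[idx]
end   <- breaks[idx + 1]

data.frame(x, start, end)
```

Here 25 falls in the interval from 0 to 25, as it would with cut, and 299.5 yields NA for both bounds.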

Answer 3

I would use a regular expression, since all the information is in the output of cut.

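cut_borders <- function(x){
  # Matches a cut() label such as "(25,75]" or "[-1.5,0.2)";
  # backslashes are doubled because the pattern is an R string
  pattern <- "(\\(|\\[)(-*[0-9]+\\.*[0-9]*),(-*[0-9]+\\.*[0-9]*)(\\)|\\])"

  # Groups 2 and 3 hold the lower and upper border
  start <- as.numeric(gsub(pattern, "\\2", x))
  end <- as.numeric(gsub(pattern, "\\3", x))

  data.frame(start, end)
}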

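The pattern in words:

Group 1: either ( or [, so we use (\(|\[).
Group 2: the number may be negative, so we allow (-*); we look for at least one digit ([0-9]+), which may be followed by decimal places, i.e. a point (\.*) and the digits after the point ([0-9]*).
Then a comma (,).
Group 3: same as group 2.
Group 4: analogous to group 1, we expect either ) or ].

Here are some random variables cut by quantiles. The function cut_borders returns what we are looking for: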

x <- rnorm(10)

x_groups <- cut(x, quantile(x, 0:4/4), include.lowest= TRUE)

cut_borders(x_groups)
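One caveat with parsing the labels: cut formats the breaks with dig.lab significant digits (3 by default), so the recovered borders are the rounded label values rather than the exact breaks. The sketch below repeats the function so that it runs on its own and uses made-up breaks to show the effect; raising dig.lab is one possible way to keep more digits.

```r
cut_borders <- function(x) {
  pattern <- "(\\(|\\[)(-*[0-9]+\\.*[0-9]*),(-*[0-9]+\\.*[0-9]*)(\\)|\\])"
  start <- as.numeric(gsub(pattern, "\\2", x))
  end <- as.numeric(gsub(pattern, "\\3", x))
  data.frame(start, end)
}

# Illustrative breaks with more than three significant digits
breaks <- c(0, 1.23456, 2)
x <- c(0.5, 1.5)

# Default labels are rounded to 3 significant digits
rounded <- cut_borders(cut(x, breaks))

# Asking cut() for more digits keeps the full break values in the labels
precise <- cut_borders(cut(x, breaks, dig.lab = 6))
```

With the default, the first upper border comes back as 1.23; with dig.lab = 6 it comes back as 1.23456.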

Answer 4

We can make use of tidyr::extract

library(tidyverse)
set.seed(17)
df <- data.frame(x = cut(300 * runif(100), c(0,25,75,125,175,225,299)))

df %>%
  extract(x, c("start", "end"), "(-?\\d+),(-?\\d+)")
#>     start end
#> 1      25  75
#> 2     225 299
#> 3     125 175
#> 4     225 299
#> 5      75 125
#> 6     125 175
#> ...

Created on 2021-05-11 by the reprex package (v2.0.0)
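By default tidyr::extract returns the new columns as character. Since the question asks for numbers, a small variation using its convert argument could look like the sketch below; the three input values are made up for illustration.

```r
library(tidyr)

# Illustrative factor column produced by cut()
df <- data.frame(x = cut(c(10, 50, 100), c(0, 25, 75, 125)))

# convert = TRUE runs type conversion on the extracted columns,
# so "25" becomes a number instead of staying a string
out <- tidyr::extract(df, x, c("start", "end"),
                      "(-?\\d+),(-?\\d+)", convert = TRUE)

str(out)
```

The regex here only captures whole numbers, so breaks with decimals would need a pattern that also allows a decimal part.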

P.S. Thanks to the authors of the first draft of the regular expression, which is modified here. +1 to both :)
