for loop to determine the top 10 percent of values in an interval

I basically have two columns (vectors) in a data.frame, speed and acceleration, like this:

    speed     acceleration
1   3.2694444 2.6539535522
2   3.3388889 2.5096979141
3   3.3888889 2.2722134590
4   3.4388889 1.9815256596
5   3.5000000 1.6777544022
6   3.5555556 1.3933215141
7   3.6055556 1.1439051628
8   3.6527778 0.9334115982
9   3.6722222 0.7561602592

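I need to find the top 10 percent of the acceleration values (y axis) for the speed values (x axis). This needs to be done within specific intervals, for example speed 3.2-3.4, 3.4-3.6, and so on. Can you tell me what a for loop would look like in this situation?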

As @alistaire already pointed out, the data you provided are very limited. So we first have to simulate more data against which we can test our code.

set.seed(1)

# your data
speed <- c(3.2694444, 3.3388889, 3.3388889, 3.4388889, 3.5,
           3.5555556, 3.6055556, 3.6527778, 3.6722222)
acceleration <- c(2.6539535522, 2.5096979141, 2.2722134590,
                  1.9815256596, 1.6777544022, 1.3933215141,
                  1.1439051628, 0.9334115982, 0.7561602592)
df <- data.frame(speed, acceleration)

# expand data.frame and add a little bit of noise to all values
# to make them 'unique'
df <- as.data.frame(do.call(
  rbind,
  replicate(15L, apply(df, 2, \(x) (x + runif(length(x), -1e-1, 1e-1) )),
            simplify = FALSE)
))
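
To see what the expansion step does, here is a minimal sketch that builds the same kind of data in a more explicit way. The names base_df, n_copies and sim are made up for this illustration, and because the random numbers are drawn in a different order, the values will not match df exactly. It only confirms the shape: 15 noisy copies of the 9 original rows give 135 rows, each within 0.1 of its original value.

```r
set.seed(1)

# base data, copied from the code above
base_df <- data.frame(
  speed = c(3.2694444, 3.3388889, 3.3388889, 3.4388889, 3.5,
            3.5555556, 3.6055556, 3.6527778, 3.6722222),
  acceleration = c(2.6539535522, 2.5096979141, 2.2722134590,
                   1.9815256596, 1.6777544022, 1.3933215141,
                   1.1439051628, 0.9334115982, 0.7561602592)
)

# stack 15 copies of the rows, then add uniform noise in [-0.1, 0.1]
# to every cell so that all values become 'unique'
n_copies <- 15L
sim <- base_df[rep(seq_len(nrow(base_df)), times = n_copies), ]
sim[] <- lapply(sim, function(x) x + runif(length(x), -0.1, 0.1))
rownames(sim) <- NULL

nrow(sim)   # number of simulated rows
```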

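The function create_intervals, as its name suggests, creates user-defined intervals. The rest of the code does the 'heavy lifting' and stores the desired result in out.

If you want the intervals of speed to have equal width, simply specify the number of groups you want (n_groups) and leave the remaining arguments (i.e. lwr, upr and interval_span) unspecified.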

# Cut speed into user-defined intervals (uses the global data.frame df)
create_intervals <- \(n_groups = NULL, lwr = NULL, upr = NULL, interval_span = NULL) {
  if (!is.null(lwr) && !is.null(upr) && !is.null(interval_span) && is.null(n_groups)) {
    # custom boundaries: min(speed), lwr, ..., upr, max(speed)
    speed_low <- subset(df, speed < lwr, select = speed)
    first_interval <- with(speed_low, c(min(speed), lwr))
    middle_intervals <- seq(lwr + interval_span, upr - interval_span, interval_span)
    speed_upp <- subset(df, speed > upr, select = speed)
    last_interval <- with(speed_upp, c(upr, max(speed)))
    intervals <- c(first_interval, middle_intervals, last_interval)
  } else {
    # equal-width boundaries: min(speed) + step, ..., max(speed)
    # note: min(speed) itself is not included, so n_groups boundaries
    # delimit n_groups - 1 intervals
    step <- with(df, (max(speed) - min(speed)) / n_groups)
    intervals <- min(df$speed) + seq_len(n_groups) * step
  }
  return(intervals)
}

# equal-width boundaries; n_groups = 3L returns three boundaries,
# which delimit two intervals
my_intervals <- create_intervals(n_groups = 3L)

# Within each interval, keep the rows whose acceleration is greater
# than or equal to the 90th percentile
out <- lapply(1:(length(my_intervals)-1L), \(i) {
  x <- subset(df, speed >= my_intervals[i] & speed <= my_intervals[i+1L])
  x[x$acceleration >= quantile(x$acceleration, 0.9), ]
})

# function to round values to two decimal places
r <- \(x) format(round(x, 2), nsmall = 2L)

# assign names to each element of out
for(i in seq_along(out)) {
  names(out)[i] <- paste0(r(my_intervals[i]), '-', r(my_intervals[i+1L]))
}

Output 1

> out
$`3.38-3.57`
       speed acceleration
11  3.394378     2.583636
21  3.383631     2.267659
57  3.434123     2.300234
83  3.394886     2.580924
101 3.395459     2.460971

$`3.57-3.76`
      speed acceleration
6  3.635234     1.447290
41 3.572868     1.618293
51 3.615017     1.420020
95 3.575412     1.763215

We can also compute the desired values of speed based on intervals that make more 'sense' than equally spaced speed intervals, e.g. [min(speed), 3.3), [3.3, 3.45), [3.45, 3.6), and [3.6, max(speed)).

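This can be achieved by leaving n_groups unspecified and instead specifying lwr, upr and a meaningful interval_span. For example, with a lower limit of 3.3 and an upper limit of 3.6, an interval span of 0.15 makes sense, because it divides the range between the two limits into two whole intervals.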
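
The boundary vector described above can be sketched as follows. The extremes speed_min and speed_max are assumptions: they are the rounded values 3.18 and 3.76 that appear in the interval labels of the printed outputs, not the exact minimum and maximum of the simulated data. For these inputs the result has the same structure as what create_intervals assembles from its first, middle and last pieces.

```r
# assumed, rounded extremes of speed (illustration only)
speed_min <- 3.18
speed_max <- 3.76

lwr <- 3.3
upr <- 3.6
interval_span <- 0.15

# inner boundaries from lwr to upr in steps of interval_span,
# with the two extremes added at either end
breaks <- c(speed_min, seq(lwr, upr, by = interval_span), speed_max)
breaks
```

Five boundaries delimit the four intervals listed above.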

# custom boundaries based on a lower limit and upper limit
my_intervals <- create_intervals(lwr = 3.3, upr = 3.6, interval_span = 0.15)
# then re-run the lapply() and naming steps above to rebuild out

Output 2

> out
$`3.18-3.30`
      speed acceleration
37 3.238781     2.696456
82 3.258691     2.722076

$`3.30-3.45`
      speed acceleration
11 3.394378     2.583636
19 3.328292     2.711825
73 3.315306     2.644580
83 3.394886     2.580924

$`3.45-3.60`
      speed acceleration
4  3.520530     2.018930
40 3.517329     2.032943
58 3.485247     2.079893
67 3.458031     2.078545

$`3.60-3.76`
      speed acceleration
6  3.635234     1.447290
34 3.688131     1.218969
51 3.615017     1.420020
78 3.628465     1.348873
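
Since the question asked for a for loop, the lapply() step can also be written as an explicit loop. The following is a minimal sketch on toy data. The names toy, breaks and top10 are illustrative, and the half-open intervals [lower, upper), with the last interval also closed on the right, are an assumption that differs slightly from the closed intervals used in the code above.

```r
set.seed(42)

# toy data, for illustration only
toy <- data.frame(speed = runif(200, 3.2, 3.8),
                  acceleration = runif(200, 0.5, 3))
breaks <- c(3.2, 3.4, 3.6, 3.8)

n_bins <- length(breaks) - 1L
top10 <- vector("list", n_bins)

for (i in seq_len(n_bins)) {
  lower <- breaks[i]
  upper <- breaks[i + 1L]
  # half-open interval [lower, upper); the last one also includes upper
  in_bin <- toy$speed >= lower &
    (toy$speed < upper | (i == n_bins & toy$speed == upper))
  x <- toy[in_bin, ]
  # keep rows at or above the 90th percentile of acceleration in this bin
  top10[[i]] <- x[x$acceleration >= quantile(x$acceleration, 0.9), ]
}

# label each element with its interval
names(top10) <- paste0(head(breaks, -1L), "-", tail(breaks, -1L))
names(top10)
```

The loop and lapply() do the same work here; the loop just makes the per-interval steps explicit.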

Note: if you are using an R version < 4.1.0, use function(x) instead of \(x).