Normalize a vector by extending/compressing it to a given length
I have a vector with 122 values:
vec1 = c(0,0,0,0,0,0,0,0,-0.0029,-0.0029,-0.0029,-0.0029,-0.0029,-0.0029,-0.0044,-0.0044,-0.0059,-0.0073,-0.0073,-0.0088,-0.0088,-0.0102,-0.0132,-0.0176,-0.0249,-0.0293,-0.0322,-0.0337,-0.0337,-0.0337,-0.0337,-0.0337,-0.0337,-0.0351,-0.0425,-0.0512,-0.0586,-0.0659,-0.0703,-0.0805,-0.0937,-0.1127,-0.1347,-0.1508,-0.1581,-0.1611,-0.1669,-0.1684,-0.1698,-0.1698,-0.1698,-0.1698,-0.1552,-0.1362,-0.104,-0.0439,0.0747,0.2035,0.3353,0.4583,0.5695,0.6501,0.7277,0.7687,0.7892,0.8038,0.8097,0.8141,0.8184,0.8214,0.8243,0.8243,0.8053,0.7804,0.6603,0.5066,0.3338,0.1435,-0.1127,-0.41,-0.6442,-0.8097,-0.8858,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9034,-0.8946,-0.8741,-0.8433,-0.8228,-0.8126,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082)
Now I want to normalize it by compressing it to 100 values, i.e. in this case every 1.22 values of vec1 should be represented by 1 value of norm_vec1, like this:
norm_vec1 [1] = mean (vec1 [1]) ## (because round(1.22) = 1)
norm_vec1 [2] = mean (vec1 [2]) ## (because round(1.22*2) = 2)
norm_vec1 [3] = mean (vec1 [3:4]) ## (because round(1.22*3) = 4)
norm_vec1 [4] = mean (vec1 [5]) ## (because round(1.22*4) = 5)
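and so on.
So I should end up with 100 values in the vector norm_vec1, each of which is either taken directly from vec1 or is the result of averaging, depending on its position. No value of vec1 should be left out.
Importantly, this should also work for vectors shorter than 100 (e.g. 63 elements):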
norm_short_vec1 [1] = mean (short_vec1 [1]) ## (because round(0.63*1)=1)
norm_short_vec1 [2] = mean (short_vec1 [1]) ## (because round(0.63*2)=1)
norm_short_vec1 [3] = mean (short_vec1 [2]) ## (because round(0.63*3)=2)
and so on.
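The index arithmetic in both examples can be checked in a few lines. This is only a sketch; the helper name end_index is made up for illustration and computes the last input index covered by each output position under the round(ratio * i) rule described above.

```r
# Hypothetical helper (not an existing function): last input index that
# output position i reaches, using the round(ratio * i) rule.
end_index <- function(n_in, n_out) round(n_in / n_out * seq_len(n_out))

compress_idx <- head(end_index(122, 100), 4)  # 1 2 4 5
extend_idx   <- head(end_index(63, 100), 3)   # 1 1 2

print(compress_idx)
print(extend_idx)
```

When the index jumps by 2 (from 2 to 4) two input values are averaged; when it repeats (1, 1) the same input value is reused.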
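Alternatively, each element of the vector could be repeated 100 times, and the new values could then be based on samples from this new, longer vector, like this (if vec1 has 122 values):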
long_vec1 = c(c(vec1 [1] repeated 100 times), (vec1 [2] repeated 100 times), etc.)
norm_vec1 [1] = mean (long_vec1 [1:122])
norm_vec1 [2] = mean (long_vec1 [123:244])
etc.
Is there a function for this?
compress <- function(x, length.out) {
  n <- length(x)
  if (n < length.out) stop("length.out is too big")
  # group label for each input element: the output slot it is averaged into
  spl <- round((1:n) / n * length.out)
  res <- sapply(split(x, spl), mean)
  names(res) <- NULL
  res
}
compress(vec1, 100)
#> [1] 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.00145
#> [8] -0.00290 -0.00290 -0.00290 -0.00290 -0.00440 -0.00440 -0.00590
#> [15] -0.00730 -0.00805 -0.00880 -0.01020 -0.01320 -0.02125 -0.02930
#> [22] -0.03220 -0.03370 -0.03370 -0.03370 -0.03370 -0.03370 -0.03510
#> [29] -0.04250 -0.05490 -0.06590 -0.07030 -0.08050 -0.10320 -0.13470
#> [36] -0.15080 -0.15810 -0.16110 -0.16765 -0.16980 -0.16980 -0.16980
#> [43] -0.16250 -0.13620 -0.10400 -0.04390 0.07470 0.26940 0.45830
#> [50] 0.56950 0.65010 0.74820 0.78920 0.80380 0.80970 0.81410
#> [57] 0.81990 0.82430 0.82430 0.80530 0.72035 0.50660 0.33380
#> [64] 0.14350 -0.11270 -0.52710 -0.80970 -0.88580 -0.90920 -0.90920
#> [71] -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90340
#> [78] -0.89460 -0.87410 -0.83305 -0.81260 -0.80820 -0.80820 -0.80820
#> [85] -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820
#> [92] -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820
#> [99] -0.80820 -0.80820
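Here spl describes the link between the elements of the original vector and those of the result vector. In this particular example it contains 122 values: 1, 2, 2, 3, 4, ... 99, 100, which means that the first element goes directly into the result vector, then the second and third elements are averaged to fill element 2 of the result vector, and so on.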
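To make the grouping concrete, here is a small sketch that rebuilds spl outside the function and counts the group sizes. The last two lines illustrate an edge case that seems worth checking before reusing this expression: when the input is more than about twice as long as length.out, the first few labels round to 0 and an extra group appears. The variable name spl_big is only for illustration.

```r
n <- 122
length.out <- 100
spl <- round((1:n) / n * length.out)  # same expression as in compress()

print(head(spl, 5))        # 1 2 2 3 4
print(table(table(spl)))   # 78 slots take one input, 22 slots average two

# Edge case: 300 inputs compressed to 100 outputs
spl_big <- round((1:300) / 300 * 100)
print(range(spl_big))      # 0 100, i.e. 101 distinct labels
```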
UPD
A function based on your second algorithm:
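normalize <- function(x, length.out) {
  n <- length(x)
  # repeat every element length.out times, giving n * length.out values
  big_vec <- rep(x, each = length.out)
  # cut the long vector into length.out blocks of n values and average each
  res <- sapply(split(big_vec, rep(1:length.out, each = n)), mean)
  names(res) <- NULL
  res
}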
It also works in the other direction, extending a shorter vector:
normalize(1:3, length.out = 5)
#> [1] 1.000000 1.333333 2.000000 2.666667 3.000000
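As a sketch of the same idea, the repeated vector can also be reshaped into a matrix with length(x) rows, so that each column is one block and colMeans does the averaging. The name normalize_matrix is hypothetical and this is an equivalent formulation under that assumption, not a replacement for the function above.

```r
# Hypothetical variant: same repeat-then-average scheme, using a matrix.
normalize_matrix <- function(x, length.out) {
  n <- length(x)
  # matrix() fills column by column, so each column holds n consecutive
  # values of the repeated vector, i.e. one output block
  blocks <- matrix(rep(x, each = length.out), nrow = n)
  colMeans(blocks)
}

print(normalize_matrix(1:3, length.out = 5))
#> [1] 1.000000 1.333333 2.000000 2.666667 3.000000
```

Because every block has the same number of values, the mean of the result equals the mean of the input, whether the vector is compressed or extended.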
Here is a solution using Map. It is used to create a list of index vectors into the input vector, giving the elements to be aggregated with mean.
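fun <- function(x, n = 100){
  r <- round(seq_len(n)*length(x)/n)  # last input index of each chunk
  d <- diff(c(0, r))                  # chunk sizes; d[1] = r[1], so chunk 1 starts at index 1
  # pmin() reuses index r when a chunk is empty (input shorter than n)
  M <- Map(`:`, pmin(r - d + 1, r), r)
  sapply(M, function(i) mean(x[i]))
}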
fun(vec1)
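To see what kind of index list such a Map call is meant to produce, here is a small sketch that returns only the index vectors. The helper name chunk_indices is hypothetical, and the sketch assumes the input is at least as long as the output (the compression case).

```r
# Hypothetical helper: index vectors for compressing `len` inputs to `n` outputs.
# Assumes len >= n, so every chunk contains at least one index.
chunk_indices <- function(len, n) {
  r <- round(seq_len(n) * len / n)  # last index of each chunk
  start <- c(1, head(r, -1) + 1)    # first index: one past the previous end
  Map(`:`, start, r)
}

idx <- chunk_indices(122, 100)
print(idx[1:4])  # 1, 2, 3:4, 5, as in the question
```

Concatenating all chunks gives 1:122, so no value of the input is left out or used twice.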
Here is a function compress that takes any positive integer shortlen as the target length for your compression, where split and findInterval are used to produce the block averages:
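compress <- function(v, shortlen) {
  # right-closed breakpoints: the last input index of each output block
  breaks <- round(length(v) / shortlen * seq_along(v))
  grp <- findInterval(seq_along(v), breaks, left.open = TRUE)
  unname(sapply(split(v, grp), mean))
}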
For example:
> compress(vec1,100)
[1] 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.00145 -0.00290 -0.00290 -0.00290 -0.00290 -0.00365 -0.00440 -0.00590 -0.00730 -0.00805 -0.00880
[18] -0.01020 -0.01320 -0.01760 -0.02710 -0.03220 -0.03370 -0.03370 -0.03370 -0.03370 -0.03370 -0.03510 -0.04250 -0.05490 -0.06590 -0.07030 -0.08050 -0.09370
[35] -0.12370 -0.15080 -0.15810 -0.16110 -0.16765 -0.16980 -0.16980 -0.16980 -0.16980 -0.14570 -0.10400 -0.04390 0.07470 0.26940 0.45830 0.56950 0.65010
[52] 0.72770 0.77895 0.80380 0.80970 0.81410 0.81990 0.82430 0.82430 0.80530 0.78040 0.58345 0.33380 0.14350 -0.11270 -0.52710 -0.80970 -0.88580
[69] -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90340 -0.89460 -0.87410 -0.83305 -0.81260 -0.80820 -0.80820 -0.80820 -0.80820
[86] -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820
> compress(vec1,63)
[1] 0.00000 0.00000 0.00000 0.00000 -0.00290 -0.00290 -0.00290 -0.00440 -0.00515 -0.00730 -0.00880 -0.01170 -0.02125 -0.03075 -0.03370 -0.03370 -0.03370
[18] -0.03880 -0.05490 -0.06810 -0.08710 -0.12370 -0.15445 -0.16110 -0.16765 -0.16980 -0.16980 -0.14570 -0.07395 0.13910 0.39680 0.60980 0.74820 0.79650
[35] 0.81190 0.81990 0.82430 0.79285 0.58345 0.33380 0.01540 -0.52710 -0.84775 -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.89900 -0.85870 -0.81770
[52] -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820
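The role of left.open = TRUE is easier to see on the first few indices. The sketch below uses the 122-to-100 ratio from the question; the names breaks and grp are only illustrative.

```r
breaks <- round(1.22 * 1:5)  # 1 2 4 5 6: last input index of blocks 1 to 5
# With left.open = TRUE each interval is (breaks[i], breaks[i + 1]],
# so an index equal to a breakpoint still belongs to the block ending there.
grp <- findInterval(1:6, breaks, left.open = TRUE)
print(grp)                   # 0 1 2 2 3 4
```

Indices 3 and 4 share label 2, so they are averaged together, which matches mean(vec1[3:4]) in the question. The labels start at 0, which is harmless because split only uses them as group keys.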
Interesting question. Here is one option:
alex_normalize <- function(vec, tol = 100) {
  k <- length(vec) / tol
  from <- round(k * seq_len(tol))
  sapply(
    seq_len(tol),
    function(i) mean(vec[seq(max(from[i-1L], 1L), from[i])])
  )
}
alex_normalize(vec1)
[1] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[7] -0.0009666667 -0.0029000000 -0.0029000000 -0.0029000000 -0.0029000000 -0.0034000000
[13] -0.0044000000 -0.0051500000 -0.0066000000 -0.0078000000 -0.0088000000 -0.0095000000
[19] -0.0117000000 -0.0154000000 -0.0239333333 -0.0307500000 -0.0329500000 -0.0337000000
[25] -0.0337000000 -0.0337000000 -0.0337000000 -0.0344000000 -0.0388000000 -0.0507666667
[31] -0.0622500000 -0.0681000000 -0.0754000000 -0.0871000000 -0.1137000000 -0.1427500000
[37] -0.1544500000 -0.1596000000 -0.1654666667 -0.1691000000 -0.1698000000 -0.1698000000
[43] -0.1698000000 -0.1537333333 -0.1201000000 -0.0739500000 0.0154000000 0.2045000000
[49] 0.3968000000 0.5139000000 0.6098000000 0.6889000000 0.7618666667 0.7965000000
[55] 0.8067500000 0.8119000000 0.8179666667 0.8228500000 0.8243000000 0.8148000000
[61] 0.7928500000 0.6491000000 0.4202000000 0.2386500000 0.0154000000 -0.3889666667
[67] -0.7269500000 -0.8477500000 -0.8975000000 -0.9092000000 -0.9092000000 -0.9092000000
[73] -0.9092000000 -0.9092000000 -0.9092000000 -0.9092000000 -0.9063000000 -0.8990000000
[79] -0.8843500000 -0.8467333333 -0.8177000000 -0.8104000000 -0.8082000000 -0.8082000000
[85] -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000
[91] -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000
[97] -0.8082000000 -0.8082000000 -0.8082000000 -0.8082000000
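Note that each window here starts at from[i-1], the end index of the previous window, so neighbouring windows share one element. That is why some values differ from the block means of the other answers (for example element 7). A small sketch of the difference for position 3, with illustrative variable names:

```r
from <- round(122 / 100 * seq_len(100))
i <- 3

# Window as written in alex_normalize(): includes index 2 again
w_overlap <- seq(max(from[i - 1L], 1L), from[i])
print(w_overlap)   # 2 3 4

# Disjoint window, starting one past the previous end, as in the question
w_disjoint <- seq(from[i - 1L] + 1, from[i])
print(w_disjoint)  # 3 4
```

Whether the overlap is wanted depends on the use case; it gives a slightly smoother result, while the disjoint windows reproduce mean(vec1[3:4]) exactly as asked.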