Will defining a start position when doing multiple matches on a sorted vector be faster?
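I have a vector of 1 million integers in ascending order, and a second vector holding a subset of 1000 of those integers, also sorted.
Which of the following would be faster? Would the second version become faster as samplevec grows?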
samplevec=sort(sample(1:10000000, 1000000))
matchvec=sort(sample(samplevec, 10000))
for (i in matchvec) {
  index = match(i, samplevec)
  print(index)
}
or
samplevec=sort(sample(1:10000000, 1000000))
matchvec=sort(sample(samplevec, 10000))
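previous = 1
for (i in matchvec) {
  # match() returns a position within the slice, so shift it back
  # to a position within samplevec before reusing it as the start.
  index = match(i, samplevec[previous:length(samplevec)]) + previous - 1
  previous = index
  print(index)
}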
Benchmarking this is easy. Here are just two data points. Feel free to tweak this and automate it to cover more sizes.
library(microbenchmark)
set.seed(357)
samplevec = sort(sample(1:1000, 1000))
matchvec = sort(sample(samplevec, 1000))
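microbenchmark(
  version1 = {  # slicing variant (the question's second snippet)
    previous = 1
    for (i in matchvec) {
      index = match(i, samplevec[previous:length(samplevec)])
      previous = index  # slice-relative position, as originally timed
    }
  },
  version2 = {  # plain match() against the full vector
    for (i in matchvec) {
      index = match(i, samplevec)
    }
  }
)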
Unit: milliseconds
     expr       min        lq      mean    median       uq
 version1 10.619105 10.711438 12.057713 10.811051 12.71902
 version2  2.419441  2.487062  2.853868  2.506603  2.56024
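A side note on version1: subsetting with `samplevec[previous:length(samplevec)]` builds a new vector on every iteration, and `match()` then reports a position inside that new vector rather than inside `samplevec`. The toy values below are made up purely for illustration; this is a minimal sketch of the offset arithmetic, not part of the benchmark.

```r
# Small, deterministic example (values are made up for illustration).
samplevec <- c(2L, 5L, 7L, 11L, 13L, 17L)
previous  <- 3L

# Subsetting allocates a new vector holding elements 3..6 of samplevec.
tail_part <- samplevec[previous:length(samplevec)]

# match() reports the position inside the slice, not inside samplevec.
rel_index <- match(13L, tail_part)      # position within tail_part
abs_index <- rel_index + previous - 1L  # position within samplevec

print(c(relative = rel_index, absolute = abs_index))
```

So a loop that wants to restart from the last hit needs `previous - 1` added back, and it still pays for a fresh copy of the tail each time.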
Here is the second point. This one takes a while to run.
set.seed(357)
samplevec = sort(sample(1:100000, 100000))
matchvec = sort(sample(samplevec, 100000))
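microbenchmark(
  version1 = {  # slicing variant (the question's second snippet)
    previous = 1
    for (i in matchvec) {
      index = match(i, samplevec[previous:length(samplevec)])
      previous = index  # slice-relative position, as originally timed
    }
  },
  version2 = {  # plain match() against the full vector
    for (i in matchvec) {
      index = match(i, samplevec)
    }
  }
)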
Unit: seconds
     expr       min        lq      mean    median        uq
 version1 108.96069 109.61137 110.87308 110.70554 111.61337
 version2  15.63668  15.71792  16.20434  15.84646  16.07487
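In both runs the slicing variant (version1) is the slower one, by roughly 4x and 7x at the median. If you want more than two data points, something along these lines could automate it. This is only a sketch: `time_variants` is a hypothetical helper, it uses base `system.time()` instead of microbenchmark, the sizes are arbitrary, and it adds a third variant (a single vectorized `match()` call) that was not part of the timings above. The slicing loop here applies the `previous - 1` offset so that all three variants return the same positions.

```r
# Hypothetical helper: times three ways of locating matchvec in samplevec.
time_variants <- function(n, seed = 357) {
  set.seed(seed)
  samplevec <- sort(sample(1:(10 * n), n))
  matchvec  <- sort(sample(samplevec, max(1, n %/% 10)))

  # Loop with a shrinking slice; the offset keeps positions absolute.
  sliced <- integer(length(matchvec))
  t_sliced <- system.time({
    previous <- 1L
    for (k in seq_along(matchvec)) {
      rel <- match(matchvec[k], samplevec[previous:length(samplevec)])
      previous <- rel + previous - 1L
      sliced[k] <- previous
    }
  })[["elapsed"]]

  # Loop over the full vector, one element at a time.
  looped <- integer(length(matchvec))
  t_looped <- system.time({
    for (k in seq_along(matchvec)) looped[k] <- match(matchvec[k], samplevec)
  })[["elapsed"]]

  # One vectorized call for all elements.
  t_vec <- system.time(vectorized <- match(matchvec, samplevec))[["elapsed"]]

  # All three variants must agree on the positions found.
  stopifnot(identical(sliced, looped), identical(looped, vectorized))
  data.frame(n = n, sliced = t_sliced, looped = t_looped, vectorized = t_vec)
}

sizes   <- c(1000, 10000)
timings <- do.call(rbind, lapply(sizes, time_variants))
print(timings)
```

No timings are shown for this sketch because they depend on the machine; extend `sizes` to see how each variant scales on yours.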