间隔和一组间隔之间的区别?
Difference between an interval and a set of intervals?
我有一组不重叠、不相邻的间隔,例如。 [{10,15}, {30,35}, {20,25}]。它们没有排序,但如果需要我可以排序。
现在我得到了一些新的时间间隔,例如。 {5,32} 并希望生成一组新的间隔来描述差异:这个新间隔所涵盖的范围不在集合中。在这个例子中,答案是:[{5,9}, {16,19}, {26,29}].
计算这个的快速算法是什么?请注意,该集合通常会有 1 个,有时是 2 个,很少有 3 个或更多项目,所以我想针对这种情况进行优化。
对于上下文,这里是最初从开始+结束数据的输入流创建集合的代码,我会在其中合并:
type Interval struct {
start int
end int
}
func (i *Interval) OverlapsOrAdjacent(j Interval) bool {
return i.end+1 >= j.start && j.end+1 >= i.start
}
func (i *Interval) Merge(j Interval) bool {
if !i.OverlapsOrAdjacent(j) {
return false
}
if j.start < i.start {
i.start = j.start
}
if j.end > i.end {
i.end = j.end
}
return true
}
type Intervals []Interval
func (ivs Intervals) Len() int { return len(ivs) }
func (ivs Intervals) Swap(i, j int) { ivs[i], ivs[j] = ivs[j], ivs[i] }
func (ivs Intervals) Less(i, j int) bool { return ivs[i].start < ivs[j].start }
func (ivs Intervals) Merge(iv Interval) Intervals {
ivs = append(ivs, iv)
merged := make(Intervals, 0, len(ivs))
for _, iv := range ivs {
for i := 0; i < len(merged); {
if iv.Merge(merged[i]) {
merged = append(merged[:i], merged[i+1:]...)
} else {
i++
}
}
merged = append(merged, iv)
}
return merged
}
func (ivs Intervals) MergeUsingSort(iv Interval) Intervals {
ivs = append(ivs, iv)
sort.Sort(ivs)
merged := make(Intervals, 0, len(ivs))
merged = append(merged, ivs[0])
for i := 1; i < len(ivs); i++ {
last := len(merged) - 1
if !merged[last].Merge(ivs[i]) {
merged = append(merged, ivs[i])
}
}
return merged
}
func (ivs Intervals) Difference(iv Interval) Intervals {
// ???
return ivs
}
func main() {
var ivs Intervals
for _, input := range inputsFromSomewhere { // in reality, I don't have all these inputs at once, they come in one at a time
iv := Interval{input.start, input.end}
diffs := ivs.Difference(iv) // not yet implemented...
// do something with diffs
ivs = ivs.Merge(iv)
}
}
我发现上面的 Intervals.Merge() 比 MergeUsingSort() 快 2 倍,所以我想知道是否还有一种简单的非排序方式来回答我的问题。
为了回答我自己的问题,这是我的 Difference() 实现,它(在我的输入数据上)比例如更快。 JimB 的建议需要排序。
func (i *Interval) Overlaps(j Interval) bool {
return i.End >= j.Start && j.End >= i.Start
}
func (i *Interval) Difference(j Interval) (left *Interval, right *Interval, overlapped bool) {
if !i.Overlaps(j) {
return
}
overlapped = true
if j.Start < i.Start {
left = &Interval{j.Start, i.Start - 1}
}
if j.End > i.End {
right = &Interval{i.End + 1, j.End}
}
return
}
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
diffs = append(diffs, iv)
for _, prior := range ivs {
for i := 0; i < len(diffs); {
if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
if len(diffs) == 1 {
diffs = nil
} else {
diffs = append(diffs[:i], diffs[i+1:]...)
}
if left != nil {
diffs = append(diffs, *left)
}
if right != nil {
diffs = append(diffs, *right)
}
} else {
i++
}
}
if len(diffs) == 0 {
break
}
}
return
}
它适用于我试过的数据,虽然我有点担心我可能错过了一个边缘情况,它得到了错误的答案。
问答代码不完整,无法编译。没有基准。快速浏览一下代码,它可能效率低下。
interval.go
和 interval_test.go
的可用代码是从 https://github.com/VertebrateResequencing/wr/tree/develop/minfys 获得的。
让我们从编写区间差异示例的基准开始。
package minfys
import (
"fmt"
"testing"
)
// Example
var (
xA = Intervals{{10, 15}, {30, 35}, {20, 25}}
xB = Interval{5, 32}
xD = Intervals{{5, 9}, {16, 19}, {26, 29}}
xR = Intervals{}
)
func BenchmarkExample(b *testing.B) {
b.ReportAllocs()
a := make(Intervals, len(xA))
b.ResetTimer()
for i := 0; i < b.N; i++ {
copy(a, xA)
xR = a.Difference(xB)
}
b.StopTimer()
if fmt.Sprint(xD) != fmt.Sprint(xR) {
b.Fatal(xD, xR)
}
}
接下来写一个Difference方法。
package minfys
func (a Intervals) Difference(b Interval) Intervals {
// If A and B are sets, then the relative complement of A in B
// is the set of elements in B but not in A.
// The relative complement of A in B is denoted B ∖ A:
// B \ A = {x ∈ A | x ∉ B}
// B \ A = B ∩ A'
//
// For example. d = a\b,
// a: [{10, 15}, {30, 35}, {20, 25}]
// b: {5,32}
// d: [{5,9}, {16,19}, {26,29}]
// The elements of set a are non-overlapping, non-adjacent,
// and unsorted intervals.
if len(a) <= 0 {
return Intervals{b}
}
d := make(Intervals, 0, 3)
for ; len(a) > 0; a = a[1:] {
for i := 1; i < len(a); i++ {
if a[i].Start < a[0].Start {
a[i], a[0] = a[0], a[i]
}
}
if b.Start < a[0].Start {
if b.End < a[0].Start {
d = append(d, b)
break
}
d = append(d, Interval{b.Start, a[0].Start - 1})
b.Start = a[0].Start
}
if b.End <= a[0].End {
break
}
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
if len(a) == 1 {
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
d = append(d, b)
break
}
}
return d
}
现在,对 Difference 方法进行基准测试。
BenchmarkExample-4 20000000 62.4 ns/op 48 B/op 1 allocs/op
sbs写了一个Difference方法
// Interval struct is used to describe something with a start and end. End must
// be greater than start.
type Interval struct {
Start int64
End int64
}
// Overlaps returns true if this interval overlaps with the supplied one.
func (i *Interval) Overlaps(j Interval) bool {
// https://nedbatchelder.com/blog/201310/range_overlap_in_two_compares.html
return i.End >= j.Start && j.End >= i.Start
}
// Intervals type is a slice of Interval.
type Intervals []Interval
// Difference returns any portions of iv that do not overlap with any of our
// intervals. Assumes that all of our intervals have been Merge()d in.
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
diffs = append(diffs, iv)
for _, prior := range ivs {
for i := 0; i < len(diffs); {
if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
if len(diffs) == 1 {
diffs = nil
} else {
diffs = append(diffs[:i], diffs[i+1:]...)
}
if left != nil {
diffs = append(diffs, *left)
}
if right != nil {
diffs = append(diffs, *right)
}
} else {
i++
}
}
if len(diffs) == 0 {
break
}
}
return
}
Benchmark sbs的差分法。
BenchmarkExample-4 5000000 365 ns/op 128 B/op 4 allocs/op
peterSO 的差分法明显更快。
old.txt (sbs) versus new.txt (peterSO):
benchmark old ns/op new ns/op delta
BenchmarkExample-4 365 62.4 -82.90%
benchmark old allocs new allocs delta
BenchmarkExample-4 4 1 -75.00%
benchmark old bytes new bytes delta
BenchmarkExample-4 128 48 -62.50%
这只是一个开始。可能还有其他改进。
interval_test.go
中有一些错误。 ShouldBeNil
用于指针; ShouldBeEmpty
用于 collections。 ShouldResemble
不处理集合相等(包含相同元素的两个集合是同一个集合)。更改 ShouldResemble
顺序以匹配依赖于实现的顺序。
$ go test
..........................................................................................................................x......................................................x................x
Failures:
* interval_test.go
Line 247:
Expected: nil
Actual: '[]'
* interval_test.go
Line 375:
Expected: 'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:31, End:32}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}}'
Actual: 'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}, minfys.Interval{Start:31, End:32}}'
(Should resemble)!
* interval_test.go
Line 413:
Expected: 'minfys.Intervals{minfys.Interval{Start:7, End:10}, minfys.Interval{Start:1, End:3}, minfys.Interval{Start:15, End:17}}'
Actual: 'minfys.Intervals{minfys.Interval{Start:1, End:3}, minfys.Interval{Start:7, End:10}, minfys.Interval{Start:15, End:17}}'
(Should resemble)!
195 total assertions
...
198 total assertions
--- FAIL: TestIntervals (0.04s)
FAIL
.
$ diff -a -u ../interval_test.go interval_test.go
--- ../interval_test.go 2017-04-29 20:23:29.365344008 -0400
+++ interval_test.go 2017-04-29 20:54:14.349344903 -0400
@@ -244,19 +244,19 @@
So(len(ivs), ShouldEqual, 1)
newIvs = ivs.Difference(twoSix)
- So(newIvs, ShouldBeNil)
+ So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(twoSix)
So(len(ivs), ShouldEqual, 1)
newIvs = ivs.Difference(oneThree)
- So(newIvs, ShouldBeNil)
+ So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(oneThree)
So(len(ivs), ShouldEqual, 1)
oneSeven := Interval{1, 7}
newIvs = ivs.Difference(oneSix)
- So(newIvs, ShouldBeNil)
+ So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(oneSix)
So(len(ivs), ShouldEqual, 1)
@@ -372,7 +372,7 @@
fiveThirtyTwo := Interval{5, 32}
newIvs = ivs.Difference(fiveThirtyTwo)
- So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{31, 32}, Interval{11, 14}, Interval{19, 19}})
+ So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{11, 14}, Interval{19, 19}, Interval{31, 32}})
ivs = ivs.Merge(fiveThirtyTwo)
So(len(ivs), ShouldEqual, 3)
@@ -409,7 +409,7 @@
ivs = ivs.Truncate(17)
- expected := Intervals{sevenTen, oneThree, Interval{15, 17}}
+ expected := Intervals{oneThree, sevenTen, Interval{15, 17}}
So(ivs, ShouldResemble, expected)
})
})
.
$ go test
.............................................................................................................................................................................................................
205 total assertions
...
208 total assertions
PASS
$
I [@sbs] confirm it's faster than my solution. Though if you just
measure the wall-time that using Difference() takes (put a before :=
time.Now() before the last Difference() call in the interval_test.go,
and a time.Since(before) after it and sum those durations over the
loop), it seems to make surprisingly little difference (on my machine
it takes ~31ms with my solution and ~29ms with yours).
根据要求,interval_test.go
已修改为测量挂钟时间:
$ diff -a -u ../interval_test.go walltime_test.go
--- ../interval_test.go 2017-04-29 20:23:29.365344008 -0400
+++ walltime_test.go 2017-04-30 13:39:29.000000000 -0400
@@ -24,6 +24,7 @@
"math/rand"
"testing"
"time"
+ "fmt"
)
func TestIntervals(t *testing.T) {
@@ -459,16 +460,20 @@
var ivs Intervals
errors := 0
+ var diffTime time.Duration
t := time.Now()
for i, input := range inputs {
iv := NewInterval(int64(input), int64(readSize))
+ before := time.Now()
newIvs := ivs.Difference(iv)
+ diffTime += time.Since(before)
if (len(newIvs) == 1) != exepectedNew[i] {
errors++
}
ivs = ivs.Merge(iv)
}
- // fmt.Printf("\ntook %s\n", time.Since(t))
+ fmt.Printf("took %s\n", time.Since(t))
+ fmt.Printf("\n Difference took %s\n", diffTime)
So(errors, ShouldEqual, 0)
So(len(ivs), ShouldEqual, 1)
So(time.Since(t).Seconds(), ShouldBeLessThan, 1) // 42ms on my machine
$
interval_test.go
基准输入大小和频率为
size frequency
0 1
1 94929
2 50072
3 4998
输出大小和频率为
size frequency
0 50000
1 100000
为此分布调整 peterSo 的差值方法给出
package minfys
func (a Intervals) Difference(b Interval) Intervals {
// If A and B are sets, then the relative complement of A in B
// is the set of elements in B but not in A.
// The relative complement of A in B is denoted B ∖ A:
// B \ A = {x ∈ A | x ∉ B}
// B \ A = B ∩ A'
//
// For example. d = a\b,
// a: [{10, 15}, {30, 35}, {20, 25}]
// b: {5,32}
// d: [{5,9}, {16,19}, {26,29}]
// The elements of set a are non-overlapping, non-adjacent,
// and unsorted intervals.
if len(a) <= 0 {
return Intervals{b}
}
var d Intervals
for ; len(a) > 0; a = a[1:] {
for i := 1; i < len(a); i++ {
if a[i].Start < a[0].Start {
a[i], a[0] = a[0], a[i]
}
}
if b.Start < a[0].Start {
if b.End < a[0].Start {
d = append(d, b)
break
}
d = append(d, Interval{b.Start, a[0].Start - 1})
b.Start = a[0].Start
}
if b.End <= a[0].End {
break
}
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
if len(a) == 1 {
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
d = append(d, b)
break
}
}
return d
}
运行 peterSO 和 sbs 的差异方法的 interval_test.go
基准给出
$ go test -v
Merging many intervals is fast took 26.208614ms
Difference took 10.706858ms
和
$ go test -v
Merging many intervals is fast took 30.799216ms
Difference took 14.414488ms
peterSo 的差分法明显快于 sbs 的:10.706858 毫秒对 14.414488 毫秒或负 25.7%。
更新 peterSO 修改后的差异方法的早期示例基准测试结果给出
old.txt (sbs) versus new.txt (peterSO):
benchmark old ns/op new ns/op delta
BenchmarkExample-4 365 221 -39.45%
benchmark old allocs new allocs delta
BenchmarkExample-4 4 3 -25.00%
benchmark old bytes new bytes delta
BenchmarkExample-4 128 112 -12.50%
我有一组不重叠、不相邻的间隔,例如。 [{10,15}, {30,35}, {20,25}]。它们没有排序,但如果需要我可以排序。
现在我得到了一些新的时间间隔,例如。 {5,32} 并希望生成一组新的间隔来描述差异:这个新间隔所涵盖的范围不在集合中。在这个例子中,答案是:[{5,9}, {16,19}, {26,29}].
计算这个的快速算法是什么?请注意,该集合通常会有 1 个,有时是 2 个,很少有 3 个或更多项目,所以我想针对这种情况进行优化。
对于上下文,这里是最初从开始+结束数据的输入流创建集合的代码,我会在其中合并:
type Interval struct {
start int
end int
}
func (i *Interval) OverlapsOrAdjacent(j Interval) bool {
return i.end+1 >= j.start && j.end+1 >= i.start
}
func (i *Interval) Merge(j Interval) bool {
if !i.OverlapsOrAdjacent(j) {
return false
}
if j.start < i.start {
i.start = j.start
}
if j.end > i.end {
i.end = j.end
}
return true
}
type Intervals []Interval
func (ivs Intervals) Len() int { return len(ivs) }
func (ivs Intervals) Swap(i, j int) { ivs[i], ivs[j] = ivs[j], ivs[i] }
func (ivs Intervals) Less(i, j int) bool { return ivs[i].start < ivs[j].start }
func (ivs Intervals) Merge(iv Interval) Intervals {
ivs = append(ivs, iv)
merged := make(Intervals, 0, len(ivs))
for _, iv := range ivs {
for i := 0; i < len(merged); {
if iv.Merge(merged[i]) {
merged = append(merged[:i], merged[i+1:]...)
} else {
i++
}
}
merged = append(merged, iv)
}
return merged
}
func (ivs Intervals) MergeUsingSort(iv Interval) Intervals {
ivs = append(ivs, iv)
sort.Sort(ivs)
merged := make(Intervals, 0, len(ivs))
merged = append(merged, ivs[0])
for i := 1; i < len(ivs); i++ {
last := len(merged) - 1
if !merged[last].Merge(ivs[i]) {
merged = append(merged, ivs[i])
}
}
return merged
}
func (ivs Intervals) Difference(iv Interval) Intervals {
// ???
return ivs
}
func main() {
var ivs Intervals
for _, input := range inputsFromSomewhere { // in reality, I don't have all these inputs at once, they come in one at a time
iv := Interval{input.start, input.end}
diffs := ivs.Difference(iv) // not yet implemented...
// do something with diffs
ivs = ivs.Merge(iv)
}
}
我发现上面的 Intervals.Merge() 比 MergeUsingSort() 快 2 倍,所以我想知道是否还有一种简单的非排序方式来回答我的问题。
为了回答我自己的问题,这是我的 Difference() 实现,它(在我的输入数据上)比例如更快。 JimB 的建议需要排序。
func (i *Interval) Overlaps(j Interval) bool {
return i.End >= j.Start && j.End >= i.Start
}
func (i *Interval) Difference(j Interval) (left *Interval, right *Interval, overlapped bool) {
if !i.Overlaps(j) {
return
}
overlapped = true
if j.Start < i.Start {
left = &Interval{j.Start, i.Start - 1}
}
if j.End > i.End {
right = &Interval{i.End + 1, j.End}
}
return
}
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
diffs = append(diffs, iv)
for _, prior := range ivs {
for i := 0; i < len(diffs); {
if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
if len(diffs) == 1 {
diffs = nil
} else {
diffs = append(diffs[:i], diffs[i+1:]...)
}
if left != nil {
diffs = append(diffs, *left)
}
if right != nil {
diffs = append(diffs, *right)
}
} else {
i++
}
}
if len(diffs) == 0 {
break
}
}
return
}
它适用于我试过的数据,虽然我有点担心我可能错过了一个边缘情况,它得到了错误的答案。
问答代码不完整,无法编译。没有基准。快速浏览一下代码,它可能效率低下。
interval.go
和 interval_test.go
的可用代码是从 https://github.com/VertebrateResequencing/wr/tree/develop/minfys 获得的。
让我们从编写区间差异示例的基准开始。
package minfys
import (
"fmt"
"testing"
)
// Example
var (
xA = Intervals{{10, 15}, {30, 35}, {20, 25}}
xB = Interval{5, 32}
xD = Intervals{{5, 9}, {16, 19}, {26, 29}}
xR = Intervals{}
)
func BenchmarkExample(b *testing.B) {
b.ReportAllocs()
a := make(Intervals, len(xA))
b.ResetTimer()
for i := 0; i < b.N; i++ {
copy(a, xA)
xR = a.Difference(xB)
}
b.StopTimer()
if fmt.Sprint(xD) != fmt.Sprint(xR) {
b.Fatal(xD, xR)
}
}
接下来写一个Difference方法。
package minfys
func (a Intervals) Difference(b Interval) Intervals {
// If A and B are sets, then the relative complement of A in B
// is the set of elements in B but not in A.
// The relative complement of A in B is denoted B ∖ A:
// B \ A = {x ∈ A | x ∉ B}
// B \ A = B ∩ A'
//
// For example. d = a\b,
// a: [{10, 15}, {30, 35}, {20, 25}]
// b: {5,32}
// d: [{5,9}, {16,19}, {26,29}]
// The elements of set a are non-overlapping, non-adjacent,
// and unsorted intervals.
if len(a) <= 0 {
return Intervals{b}
}
d := make(Intervals, 0, 3)
for ; len(a) > 0; a = a[1:] {
for i := 1; i < len(a); i++ {
if a[i].Start < a[0].Start {
a[i], a[0] = a[0], a[i]
}
}
if b.Start < a[0].Start {
if b.End < a[0].Start {
d = append(d, b)
break
}
d = append(d, Interval{b.Start, a[0].Start - 1})
b.Start = a[0].Start
}
if b.End <= a[0].End {
break
}
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
if len(a) == 1 {
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
d = append(d, b)
break
}
}
return d
}
现在,对 Difference 方法进行基准测试。
BenchmarkExample-4 20000000 62.4 ns/op 48 B/op 1 allocs/op
sbs写了一个Difference方法
// Interval struct is used to describe something with a start and end. End must
// be greater than start.
type Interval struct {
Start int64
End int64
}
// Overlaps returns true if this interval overlaps with the supplied one.
func (i *Interval) Overlaps(j Interval) bool {
// https://nedbatchelder.com/blog/201310/range_overlap_in_two_compares.html
return i.End >= j.Start && j.End >= i.Start
}
// Intervals type is a slice of Interval.
type Intervals []Interval
// Difference returns any portions of iv that do not overlap with any of our
// intervals. Assumes that all of our intervals have been Merge()d in.
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
diffs = append(diffs, iv)
for _, prior := range ivs {
for i := 0; i < len(diffs); {
if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
if len(diffs) == 1 {
diffs = nil
} else {
diffs = append(diffs[:i], diffs[i+1:]...)
}
if left != nil {
diffs = append(diffs, *left)
}
if right != nil {
diffs = append(diffs, *right)
}
} else {
i++
}
}
if len(diffs) == 0 {
break
}
}
return
}
Benchmark sbs的差分法。
BenchmarkExample-4 5000000 365 ns/op 128 B/op 4 allocs/op
peterSO 的差分法明显更快。
old.txt (sbs) versus new.txt (peterSO):
benchmark old ns/op new ns/op delta
BenchmarkExample-4 365 62.4 -82.90%
benchmark old allocs new allocs delta
BenchmarkExample-4 4 1 -75.00%
benchmark old bytes new bytes delta
BenchmarkExample-4 128 48 -62.50%
这只是一个开始。可能还有其他改进。
interval_test.go
中有一些错误。 ShouldBeNil
用于指针; ShouldBeEmpty
用于 collections。 ShouldResemble
不处理集合相等(包含相同元素的两个集合是同一个集合)。更改 ShouldResemble
顺序以匹配依赖于实现的顺序。
$ go test
..........................................................................................................................x......................................................x................x
Failures:
* interval_test.go
Line 247:
Expected: nil
Actual: '[]'
* interval_test.go
Line 375:
Expected: 'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:31, End:32}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}}'
Actual: 'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}, minfys.Interval{Start:31, End:32}}'
(Should resemble)!
* interval_test.go
Line 413:
Expected: 'minfys.Intervals{minfys.Interval{Start:7, End:10}, minfys.Interval{Start:1, End:3}, minfys.Interval{Start:15, End:17}}'
Actual: 'minfys.Intervals{minfys.Interval{Start:1, End:3}, minfys.Interval{Start:7, End:10}, minfys.Interval{Start:15, End:17}}'
(Should resemble)!
195 total assertions
...
198 total assertions
--- FAIL: TestIntervals (0.04s)
FAIL
.
$ diff -a -u ../interval_test.go interval_test.go
--- ../interval_test.go 2017-04-29 20:23:29.365344008 -0400
+++ interval_test.go 2017-04-29 20:54:14.349344903 -0400
@@ -244,19 +244,19 @@
So(len(ivs), ShouldEqual, 1)
newIvs = ivs.Difference(twoSix)
- So(newIvs, ShouldBeNil)
+ So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(twoSix)
So(len(ivs), ShouldEqual, 1)
newIvs = ivs.Difference(oneThree)
- So(newIvs, ShouldBeNil)
+ So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(oneThree)
So(len(ivs), ShouldEqual, 1)
oneSeven := Interval{1, 7}
newIvs = ivs.Difference(oneSix)
- So(newIvs, ShouldBeNil)
+ So(newIvs, ShouldBeEmpty)
ivs = ivs.Merge(oneSix)
So(len(ivs), ShouldEqual, 1)
@@ -372,7 +372,7 @@
fiveThirtyTwo := Interval{5, 32}
newIvs = ivs.Difference(fiveThirtyTwo)
- So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{31, 32}, Interval{11, 14}, Interval{19, 19}})
+ So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{11, 14}, Interval{19, 19}, Interval{31, 32}})
ivs = ivs.Merge(fiveThirtyTwo)
So(len(ivs), ShouldEqual, 3)
@@ -409,7 +409,7 @@
ivs = ivs.Truncate(17)
- expected := Intervals{sevenTen, oneThree, Interval{15, 17}}
+ expected := Intervals{oneThree, sevenTen, Interval{15, 17}}
So(ivs, ShouldResemble, expected)
})
})
.
$ go test
.............................................................................................................................................................................................................
205 total assertions
...
208 total assertions
PASS
$
I [@sbs] confirm it's faster than my solution. Though if you just measure the wall-time that using Difference() takes (put a before := time.Now() before the last Difference() call in the interval_test.go, and a time.Since(before) after it and sum those durations over the loop), it seems to make surprisingly little difference (on my machine it takes ~31ms with my solution and ~29ms with yours).
根据要求,interval_test.go
已修改为测量挂钟时间:
$ diff -a -u ../interval_test.go walltime_test.go
--- ../interval_test.go 2017-04-29 20:23:29.365344008 -0400
+++ walltime_test.go 2017-04-30 13:39:29.000000000 -0400
@@ -24,6 +24,7 @@
"math/rand"
"testing"
"time"
+ "fmt"
)
func TestIntervals(t *testing.T) {
@@ -459,16 +460,20 @@
var ivs Intervals
errors := 0
+ var diffTime time.Duration
t := time.Now()
for i, input := range inputs {
iv := NewInterval(int64(input), int64(readSize))
+ before := time.Now()
newIvs := ivs.Difference(iv)
+ diffTime += time.Since(before)
if (len(newIvs) == 1) != exepectedNew[i] {
errors++
}
ivs = ivs.Merge(iv)
}
- // fmt.Printf("\ntook %s\n", time.Since(t))
+ fmt.Printf("took %s\n", time.Since(t))
+ fmt.Printf("\n Difference took %s\n", diffTime)
So(errors, ShouldEqual, 0)
So(len(ivs), ShouldEqual, 1)
So(time.Since(t).Seconds(), ShouldBeLessThan, 1) // 42ms on my machine
$
interval_test.go
基准输入大小和频率为
size frequency
0 1
1 94929
2 50072
3 4998
输出大小和频率为
size frequency
0 50000
1 100000
为此分布调整 peterSo 的差值方法给出
package minfys
func (a Intervals) Difference(b Interval) Intervals {
// If A and B are sets, then the relative complement of A in B
// is the set of elements in B but not in A.
// The relative complement of A in B is denoted B ∖ A:
// B \ A = {x ∈ A | x ∉ B}
// B \ A = B ∩ A'
//
// For example. d = a\b,
// a: [{10, 15}, {30, 35}, {20, 25}]
// b: {5,32}
// d: [{5,9}, {16,19}, {26,29}]
// The elements of set a are non-overlapping, non-adjacent,
// and unsorted intervals.
if len(a) <= 0 {
return Intervals{b}
}
var d Intervals
for ; len(a) > 0; a = a[1:] {
for i := 1; i < len(a); i++ {
if a[i].Start < a[0].Start {
a[i], a[0] = a[0], a[i]
}
}
if b.Start < a[0].Start {
if b.End < a[0].Start {
d = append(d, b)
break
}
d = append(d, Interval{b.Start, a[0].Start - 1})
b.Start = a[0].Start
}
if b.End <= a[0].End {
break
}
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
if len(a) == 1 {
if b.Start <= a[0].End {
b.Start = a[0].End + 1
}
d = append(d, b)
break
}
}
return d
}
运行 peterSO 和 sbs 的差异方法的 interval_test.go
基准给出
$ go test -v
Merging many intervals is fast took 26.208614ms
Difference took 10.706858ms
和
$ go test -v
Merging many intervals is fast took 30.799216ms
Difference took 14.414488ms
peterSo 的差分法明显快于 sbs 的:10.706858 毫秒对 14.414488 毫秒或负 25.7%。
更新 peterSO 修改后的差异方法的早期示例基准测试结果给出
old.txt (sbs) versus new.txt (peterSO):
benchmark old ns/op new ns/op delta
BenchmarkExample-4 365 221 -39.45%
benchmark old allocs new allocs delta
BenchmarkExample-4 4 3 -25.00%
benchmark old bytes new bytes delta
BenchmarkExample-4 128 112 -12.50%