查找包含给定时间点的间隔的最快方法

Fastest way to find intervals which contain a given point in time

我需要创建基于时间的“时间表结构”,方法如下:

Void  addTask(DateTime startTime, int durationInMinutes, TaskObject myObj)
{
   // add TaskObject to calendar structure
}
List<TaskObject> getRunningTasks (DateTime startTime, DateTime endTime)
{
  //this procedure have to efficiently return list of running tasks in specified time frame
}
List<TaskObject> getRunningTasks (DateTime exactTime)
{
    return getRunningTaks(exactTime,exactTime);
}

我有大约 60k 个 TaskObjects 需要计算,需要在几小时和几分钟内重新计算(getRunningTasks 将被调用大约 400k 次)

现在我使用:

public Dictionary<long, Dictionary<int, Dictionary<int, List< TaskObject>>>> scheduleTable;

scheduleTable[dayInTicks][小时][分钟]

我将所有匹配的任务添加到每小时和每分钟,并安排它们。

来自 DrKoch 的想法

    public class TaskList
    {
        private SortedDictionary<DateTime, TaskObject> startTimes;
        private SortedDictionary<DateTime, TaskObject> endTimes;
        private SortedSet<DateTime> startTimeIndexes;
        private SortedSet<DateTime> endTimeIndexes;
        public TaskList()
        {
            reset();
        }
        public void addTask(TaskObject taskToAdd, DateTime startTime, int durationInMinutes)
        {
            // start time
            while (startTimes.ContainsKey(startTime))
            {
                startTime = startTime.AddTicks(1);
            }
            startTimes.Add(startTime, taskToAdd);
            startTimeIndexes.Add(startTime);
            //end time
            DateTime endTime = startTime.AddMinutes(durationInMinutes);
            while (endTimes.ContainsKey(endTime))
            {
                endTime = endTime.AddTicks(1);
            }
            endTimes.Add(endTime, taskToAdd);
            endTimeIndexes.Add(endTime);
        }
        public List<TaskObject> getRunningTasks(DateTime startTime, DateTime endTime)
        {
            DateTime fromBeginingOfDay = new DateTime(endTime.Year, endTime.Month, endTime.Day);
            SortedSet<DateTime> myEndTimeIndexes =  endTimeIndexes.GetViewBetween(fromBeginingOfDay, startTime); // tasks, that already finished during specified day
            SortedSet<DateTime> myStartTimeIndexes = endTimeIndexes.GetViewBetween(fromBeginingOfDay, endTime);  // tasks, that started from the beginig of the day
            List<TaskObject> result = new List<TaskObject>();
            // Fill result with all matching tasks
            foreach (DateTime myStartTimeIndex in myStartTimeIndexes)
            {
                result.Add(startTimes[myStartTimeIndex]);
            }
            // Remove finished tasks from result
            foreach (DateTime myEndTimeIndex in myEndTimeIndexes)
            {
                if (result.Contains(endTimes[myEndTimeIndex]))
                {
                    result.Remove(startTimes[myEndTimeIndex]);
                }
            }
            return result;
        }
        public List<TaskObject> getRunningTasks(DateTime exactTime)
        {
            return getRunningTasks(exactTime, exactTime.addSeconds(1));
        }
        public void reset()
        {
            startTimes = new SortedDictionary<DateTime, TaskObject>();
            endTimes = new SortedDictionary<DateTime, TaskObject>();
            startTimeIndexes = new SortedSet<DateTime>();
            endTimeIndexes = new SortedSet<DateTime>();
        }
    }
    public class TaskObject
    {
        public string Name;
        public TaskObject(string name)
        {
            Name = name;
        }
    }

备注: 请参阅我的 second answer 以获得更完整的解决方案。


您可以构建两个额外的 sorted 词典:

SortedDictionary<DateTime, Task> startTimes; // startTime -> Task
SortedDictionary<DateTime, Task> endTimes;   // endTime -> Task

这些词典允许快速(O(log N))访问在 exactTime 之前开始并在 exactTime

之后结束的所有任务

这些集合的交集就是您要查找的内容。


更好的collection是SortedSet它有一个

SortedSet<T>.GetViewBetween()

满足您所有需要的方法:它可以 return startTimesSet 中的所有任务,startTime 早于 exactTime

这个问题类似于选择二维点,落在指定的矩形内。不幸的是,它不能直接使用二分查找来解决。

解决这个问题的主要方法:把"plain"占卜成正方形。一个小例子:

// minDate means minimal possible date
// maxDate means maximal possible date
// interval means a unit of division in days, f.e. 10 or 30
var size = (maxDate.Subtract(minDate).Days + interval)/interval;
var tasks = new List<Task>[size, size]();

// for each new task:
var startDate = ...
var endDate = ...
var x = (startDate.Subtract(minDate).Days + interval)/interval;
var y = (endDate.Subtract(minDate).Days + interval)/interval;

if (tasks[x, y] == null)
    tasks[x, y] = new List<Task>();

tasks[x, y].Add(newTask);

// search
var startPeriod = ...
var endPeriod = ...
var minIndex = (startPeriod.Subtract(minDate).Days + interval)/interval;
var maxIndex = (endPeriod.Subtract(minDate).Days + interfal)/interval;

for (int x = minIndex + 1; x < maxIndex - 1; x++)
    for (int y = minIndex + 1; y < maxIndex - 1; y++)
        tasks[x, y] ... // All these tasks are yours

for (int x = minIndex; x < maxIndex; x++)
    foreach(var task in tasks[x, minIndex])
        if (task.startDate >= startPeriod && task.endDate <= endPeriod)
            ... // All these tasks also are yours

// Repeat last for/foreach for every boundary interval, since not all tasks
// may be yours there
...

在边界内 "squares" 你用蛮力寻找所需的任务。如果太慢,可以用SortedList代替List。它会减少暴力破解的时间,但不会完全摆脱它。

假设您将任务存储在 class 中,如下所示:

public class MyTask
{
    public string name;
    public DateTime startDt;
    public DateTime endDt;
    // ...
}

基本思想是维护 两个 任务集合,一个按 startDt 排序,第二个按 endDt 排序。

我们将使用 SortedSet 有两个原因:

  1. 它有一个 computational complexity 的 O(log n) 用于插入和 搜索。如果您遇到很多物品的问题,这是非常可取的 复杂度优于 O(n).

  2. 它允许 return 某个 "range" 中的所有项目。不需要知道 与字典

  3. 中一样用于检索的确切 "keys"

因为 SortedSet 中的所有项目都是唯一的,并且因为多个任务可能具有相同的 startDtendDt 我们不能直接将任务存储在 SortedSet 中相反,我们同时维护所有任务的 "bucket":

public class SameTimeTaskList
{
    public DateTime time; // common start or end time of all tasks in list
    public List<MyTask> taskList = new List<MyTask>();
}

这个的排序标准当然是time

// Defines a comparer to create a sorted set 
// that is sorted by time. 
public class ByTime : IComparer<SameTimeTaskList>
{
    public int Compare(SameTimeTaskList x, SameTimeTaskList y)
    {
        return x.time.CompareTo(y.time);
    }
}

有了这个我们可以构建我们的两个排序集:

SortedSet<SameTimeTaskList> startTimeSet = 
  new SortedSet<SameTimeTaskList>(new ByTime());
SortedSet<SameTimeTaskList> endTimeSet = 
  new SortedSet<SameTimeTaskList>(new ByTime());

两个集合中都插入了一个新任务。如果此 time 的存储桶不存在,则会创建一个新的存储桶。否则,任务将简单地添加到正确的存储桶中:

    public void Add(MyTask task)
    {
        // startTimeSet
        refTime.time = task.startDt;
        var lst = startTimeSet.GetViewBetween(refTime,
            refTime).FirstOrDefault();
        if (lst == null) // no bucket found for time
        {
            lst = new SameTimeTaskList { time = task.startDt };
            startTimeSet.Add(lst);
        }
        lst.taskList.Add(task); // add task to bucket
        // endTimeSet
        refTime.time = task.endDt;
        lst = endTimeSet.GetViewBetween(refTime,
            refTime).FirstOrDefault();
        if (lst == null) // no bucket found for time
        {
            lst = new SameTimeTaskList { time = task.endDt };
            endTimeSet.Add(lst);
        }
        lst.taskList.Add(task); // add task to bucket
    }

现在很容易得到在某个 exactTime 活跃的所有区间。每个任务必须满足两个条件:

task.startDt <= exactTime
&&
task.endDt >= exactTime

我们检查两个 SortedSets 以查看哪个 return 是一个条件下的较小集合。然后我们检查较小集合中的所有任务是否符合第二个条件:

    public IEnumerable<MyTask> Get(DateTime exactTime)
    {
        refTime.time = exactTime;
        // set of all tasks started before exactTime
        SortedSet<SameTimeTaskList> sSet =
           startTimeSet.GetViewBetween(minTime, refTime);
        // set of all tasks ended after exactTime
        SortedSet<SameTimeTaskList> eSet =
           endTimeSet.GetViewBetween(refTime, maxTime);

        List<MyTask> result = new List<MyTask>();
        if (sSet.Count < eSet.Count) // check smaller set for 2nd condition
        {
            foreach (var tl in sSet)
                foreach (MyTask tsk in tl.taskList)
                    if(tsk.endDt >= exactTime) result.Add(tsk);
        }
        else // eSet is smaller
        {
            foreach (var tl in eSet)
                foreach (MyTask tsk in tl.taskList)
                    if (tsk.startDt <= exactTime) result.Add(tsk);

        }
        return result;
    }

这是作为工作程序的完整代码:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace IntervallsTest
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime exactDate = DateTime.Parse("2015-6-1");

            var tc = new TaskCollection();
            tc.Add(new MyTask { name = "T1", startDt = DateTime.Parse("2015-1-1"), endDt = DateTime.Parse("2015-02-01") });
            tc.Add(new MyTask { name = "T2", startDt = DateTime.Parse("2015-1-1"), endDt = DateTime.Parse("2015-07-01") });
            tc.Add(new MyTask { name = "T2a", startDt = DateTime.Parse("2015-1-1"), endDt = DateTime.Parse("2015-07-02") });
            tc.Add(new MyTask { name = "T3", startDt = DateTime.Parse("2015-05-1"), endDt = DateTime.Parse("2015-12-31") });
            tc.Add(new MyTask { name = "T3a", startDt = DateTime.Parse("2015-04-1"), endDt = DateTime.Parse("2015-12-31") });
            tc.Add(new MyTask { name = "T4", startDt = DateTime.Parse("2015-12-1"), endDt = DateTime.Parse("2015-12-31") });

            var result = tc.Get(exactDate);

            Console.WriteLine("These tasks are active at " + exactDate);
            foreach (var tsk in result)
            {
                Console.WriteLine(tsk.name);
            }
            Console.WriteLine("press any key");
            Console.ReadKey();
        }
    }

    public class TaskCollection
    {
        SortedSet<SameTimeTaskList> startTimeSet = new SortedSet<SameTimeTaskList>(new ByTime());
        SortedSet<SameTimeTaskList> endTimeSet = new SortedSet<SameTimeTaskList>(new ByTime());

        static SameTimeTaskList refTime = new SameTimeTaskList();
        static SameTimeTaskList minTime = new SameTimeTaskList { time = DateTime.MinValue };
        static SameTimeTaskList maxTime = new SameTimeTaskList { time = DateTime.MaxValue };

        public void Add(MyTask task)
        {
            // startTimeSet
            refTime.time = task.startDt;
            var lst = startTimeSet.GetViewBetween(refTime, refTime).FirstOrDefault();
            if (lst == null) // no bucket found for time
            {
                lst = new SameTimeTaskList { time = task.startDt };
                startTimeSet.Add(lst);
            }
            lst.taskList.Add(task); // add task to bucket
            // endTimeSet
            refTime.time = task.endDt;
            lst = endTimeSet.GetViewBetween(refTime, refTime).FirstOrDefault();
            if (lst == null) // no bucket found for time
            {
                lst = new SameTimeTaskList { time = task.endDt };
                endTimeSet.Add(lst);
            }
            lst.taskList.Add(task); // add task to bucket
        }

        public IEnumerable<MyTask> Get(DateTime exactTime)
        {
            refTime.time = exactTime;
            // set of all tasks started before exactTime
            SortedSet<SameTimeTaskList> sSet = startTimeSet.GetViewBetween(minTime, refTime);
            // set of all tasks ended after exactTime
            SortedSet<SameTimeTaskList> eSet = endTimeSet.GetViewBetween(refTime, maxTime);

            List<MyTask> result = new List<MyTask>();
            if (sSet.Count < eSet.Count) // check smaller set for 2nd condition
            {
                foreach (var tl in sSet)
                    foreach (MyTask tsk in tl.taskList)
                        if(tsk.endDt >= exactTime) result.Add(tsk);
            }
            else // eSet is smaller
            {
                foreach (var tl in eSet)
                    foreach (MyTask tsk in tl.taskList)
                        if (tsk.startDt <= exactTime) result.Add(tsk);

            }
            return result;
        }
    }

    public class MyTask
    {
        public string name;
        public DateTime startDt;
        public DateTime endDt;
        // ...
    }

    public class SameTimeTaskList
    {
        public DateTime time; // common start or end time of all tasks in list
        public List<MyTask> taskList = new List<MyTask>();
    }

    // Defines a comparer to create a sorted set 
    // that is sorted by time. 
    public class ByTime : IComparer<SameTimeTaskList>
    {
        public int Compare(SameTimeTaskList x, SameTimeTaskList y)
        {
            return x.time.CompareTo(y.time);
        }
    }
}

当您将此版本与您尝试过的所有其他版本进行比较时,我很想知道基准测试结果。