Slow dictionary with custom class key
I have a custom class that I'm trying to use as a dictionary key:
// I tried setting more than enough capacity also...
var dict = new Dictionary<MyPoint, MyPoint>(capacity);
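To be clear, the goal here is to compare two similar but different lists, using X, Y and Date as a composite key. The values in the two lists will differ, and I'm trying to compare them quickly and compute the differences.
Here is the class: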
public class MyPoint : IEquatable<MyPoint>
{
    public short X { get; set; }
    public short Y { get; set; }
    public DateTime Date { get; set; }
    public double MyValue { get; set; }

    public override bool Equals(object obj)
    {
        return base.Equals(obj as MyPoint);
    }

    public bool Equals(MyPoint other)
    {
        if (other == null)
        {
            return false;
        }
        return (Date == other.Date)
            && (X == other.X)
            && (Y == other.Y);
    }

    public override int GetHashCode()
    {
        return Date.GetHashCode()
            | X.GetHashCode()
            | Y.GetHashCode();
    }
}
I also tried keying on a struct:
public struct MyPointKey
{
    public short X;
    public short Y;
    public DateTime Date;
    // The value is not on these, because the struct is only used as key
}
In both cases, writing to the dictionary is very, very slow (reading is fast).
I changed the key to a string, formatted like this:
var dict = new Dictionary<string, MyPoint>(capacity);
var key = string.Format("{0}_{1}", item.X, item.Y);
I was surprised at how fast it was: at least 10 times faster. This is in Release mode, without a debugger, and I've tried every scenario I can think of.
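Note that the string key above is built from X and Y only, so it is not a like-for-like replacement for the (X, Y, Date) key: two items that differ only by Date map to the same string. A minimal sketch of a string key that keeps all three components, assuming Date.Ticks is an acceptable textual form of the date (the helper name MakeKey is hypothetical), written as C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a composite string key from all three key components.
// Date.Ticks is used so the key does not depend on the current culture's date format.
static string MakeKey(short x, short y, DateTime date) => $"{x}_{y}_{date.Ticks}";

var day1 = new DateTime(2015, 6, 3);
var day2 = new DateTime(2015, 6, 4);

var dict = new Dictionary<string, double>();
dict[MakeKey(0, 0, day1)] = 1.5;
dict[MakeKey(0, 0, day2)] = 2.5; // differs only by Date, so it gets its own entry

Console.WriteLine(dict.Count); // prints 2
```

With the X/Y-only format, the second assignment would have overwritten the first entry instead, which also means fewer entries are actually being written in the string-key timing.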
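The dictionary will hold 350,000 or more items, so performance matters.
Any ideas or suggestions? Thanks!
Another edit...
I'm trying to compare two lists as fast as possible. This is what I'm using; the dictionary is what provides fast lookups for the items in the source list.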
IList<MyThing> sourceList;
IDictionary<MyThing, MyThing> comparisonDict;

Parallel.ForEach(sourceList,
    sourceItem =>
    {
        double compareValue = 0;
        MyThing compareMatch = null;
        if (comparisonDict.TryGetValue(sourceItem, out compareMatch))
        {
            compareValue = compareMatch.MyValue;
        }

        // Do a delta check on the item
        double difference = sourceItem.MyValue - compareValue;
        if (Math.Abs(difference) > 1)
        {
            // Record the difference...
        }
    });
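The "Record the difference..." step is the one part of this loop that needs care: Parallel.ForEach runs the body on several threads, so the results have to go into a thread-safe collection. The dictionary itself is only read, which is safe as long as nothing modifies it during the loop. A minimal sketch under those assumptions, using a value tuple (X, Y, Date) as the key in place of MyThing and a ConcurrentBag for the results; the three sample items are invented for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var day = new DateTime(2015, 6, 3);

// Illustrative source data: each entry is a composite key plus the value to compare.
var sourceList = new List<((short X, short Y, DateTime Date) Key, double MyValue)>
{
    ((0, 0, day), 10.0),
    ((0, 1, day), 5.0),
    ((1, 0, day), 7.0),
};

// The comparison side, keyed by the same composite key.
// Value tuples compare and hash all of their components.
var comparisonDict = new Dictionary<(short X, short Y, DateTime Date), double>
{
    [(0, 0, day)] = 10.5, // within the threshold
    [(0, 1, day)] = 2.0,  // differs by 3
    // (1, 0, day) is missing, so it is compared against 0 as in the question
};

// ConcurrentBag can be added to from several threads at once.
var differences = new ConcurrentBag<((short X, short Y, DateTime Date) Key, double Difference)>();

Parallel.ForEach(sourceList, sourceItem =>
{
    // Concurrent reads are fine because nothing writes to the dictionary during the loop.
    comparisonDict.TryGetValue(sourceItem.Key, out double compareValue); // 0 when missing

    double difference = sourceItem.MyValue - compareValue;
    if (Math.Abs(difference) > 1)
    {
        differences.Add((sourceItem.Key, difference));
    }
});

// The bag has no defined order, so sort by key before printing.
foreach (var d in differences.OrderBy(item => item.Key))
{
    Console.WriteLine($"{d.Key.X}, {d.Key.Y}, {d.Key.Date:yyyy-MM-dd}: {d.Difference}");
}
```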
If I understand you correctly, you want to use a set while still keeping the keys in order. In that case, use SortedSet`1 (SortedSet<T>) instead.
Code:
using System;
using System.Collections.Generic;

class Program {
    static void Main(string[] args) {
        SortedSet<MyKey> list = new SortedSet<MyKey>() {
            new MyKey(0, 0, new DateTime(2015, 6, 4)),
            new MyKey(0, 1, new DateTime(2015, 6, 3)),
            new MyKey(1, 1, new DateTime(2015, 6, 3)),
            new MyKey(0, 0, new DateTime(2015, 6, 3)),
            new MyKey(1, 0, new DateTime(2015, 6, 3)),
        };
        foreach(var entry in list) {
            Console.WriteLine(string.Join(", ", entry.X, entry.Y, entry.Date));
        }
        Console.ReadKey();
    }
}
I changed your MyPoint class as follows:
public sealed class MyKey : IEquatable<MyKey>, IComparable<MyKey> {
    public readonly short X;
    public readonly short Y;
    public readonly DateTime Date;

    public MyKey(short x, short y, DateTime date) {
        this.X = x;
        this.Y = y;
        this.Date = date;
    }

    public override bool Equals(object that) {
        return this.Equals(that as MyKey);
    }

    public bool Equals(MyKey that) {
        // ReferenceEquals avoids the overloaded == below, which calls back into Equals.
        if(ReferenceEquals(that, null)) {
            return false;
        }
        return this.Date == that.Date
            && this.X == that.X
            && this.Y == that.Y;
    }

    public static bool operator ==(MyKey lhs, MyKey rhs) {
        // Writing "lhs != null" here would call operator != and recurse forever.
        return ReferenceEquals(lhs, null) ? ReferenceEquals(rhs, null) : lhs.Equals(rhs);
    }

    public static bool operator !=(MyKey lhs, MyKey rhs) {
        return !(lhs == rhs);
    }

    public override int GetHashCode() {
        int result;
        unchecked {
            result = (int)X;
            result = 31 * result + (int)Y;
            result = 31 * result + Date.GetHashCode();
        }
        return result;
    }

    public int CompareTo(MyKey that) {
        // By convention, any instance sorts after null.
        if(ReferenceEquals(that, null)) {
            return 1;
        }
        int result = this.X.CompareTo(that.X);
        if(result != 0) {
            return result;
        }
        result = this.Y.CompareTo(that.Y);
        if(result != 0) {
            return result;
        }
        return this.Date.CompareTo(that.Date);
    }
}
Output:
0, 0, 03.06.2015 00:00:00
0, 0, 04.06.2015 00:00:00
0, 1, 03.06.2015 00:00:00
1, 0, 03.06.2015 00:00:00
1, 1, 03.06.2015 00:00:00
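If the only goal is the X, then Y, then Date ordering, a value tuple already compares its components in that order, so a SortedSet can be built without a hand-written CompareTo. A minimal sketch with the same five keys, assuming C# 9 top-level statements (the tuple element names X, Y and Date are just labels chosen here):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ValueTuple compares Item1, then Item2, then Item3,
// which matches MyKey.CompareTo (X, then Y, then Date).
var set = new SortedSet<(short X, short Y, DateTime Date)>
{
    (0, 0, new DateTime(2015, 6, 4)),
    (0, 1, new DateTime(2015, 6, 3)),
    (1, 1, new DateTime(2015, 6, 3)),
    (0, 0, new DateTime(2015, 6, 3)),
    (1, 0, new DateTime(2015, 6, 3)),
};

foreach (var entry in set)
{
    Console.WriteLine(string.Join(", ", entry.X, entry.Y, entry.Date));
}
```

This prints the keys in the same order as the output above; the date text itself depends on the current culture.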
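As others have said in the comments, the problem is your GetHashCode() implementation. With your code, 10,000,000 iterations using string keys took 11-12 seconds. With your existing hash code, I stopped it after a minute. With the following hash code implementation, it took under 5 seconds.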
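public override int GetHashCode()
{
    // unchecked: let the multiplication wrap around instead of throwing in a checked build
    unchecked
    {
        var hashCode = Date.GetHashCode();
        hashCode = (hashCode * 37) ^ X.GetHashCode();
        hashCode = (hashCode * 37) ^ Y.GetHashCode();
        return hashCode;
    }
}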
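The problem is that once you get to a large number of items, the ORs make them collide in the same buckets: OR can only turn bits on, never off, so many different keys end up with the same few hash codes. A dictionary in which everything lands in the same bucket is just a list.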
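The effect is easy to measure by counting how many distinct hash codes each combination produces over a grid of keys. A rough sketch, assuming 100 × 100 (X, Y) positions over 35 consecutive days (350,000 keys, the size mentioned in the question), C# 9 top-level statements, and .NET Core 2.1 or later for System.HashCode, which is included as a third option for comparison. The exact counts depend on the runtime's DateTime and Int16 hash functions, so none are hard-coded here:

```csharp
using System;
using System.Collections.Generic;

// The original combination: OR only ever sets bits, so many keys share a value.
static int OrHash(short x, short y, DateTime date) =>
    date.GetHashCode() | x.GetHashCode() | y.GetHashCode();

// The multiply-and-XOR combination from the answer above.
static int MultiplyXorHash(short x, short y, DateTime date)
{
    unchecked
    {
        int hashCode = date.GetHashCode();
        hashCode = (hashCode * 37) ^ x.GetHashCode();
        hashCode = (hashCode * 37) ^ y.GetHashCode();
        return hashCode;
    }
}

// System.HashCode mixes all components; its seed is randomized per process.
static int CombinedHash(short x, short y, DateTime date) => HashCode.Combine(date, x, y);

var orHashes = new HashSet<int>();
var multiplyXorHashes = new HashSet<int>();
var combinedHashes = new HashSet<int>();
var start = new DateTime(2015, 1, 1);

for (int day = 0; day < 35; day++)
{
    DateTime date = start.AddDays(day);
    for (short x = 0; x < 100; x++)
    {
        for (short y = 0; y < 100; y++)
        {
            orHashes.Add(OrHash(x, y, date));
            multiplyXorHashes.Add(MultiplyXorHash(x, y, date));
            combinedHashes.Add(CombinedHash(x, y, date));
        }
    }
}

Console.WriteLine($"Keys: {35 * 100 * 100}");
Console.WriteLine($"Distinct OR hashes: {orHashes.Count}");
Console.WriteLine($"Distinct multiply/XOR hashes: {multiplyXorHashes.Count}");
Console.WriteLine($"Distinct HashCode.Combine hashes: {combinedHashes.Count}");
```

For this grid the OR count can never exceed 35 × 128: X and Y stay below 128, so the OR of their hash codes is fully determined by x | y, which leaves at most 128 possibilities per date.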