Query with large WHERE clause causes timeout exception in EF6 with npgsql
I have a query that looks like this:
private static IQueryable<MultiframeModule> WhereAllFramesProperties(this IQueryable<MultiframeModule> query, ICollection<Frame> frames)
{
    return frames.Aggregate(query, (q, frame) =>
    {
        return q.Where(p => p.Frames.Any(i => i.FrameData.ShaHash == frame.FrameData.ShaHash));
    });
}
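MultiframeModule and Frame have a many-to-many relationship.
With this query I want to find a MultiframeModule that contains all the frames in the frames collection I pass as a parameter; to do that I compare the ShaHash property.
If frames contains 2 frames, the generated SQL looks like this: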
SELECT
"Extent1"."MultiframeModuleId",
"Extent1"."FrameIncrementPointer",
"Extent1"."PageNumberVector"
FROM
"public"."MultiframeModule" AS "Extent1"
WHERE
EXISTS
(
SELECT
1 AS "C1"
FROM
"public"."Frame" AS "Extent2"
INNER JOIN
"public"."FrameData" AS "Extent3"
ON "Extent2"."FrameData_FrameDataId" = "Extent3"."FrameDataId"
WHERE
"Extent1"."MultiframeModuleId" = "Extent2"."MultiframeModule_MultiframeModuleId"
AND "Extent3"."ShaHash" = @p__linq__0
)
AND EXISTS
(
SELECT
1 AS "C1"
FROM
"public"."Frame" AS "Extent4"
INNER JOIN
"public"."FrameData" AS "Extent5"
ON "Extent4"."FrameData_FrameDataId" = "Extent5"."FrameDataId"
WHERE
"Extent1"."MultiframeModuleId" = "Extent4"."MultiframeModule_MultiframeModuleId"
AND "Extent5"."ShaHash" = @p__linq__1
)
LIMIT 2
-- p__linq__0: '0' (Type = Int32, IsNullable = false)
-- p__linq__1: '0' (Type = Int32, IsNullable = false)
However, if I have more frames, for example 200, the call throws an exception:
Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
With this stack trace:
at Npgsql.ReadBuffer.<Ensure>d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlConnector.<DoReadMessage>d__157.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlConnector.<ReadMessage>d__156.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlConnector.<ReadExpecting>d__163`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlDataReader.<NextResult>d__32.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlDataReader.NextResult()
at Npgsql.NpgsqlCommand.<Execute>d__71.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlCommand.<ExecuteDbDataReader>d__92.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.Reader(DbCommand command, DbCommandInterceptionContext interceptionContext)
at System.Data.Entity.Core.EntityClient.Internal.EntityCommandDefinition.ExecuteStoreCommands(EntityCommand entityCommand, CommandBehavior behavior)
So, is there some obvious reason why my query fails? And how can I improve it so that the query succeeds?
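You can disable the timeout by passing Command Timeout=0 in the connection string. The default is 30 seconds, so your query probably runs longer than that, and you will need to optimize it.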
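As a rough illustration of that setting, here is a minimal sketch of a connection string with the timeout disabled. The host, database, and credentials are placeholders, and the class name is hypothetical; only the Command Timeout keyword comes from the answer above.

```csharp
using System;

public static class TimeoutConfigDemo
{
    // Placeholder host, database and credentials (assumptions for illustration).
    // "Command Timeout=0" tells Npgsql to wait indefinitely for a command to finish.
    public const string ConnectionString =
        "Host=localhost;Database=mydb;Username=me;Password=secret;Command Timeout=0";

    public static void Main()
    {
        // In EF6 the timeout can also be set per context, in seconds,
        // through the DbContext.Database.CommandTimeout property.
        Console.WriteLine(ConnectionString);
    }
}
```

Disabling the timeout only hides the symptom; the query itself still takes just as long.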
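As far as I can tell, the problem is caused by too many subqueries in the generated SQL query.
In my test environment, SqlServer (LocalDB) simply refuses to execute the generated query, reporting that it is too complex. PostgreSQL was able to execute it in ~4 minutes (after setting CommandTimeout to 0).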
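To make the growth visible, the following sketch uses an in-memory IQueryable as a stand-in for the EF6 query (an assumption; no database is involved) and counts how many Where calls the Aggregate pattern stacks up. Each of those calls is what EF6 turns into one EXISTS subquery. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ChainedWhereDemo
{
    // Walks an expression tree and counts the calls to Where it contains.
    private sealed class WhereCounter : ExpressionVisitor
    {
        public int Count { get; private set; }

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            if (node.Method.Name == "Where") Count++;
            return base.VisitMethodCall(node);
        }
    }

    // Chains one Where per criterion, as the question's Aggregate does,
    // and returns the number of Where calls in the resulting expression.
    public static int CountChainedFilters(int criteriaCount)
    {
        IQueryable<int> query = new[] { 1, 2, 3 }.AsQueryable();
        var chained = Enumerable.Range(0, criteriaCount)
            .Aggregate(query, (q, c) => q.Where(x => x != c));

        var counter = new WhereCounter();
        counter.Visit(chained.Expression);
        return counter.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountChainedFilters(2));   // 2
        Console.WriteLine(CountChainedFilters(200)); // 200
    }
}
```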
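The solution is to find an equivalent construct that does not generate so many subqueries. What I usually use in such cases is the "count the distinct matches and compare that to the criteria count" approach.
It can be implemented in two ways.
(1) This works only for conditions of the form property == valueN. In this case the distinct matches can be counted like this (in pseudocode):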
obj.Collection
    .Select(elem => elem.Property)
    .Distinct()
    .Count(value => values.Contains(value))
Applying it to your example:
private static IQueryable<MultiframeModule> WhereAllFramesProperties(this IQueryable<MultiframeModule> query, ICollection<Frame> frames)
{
    var values = frames.Select(e => e.FrameData.ShaHash);
    var count = frames.Count();
    return query.Where(p => p.Frames.Select(e => e.FrameData.ShaHash)
        .Distinct().Count(v => values.Contains(v)) == count);
}
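The equivalence between the two formulations can be checked in memory, without EF6. The sketch below uses plain int hashes and hypothetical class and method names. One deliberate difference from the code above: it compares against the number of distinct required values, which is an assumption added here so that repeated hashes in the criteria do not make the counts disagree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctMatchDemo
{
    // Original formulation: one Any(...) per required value.
    public static bool ContainsAllByAny(IEnumerable<int> hashes, IEnumerable<int> required) =>
        required.All(r => hashes.Any(h => h == r));

    // Count-based formulation: the number of distinct matching hashes
    // must equal the number of distinct required values.
    public static bool ContainsAllByCount(IEnumerable<int> hashes, ICollection<int> required)
    {
        var requiredCount = required.Distinct().Count();
        return hashes.Distinct().Count(h => required.Contains(h)) == requiredCount;
    }

    public static void Main()
    {
        var required = new List<int> { 10, 20, 30 };
        var full = new[] { 30, 10, 10, 20, 99 };   // has every required hash
        var partial = new[] { 10, 10, 20 };        // is missing 30

        Console.WriteLine(ContainsAllByCount(full, required));    // True
        Console.WriteLine(ContainsAllByCount(partial, required)); // False
        Console.WriteLine(ContainsAllByAny(full, required));      // True
        Console.WriteLine(ContainsAllByAny(partial, required));   // False
    }
}
```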
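(2) This works for any type of condition. In this case each match is identified by its index, which requires dynamically building a selector expression like this: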
Condition0 ? 0 : Condition1 ? 1 : ... ConditionN-1 ? N - 1 : -1
and the distinct match count is
obj.Collection
.Select(selector)
.Distinct()
.Count(i => i >= 0)
Applying it to your example:
private static IQueryable<MultiframeModule> WhereAllFramesProperties(this IQueryable<MultiframeModule> query, ICollection<Frame> frames)
{
    // The parameter type must match the Func<Frame, int> selector built below.
    var parameter = Expression.Parameter(typeof(Frame), "e");
    var body = frames.Select((frame, index) =>
        {
            Expression<Func<Frame, bool>> predicate = e => e.FrameData.ShaHash == frame.FrameData.ShaHash;
            return new
            {
                Condition = predicate.Body.ReplaceParameter(predicate.Parameters[0], parameter),
                Value = Expression.Constant(index)
            };
        })
        .Reverse()
        .Aggregate((Expression)Expression.Constant(-1), (next, item) =>
            Expression.Condition(item.Condition, item.Value, next));
    var selector = Expression.Lambda<Func<Frame, int>>(body, parameter);
    var count = frames.Count();
    return query.Where(p => p.Frames.AsQueryable().Select(selector)
        .Distinct().Count(i => i >= 0) == count);
}
where ReplaceParameter is the following custom extension method:
public static partial class ExpressionUtils
{
    public static Expression ReplaceParameter(this Expression expression, ParameterExpression source, Expression target)
    {
        return new ParameterReplacer { Source = source, Target = target }.Visit(expression);
    }

    class ParameterReplacer : ExpressionVisitor
    {
        public ParameterExpression Source;
        public Expression Target;

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == Source ? Target : base.VisitParameter(node);
        }
    }
}
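The generated SQL contains one huge CASE WHEN expression (unfortunately duplicated in the WHERE clause), but only a single subquery, and it is accepted and executed successfully by both SqlServer and PostgreSQL. In the latter case it took less than 2 seconds under the same conditions as the original test: 1K records in both tables, 1M links, 200 criteria.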
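The index-selector idea from (2) can also be tried in memory. The sketch below is a simplified stand-in, not the EF6 code: it works on plain int hashes, uses hypothetical class and method names, and combines the conditions with Expression.Invoke to stay short. LINQ to Entities does not translate Invoke nodes, which is why the EF6 version substitutes the parameter with ReplaceParameter instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class IndexSelectorDemo
{
    // Builds: e => c0(e) ? 0 : c1(e) ? 1 : ... : -1
    public static Expression<Func<T, int>> BuildSelector<T>(
        IReadOnlyList<Expression<Func<T, bool>>> conditions)
    {
        var parameter = Expression.Parameter(typeof(T), "e");
        Expression body = Expression.Constant(-1);

        // Wrap from the last condition to the first so that index 0 is tested first.
        for (var index = conditions.Count - 1; index >= 0; index--)
        {
            var test = Expression.Invoke(conditions[index], parameter);
            body = Expression.Condition(test, Expression.Constant(index), body);
        }
        return Expression.Lambda<Func<T, int>>(body, parameter);
    }

    // Counts how many of the required hashes occur at least once in hashes.
    public static int CountDistinctMatches(IEnumerable<int> hashes, IEnumerable<int> required)
    {
        var conditions = required
            .Select(r => (Expression<Func<int, bool>>)(h => h == r))
            .ToList();
        var selector = BuildSelector<int>(conditions).Compile();
        return hashes.Select(selector).Distinct().Count(i => i >= 0);
    }

    public static void Main()
    {
        var required = new[] { 10, 20, 30 };
        Console.WriteLine(CountDistinctMatches(new[] { 30, 10, 10, 99 }, required)); // 2
        Console.WriteLine(CountDistinctMatches(new[] { 20, 30, 10 }, required));     // 3
    }
}
```

Because the selector returns the index of the first condition that holds, an element that satisfies several conditions is counted only for the first of them. That is harmless for mutually exclusive equality checks like the ones here, but worth keeping in mind for overlapping conditions.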
You have an any clause inside the where; maybe you could try to optimize that:
return frames.Aggregate(query, (q, frame) =>
{
return q.Frames.Any(i => i.FrameData.ShaHash == frame.FrameData.ShaHash);
});