Entity Framework 6 - Group by then Order by the First() 花费的时间太长
Entity Framework 6 - Group by then Order by the First() takes too long
我真的需要这方面的帮助,但经过数小时的搜索后仍找不到相关答案。
MySQL,Entity Framework6、数据库有几百万条记录,记录看起来像:
Indexint(11) NOT NULL
TaskIDint(11) NOT NULL
DeviceIDbigint(20) NOT NULL
Commentslongtext NULL
ExtendedResultslongtext NULL
RunResultint(11) NOT NULL
JobResultint(11) NOT NULL
JobResultValuedouble NOT NULL
ReporterIDbigint(20) NOT NULL
FieldIDbigint(20) NOT NULL
TimeOfRundatetime NOT NULL
我需要的是获取特定taskID的所有记录,然后按DeviceID分组并按TimeOfRun排序,以便获取特定taskID中每个deviceID的最新数据。
这是我的代码:
List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).
Select(x => x.OrderByDescending(y => y.TimeOfRun).FirstOrDefault()).ToList();
但这是生成的查询:
{SELECT
`Apply1`.`Index`,
`Apply1`.`TaskID`,
`Apply1`.`DEVICEID1` AS `DeviceID`,
`Apply1`.`RunResult`,
`Apply1`.`JobResult`,
`Apply1`.`JobResultValue`,
`Apply1`.`ExtendedResults`,
`Apply1`.`Comments`,
`Apply1`.`ReporterID`,
`Apply1`.`FieldID`,
`Apply1`.`TimeOfRun`
FROM (SELECT
`Project2`.`p__linq__0`,
`Project2`.`DeviceID`,
(SELECT
`Project3`.`Index`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Index`,
(SELECT
`Project3`.`TaskID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TaskID`,
(SELECT
`Project3`.`DeviceID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `DEVICEID1`,
(SELECT
`Project3`.`RunResult`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `RunResult`,
(SELECT
`Project3`.`JobResult`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResult`,
(SELECT
`Project3`.`JobResultValue`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResultValue`,
(SELECT
`Project3`.`ExtendedResults`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ExtendedResults`,
(SELECT
`Project3`.`Comments`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Comments`,
(SELECT
`Project3`.`ReporterID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ReporterID`,
(SELECT
`Project3`.`FieldID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `FieldID`,
(SELECT
`Project3`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TimeOfRun`
FROM (SELECT
@p__linq__0 AS `p__linq__0`,
`Distinct1`.`DeviceID`
FROM (SELECT DISTINCT
`Extent1`.`DeviceID`
FROM `JobsRecordHistories` AS `Extent1`
WHERE `Extent1`.`TaskID` = @p__linq__0) AS `Distinct1`) AS `Project2`) AS `Apply1`}
这花费的时间太长了。
我不知道 SQL 是否足够好,我承认,但是如果在 WHERE 语句之后插入一个 ToList(),那么我会更快地得到结果,尽管这仍然不是正确的做法,因为有很多在这种情况下数据库传递给我的应用程序的不需要的数据,它仍然很慢 = 40k 记录需要 30 秒。
我也试过这个:
Dictionary<long, DateTime> DeviceIDAndTime = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID)
.Select(g => new DeviceIDaAndTime { deviceID = g.Key, timeOfRun = g.Max(gi => gi.TimeOfRun) }).ToDictionary(x => x.deviceID, x => x.timeOfRun);
为了这样使用字典:
List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => DeviceIDAndTime.Keys.Contains(x.DeviceID) && x.TimeOfRun == DeviceIDAndTime[x.DeviceID]).ToList();
但是我得到这个错误:
Additional information: LINQ to Entities does not recognize the method 'System.DateTime get_Item(Int64)' method, and this method cannot be translated into a store expression.
根据我的理解,这是有道理的,当将 timeOfRun 与字典值进行比较时,LINQ 在编写查询时需要一个特定的值而不是一个集合。
奇怪的是我没有找到任何相关的 post 而且其他人也没有遇到这个问题。我想我错过了什么。
感谢任何帮助,谢谢
给出查询语法而不是基于镜头的方法。
我没有在本地对此进行测试,但您可能会看到改进的 sql 代。
或者至少这种方法可能会引导您走上正确的道路
using System;
using System.Data.Entity;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace EF.CodeFirst
{
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestMethod1()
{
using (var db = new TestDbContext())
{
var taskId = 1;
var query = from job in db.JobRecordHistories
where job.TaskId == taskId
orderby job.TimeOfRun descending
group job by job.DeviceId
into deviceGroup
select deviceGroup;
foreach (var deviceGroup in query)
{
foreach (var jobRecordHistory in deviceGroup)
{
Console.WriteLine("DeviceId '{0}', TaskId'{1}' Runtime'{2}'", jobRecordHistory.DeviceId,
jobRecordHistory.TaskId, jobRecordHistory);
}
}
}
}
}
public class TestDbContext : DbContext
{
public DbSet<JobRecordHistory> JobRecordHistories { get; set; }
}
public class JobRecordHistory
{
public int Id { get; set; }
public int TaskId { get; set; }
public int DeviceId { get; set; }
public DateTime TimeOfRun { get; set; }
}
}
终于想通了,提高了性能。
我需要一个查询和一个子查询,我需要 MAX 函数而不是 ORDER 因为我不关心结果的顺序,我只关心最大的 (timeOfRun)。
此外,一旦我注意到更大的索引列(我的 PK,自动递增)意味着更新的数据,事情就变得简单了,所以我不需要 MAX(timeOfRun),而是使用 MAX(Index),尽管我很确定会以同样的方式工作。
这是我的 LINQ:
var historyQuery = db.JobsRecordHistories.AsNoTracking().Where(y => y.TaskID == taskID &&
db.JobsRecordHistories.Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).Select(g => g.Max(i => i.Index)).Contains<int>(y.Index));
这是生成的 SQL:
{SELECT
`Extent1`.`Index`,
`Extent1`.`TaskID`,
`Extent1`.`DeviceID`,
`Extent1`.`RunResult`,
`Extent1`.`JobResult`,
`Extent1`.`JobResultValue`,
`Extent1`.`ExtendedResults`,
`Extent1`.`Comments`,
`Extent1`.`ReporterID`,
`Extent1`.`FieldID`,
`Extent1`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Extent1`
WHERE (`Extent1`.`TaskID` = @p__linq__0) AND (EXISTS(SELECT
1 AS `C1`
FROM (SELECT
`Extent2`.`DeviceID` AS `K1`,
MAX(`Extent2`.`Index`) AS `A1`
FROM `JobsRecordHistories` AS `Extent2`
WHERE `Extent2`.`TaskID` = @p__linq__1
GROUP BY
`Extent2`.`DeviceID`) AS `GroupBy1`
WHERE `GroupBy1`.`A1` = `Extent1`.`Index`))}
我希望这会对某些人有所帮助,因为我花了 1.5 天的时间进行谷歌搜索,查看 SQL 查询、LINQ、调试和优化
我真的需要这方面的帮助,但经过数小时的搜索后仍找不到相关答案。
MySQL,Entity Framework6、数据库有几百万条记录,记录看起来像:
Indexint(11) NOT NULL
TaskIDint(11) NOT NULL
DeviceIDbigint(20) NOT NULL
Commentslongtext NULL
ExtendedResultslongtext NULL
RunResultint(11) NOT NULL
JobResultint(11) NOT NULL
JobResultValuedouble NOT NULL
ReporterIDbigint(20) NOT NULL
FieldIDbigint(20) NOT NULL
TimeOfRundatetime NOT NULL
我需要的是获取特定taskID的所有记录,然后按DeviceID分组并按TimeOfRun排序,以便获取特定taskID中每个deviceID的最新数据。
这是我的代码:
List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).
Select(x => x.OrderByDescending(y => y.TimeOfRun).FirstOrDefault()).ToList();
但这是生成的查询:
{SELECT
`Apply1`.`Index`,
`Apply1`.`TaskID`,
`Apply1`.`DEVICEID1` AS `DeviceID`,
`Apply1`.`RunResult`,
`Apply1`.`JobResult`,
`Apply1`.`JobResultValue`,
`Apply1`.`ExtendedResults`,
`Apply1`.`Comments`,
`Apply1`.`ReporterID`,
`Apply1`.`FieldID`,
`Apply1`.`TimeOfRun`
FROM (SELECT
`Project2`.`p__linq__0`,
`Project2`.`DeviceID`,
(SELECT
`Project3`.`Index`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Index`,
(SELECT
`Project3`.`TaskID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TaskID`,
(SELECT
`Project3`.`DeviceID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `DEVICEID1`,
(SELECT
`Project3`.`RunResult`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `RunResult`,
(SELECT
`Project3`.`JobResult`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResult`,
(SELECT
`Project3`.`JobResultValue`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResultValue`,
(SELECT
`Project3`.`ExtendedResults`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ExtendedResults`,
(SELECT
`Project3`.`Comments`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Comments`,
(SELECT
`Project3`.`ReporterID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ReporterID`,
(SELECT
`Project3`.`FieldID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `FieldID`,
(SELECT
`Project3`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TimeOfRun`
FROM (SELECT
@p__linq__0 AS `p__linq__0`,
`Distinct1`.`DeviceID`
FROM (SELECT DISTINCT
`Extent1`.`DeviceID`
FROM `JobsRecordHistories` AS `Extent1`
WHERE `Extent1`.`TaskID` = @p__linq__0) AS `Distinct1`) AS `Project2`) AS `Apply1`}
这花费的时间太长了。
我不知道 SQL 是否足够好,我承认,但是如果在 WHERE 语句之后插入一个 ToList(),那么我会更快地得到结果,尽管这仍然不是正确的做法,因为有很多在这种情况下数据库传递给我的应用程序的不需要的数据,它仍然很慢 = 40k 记录需要 30 秒。
我也试过这个:
Dictionary<long, DateTime> DeviceIDAndTime = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID)
.Select(g => new DeviceIDaAndTime { deviceID = g.Key, timeOfRun = g.Max(gi => gi.TimeOfRun) }).ToDictionary(x => x.deviceID, x => x.timeOfRun);
为了这样使用字典:
List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => DeviceIDAndTime.Keys.Contains(x.DeviceID) && x.TimeOfRun == DeviceIDAndTime[x.DeviceID]).ToList();
但是我得到这个错误:
Additional information: LINQ to Entities does not recognize the method 'System.DateTime get_Item(Int64)' method, and this method cannot be translated into a store expression.
根据我的理解,这是有道理的,当将 timeOfRun 与字典值进行比较时,LINQ 在编写查询时需要一个特定的值而不是一个集合。
奇怪的是我没有找到任何相关的 post 而且其他人也没有遇到这个问题。我想我错过了什么。
感谢任何帮助,谢谢
给出查询语法而不是基于镜头的方法。
我没有在本地对此进行测试,但您可能会看到改进的 sql 代。
或者至少这种方法可能会引导您走上正确的道路
using System;
using System.Data.Entity;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace EF.CodeFirst
{
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestMethod1()
{
using (var db = new TestDbContext())
{
var taskId = 1;
var query = from job in db.JobRecordHistories
where job.TaskId == taskId
orderby job.TimeOfRun descending
group job by job.DeviceId
into deviceGroup
select deviceGroup;
foreach (var deviceGroup in query)
{
foreach (var jobRecordHistory in deviceGroup)
{
Console.WriteLine("DeviceId '{0}', TaskId'{1}' Runtime'{2}'", jobRecordHistory.DeviceId,
jobRecordHistory.TaskId, jobRecordHistory);
}
}
}
}
}
public class TestDbContext : DbContext
{
public DbSet<JobRecordHistory> JobRecordHistories { get; set; }
}
public class JobRecordHistory
{
public int Id { get; set; }
public int TaskId { get; set; }
public int DeviceId { get; set; }
public DateTime TimeOfRun { get; set; }
}
}
终于想通了,提高了性能。
我需要一个查询和一个子查询,我需要 MAX 函数而不是 ORDER 因为我不关心结果的顺序,我只关心最大的 (timeOfRun)。
此外,一旦我注意到更大的索引列(我的 PK,自动递增)意味着更新的数据,事情就变得简单了,所以我不需要 MAX(timeOfRun),而是使用 MAX(Index),尽管我很确定会以同样的方式工作。
这是我的 LINQ:
var historyQuery = db.JobsRecordHistories.AsNoTracking().Where(y => y.TaskID == taskID &&
db.JobsRecordHistories.Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).Select(g => g.Max(i => i.Index)).Contains<int>(y.Index));
这是生成的 SQL:
{SELECT
`Extent1`.`Index`,
`Extent1`.`TaskID`,
`Extent1`.`DeviceID`,
`Extent1`.`RunResult`,
`Extent1`.`JobResult`,
`Extent1`.`JobResultValue`,
`Extent1`.`ExtendedResults`,
`Extent1`.`Comments`,
`Extent1`.`ReporterID`,
`Extent1`.`FieldID`,
`Extent1`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Extent1`
WHERE (`Extent1`.`TaskID` = @p__linq__0) AND (EXISTS(SELECT
1 AS `C1`
FROM (SELECT
`Extent2`.`DeviceID` AS `K1`,
MAX(`Extent2`.`Index`) AS `A1`
FROM `JobsRecordHistories` AS `Extent2`
WHERE `Extent2`.`TaskID` = @p__linq__1
GROUP BY
`Extent2`.`DeviceID`) AS `GroupBy1`
WHERE `GroupBy1`.`A1` = `Extent1`.`Index`))}
我希望这会对某些人有所帮助,因为我花了 1.5 天的时间进行谷歌搜索,查看 SQL 查询、LINQ、调试和优化