EF Core query: group by date and subquery

I have the following table:

CREATE TABLE "OrderStatusLogs" (
    "Id" UNIQUEIDENTIFIER NOT NULL,
    "OrderId" UNIQUEIDENTIFIER NOT NULL,
    "Status" INT NOT NULL,
    "StartDateTime" DATETIMEOFFSET NOT NULL,
    "EndDateTime" DATETIMEOFFSET NULL DEFAULT NULL,
    PRIMARY KEY ("Id"),
    FOREIGN KEY INDEX "FK_OrderStatusLogs_Orders_OrderId" ("OrderId"),
    CONSTRAINT "FK_OrderStatusLogs_Orders_OrderId" FOREIGN KEY ("OrderId") REFERENCES "Orders" ("Id") ON UPDATE NO_ACTION ON DELETE CASCADE
)
;

With the following entity:

    [DebuggerDisplay(nameof(OrderStatusLog) + " {Status} {StartDateTime} - {EndDateTime}" )]
    public class OrderStatusLog
    {
        public Guid Id { get; set; }

        public Guid OrderId { get; set; }

        public OrderStatus Status { get; set; }

        public DateTimeOffset StartDateTime { get; set; }

        public DateTimeOffset? EndDateTime { get; set; }
    }

    public enum OrderStatus
    {
        Unknown = 0,
        Pending = 1,
        Processing = 2,
        Shipping = 3,
    }

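I am trying to generate a report that shows how many orders were set to a specific status within a given date range.

For example, for the month of October the range is 1 to 31 October. The desired output would look like this: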

1/10/2021 Pending 21 orders
1/10/2021 Processing 23 orders
1/10/2021 Shipping 33 orders
1/10/2021 Unknown 0 orders
...
31/10/2021 Pending 1 orders
31/10/2021 Processing 3 orders
31/10/2021 Shipping 44 orders
31/10/2021 Unknown 5 orders

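I am having some difficulty writing an EF query that gives the correct output. I can get it working, but only client-side. I am trying to make this work in the database.

So far I have tried: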

            var logsByDayAndOrderId = orderStatusLogs.GroupBy(c => new { c.StartDateTime.Date, c.OrderId }, (key, values) => new
            {
                key.Date,
                key.OrderId,
                MaxStartDateTime = values.Max(x => x.StartDateTime)
            });

            var list = logsByDayAndOrderId.ToList();

            var statusByDayAndOrderId = logsByDayAndOrderId.Select(c => new
            {
                c.Date,
                c.OrderId,
                orderStatusLogs.FirstOrDefault(x => x.StartDateTime == c.MaxStartDateTime && x.OrderId == c.OrderId).Status
            });

            //var statusByDayAndOrderId = logsByDayAndOrderId.Join(orderStatusLogs.def, inner => new { inner.OrderId, StartDateTime = inner.MaxStartDateTime }, outer => new { outer.OrderId, outer.StartDateTime }, (inner,outer) => new
            //{
            //    inner.Date,
            //    inner.OrderId,
            //    outer.Status
            //}); // TODO rem this query gives more results because of the join. we need an Outer join - but i could not get that to work. the version with select above works better, but then it does not use join so it may be slow(er).

            var list1 = statusByDayAndOrderId.ToList();


            var groupBy = statusByDayAndOrderId
                .GroupBy(c => new { c.Date, c.Status })
                .Select(c => new {  c.Key.Date, c.Key.Status, Count = c.Count() });

            var list2 = groupBy.ToList();

Another attempt:

            var datesAndOrders = orderStatusLogs
                .GroupBy(c => new { c.StartDateTime.Date, c.OrderId }, (key, values) => key);

            var ordersByDateAndActiveStatusLog = orderStatusLogs
                .Select(c => new
                {
                    c.StartDateTime.Date,
                    c.OrderId,
                    ActiveStatusForDate = orderStatusLogs
                    .OrderByDescending(x => x.StartDateTime)
                    .FirstOrDefault(x => x.OrderId == c.OrderId && x.StartDateTime.Date == c.StartDateTime.Date)
                    .Status
                });

            var list = ordersByDateAndActiveStatusLog.ToList();

            var orderCountByDateAndStatus = ordersByDateAndActiveStatusLog
                .GroupBy(c => new { c.Date, c.ActiveStatusForDate }, (key, values) => new
                {
                    key, count = values.Count()
                });

            var list1 = orderCountByDateAndStatus.ToList();

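Both of these fail with "Cannot use an aggregate or a subquery in an expression used for the group by list of a GROUP BY clause.", which makes sense.

I am hoping someone can help write a LINQ query that generates the correct data using EF Core.

Answer:

I would suggest using the EF Core extension linq2db.EntityFrameworkCore, which can work with local (in-memory) collections in database queries. Disclaimer: I am one of the creators.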

First, define a function that generates the sequence of days:

public static IEnumerable<DateTime> GenerateDays(int year, int month)
{
  var start = new DateTime(year, month, 1);
  var endDate = start.AddMonths(1);
  while (start < endDate)
  {
    yield return start;
    start = start.AddDays(1);
  }
}
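
As a quick sanity check, the generator can be run on its own. The sketch below repeats the function inside a hypothetical DayGenerator class (the class name and the Main harness are assumptions for illustration, not part of the original answer) and prints how many days it yields for October 2021, along with the first and last one.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class DayGenerator
{
    // Same logic as GenerateDays above: one DateTime per calendar day of the month.
    public static IEnumerable<DateTime> GenerateDays(int year, int month)
    {
        var start = new DateTime(year, month, 1);
        var endDate = start.AddMonths(1); // exclusive upper bound: first day of the next month
        while (start < endDate)
        {
            yield return start;
            start = start.AddDays(1);
        }
    }

    public static void Main()
    {
        var days = GenerateDays(2021, 10).ToArray();
        var inv = CultureInfo.InvariantCulture;

        Console.WriteLine(days.Length);                                   // 31
        Console.WriteLine(days[0].ToString("yyyy-MM-dd", inv));           // 2021-10-01
        Console.WriteLine(days[days.Length - 1].ToString("yyyy-MM-dd", inv)); // 2021-10-31
    }
}
```

Because the upper bound is computed with AddMonths(1), months of any length (including February in leap years) are handled without special cases.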

Then we can use the generated sequence in a LINQ query:

var days = GenerateDays(2021, 10).ToArray();

using var dc = ctx.CreateLinqToDbConnection();

var totalsQuery =
  from d in days.AsQueryable(dc)
  from l in orderStatusLogs.Where(l =>
      (l.EndDateTime == null || l.EndDateTime >= d) && l.StartDateTime < d.AddDays(1))
    .DefaultIfEmpty()
  group l by new { Date = d, l.Status } into g
  select new
  {
     g.Key.Date,
     g.Key.Status,
     Count = g.Sum(x => x == null ? 0 : 1),
  };

var result = totalsQuery.ToList();

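This should generate SQL along the following lines (the dates in this sample are for May 2021 rather than October):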

SELECT
    [d].[item],
    [e].[Status],
    Sum(IIF([e].[OrderID] IS NULL, 0, 1))
FROM
(VALUES
    ('2021-05-01T00:00:00'), ('2021-05-02T00:00:00'),
    ('2021-05-03T00:00:00'), ('2021-05-04T00:00:00'),
    ('2021-05-05T00:00:00'), ('2021-05-06T00:00:00'),
    ('2021-05-07T00:00:00'), ('2021-05-08T00:00:00'),
    ('2021-05-09T00:00:00'), ('2021-05-10T00:00:00'),
    ('2021-05-11T00:00:00'), ('2021-05-12T00:00:00'),
    ('2021-05-13T00:00:00'), ('2021-05-14T00:00:00'),
    ('2021-05-15T00:00:00'), ('2021-05-16T00:00:00'),
    ('2021-05-17T00:00:00'), ('2021-05-18T00:00:00'),
    ('2021-05-19T00:00:00'), ('2021-05-20T00:00:00'),
    ('2021-05-21T00:00:00'), ('2021-05-22T00:00:00'),
    ('2021-05-23T00:00:00'), ('2021-05-24T00:00:00'),
    ('2021-05-25T00:00:00'), ('2021-05-26T00:00:00'),
    ('2021-05-27T00:00:00'), ('2021-05-28T00:00:00'),
    ('2021-05-29T00:00:00'), ('2021-05-30T00:00:00'),
    ('2021-05-31T00:00:00')
) [d]([item])
    LEFT JOIN [OrderStatusLogs] [e] ON ([e].[EndDateTime] IS NULL OR [e].[EndDateTime] >= [d].[item]) AND [e].[StartDateTime] < DateAdd(day, 1, [d].[item])
GROUP BY
    [d].[item],
    [e].[Status]
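
The join condition treats a log entry as active on a given day when it started before the end of that day and had not ended before the day began. A minimal in-memory sketch of the same predicate, using plain LINQ to Objects, may make that easier to see. The OverlapSketch class, the Log type, and all sample values are hypothetical and exist only for illustration; nothing here touches the database or linq2db.

```csharp
using System;
using System.Linq;

public static class OverlapSketch
{
    // Hypothetical stand-in for OrderStatusLog; only the fields the predicate needs.
    public sealed class Log
    {
        public int Status { get; set; }
        public DateTimeOffset StartDateTime { get; set; }
        public DateTimeOffset? EndDateTime { get; set; }
    }

    // True when the log started before the end of the day
    // and had not ended before the day began (same shape as the join condition).
    public static bool IsActiveOn(Log log, DateTimeOffset day) =>
        (log.EndDateTime == null || log.EndDateTime >= day)
        && log.StartDateTime < day.AddDays(1);

    public static void Main()
    {
        var utc = TimeSpan.Zero;
        var logs = new[]
        {
            // Made-up sample rows: one status ends on 2 October, the next starts at the same moment.
            new Log { Status = 1,
                      StartDateTime = new DateTimeOffset(2021, 10, 1, 9, 0, 0, utc),
                      EndDateTime = new DateTimeOffset(2021, 10, 2, 8, 0, 0, utc) },
            new Log { Status = 2,
                      StartDateTime = new DateTimeOffset(2021, 10, 2, 8, 0, 0, utc),
                      EndDateTime = null },
        };

        var day = new DateTimeOffset(2021, 10, 2, 0, 0, 0, utc);
        var counts = logs.Where(l => IsActiveOn(l, day))
                         .GroupBy(l => l.Status)
                         .OrderBy(g => g.Key)
                         .Select(g => $"{g.Key}:{g.Count()}");

        Console.WriteLine(string.Join(", ", counts)); // 1:1, 2:1
    }
}
```

Note that with this predicate a status that ends on a day and its successor that starts on the same day are both counted for that day. If the report should count only the last status an order had on each day, the condition would need to be tightened accordingly.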