Select 具有每个类别 Power BI 最大值的行

Select row with MAX value per category Power BI

如何在 Power BI 的 M 中 select 具有每个类别最大值的行。假设我们有 table:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     1 | 2018-07-01 |
| apples   |     2 | 2018-07-02 |
| apples   |     3 | 2018-07-03 |
| bananas  |     7 | 2018-07-04 |
| bananas  |     8 | 2018-07-05 |
| bananas  |     9 | 2018-07-06 |
+----------+-------+------------+

期望的结果是:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     3 | 2018-07-03 |
| bananas  |     9 | 2018-07-06 |
+----------+-------+------------+

这是 PBI 的开始 table:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Value", Int64.Type}, {"Date", type date}})
in
    #"Changed Type"

我想知道是否有一种方法可以通过添加一些魔法列 IsMax:

仅在一个 table 内在后续步骤中达到预期的结果
+----------+-------+------------+-------+
| Category | Value |    Date    | IsMax |
+----------+-------+------------+-------+
| apples   |     1 | 2018-07-01 |     0 |
| apples   |     2 | 2018-07-02 |     0 |
| apples   |     3 | 2018-07-03 |     1 |
| bananas  |     7 | 2018-07-04 |     0 |
| bananas  |     8 | 2018-07-05 |     0 |
| bananas  |     9 | 2018-07-06 |     1 |
+----------+-------+------------+-------+

在 Power Query 编辑器中进行基本分组(按 Category 分组并取最大值 Value)得到这个 table:

+----------+-------+
| Category | Value |
+----------+-------+
| apples   |     3 |
| bananas  |     9 |
+----------+-------+

向此 table 添加一个自定义列 IsMax,它只是值 1,然后将其与原始 table 匹配合并(左外连接) CategoryValue。最后,展开 IsMax 列以获得所需的 table,但使用 null 而不是 0 除外。如果您愿意,可以替换 null 值。

这是所有这些步骤的 M 代码:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "IsMax", each 1, Int64.Type),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Category", "Value"},#"Added Custom",{"Category", "Value"},"Added Custom",JoinKind.LeftOuter),
    #"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"IsMax"}, {"IsMax"})
in
    #"Expanded Added Custom"

编辑: 一个稍微简化的版本来重现“期望的结果”而不是 IsMax 版本:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Category", "Value"}, #"Changed Type", {"Category", "Value"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"Date"}, {"Date"})
in
    #"Expanded Grouped Rows"

编辑 2: @user11632362 向我指出 another solution 甚至更少的步骤。

一切都发生在分组步骤中。

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), Int64.Type}, {"Date", each Table.Max(_, "Value")[Date], type date}})
in
    #"Grouped Rows"

这里的关键是each Table.Max(_, "Value")[Date]。这将 subtable 按 Value 和 returns 排序结果的第一行作为记录(并且 [Date] 后缀 returns 中的值Date 该记录的字段)。

请注意,这只会拉过一列,Date。如果您需要拉入一堆列,return 完整记录并在另一个步骤中展开所有需要的字段可能更有意义,而不是向分组步骤添加更多列。

例如,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6hAolKsDpIaI2Q1RlCBJFQ1xshqjKECyWA1SYl5QAhSZI6syATIAeEUNEUWyIpMgRwQTkVTZImsyAzIAeE0pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t, Col1 = _t, Col2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Value", Int64.Type}, {"Date", type date}, {"Col1", Int64.Type}, {"Col2", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Category"}, {{"Value", each List.Max([Value]), type nullable date}, {"TopValueRow", each Table.Max(_, "Value"), type record}}),
    #"Expanded TopValueRow" = Table.ExpandRecordColumn(#"Grouped Rows", "TopValueRow", {"Date", "Col1", "Col2"}, {"Date", "Col1", "Col2"})
in
    #"Expanded TopValueRow"

我最终通过 index 每个类别获得 MAX。此处描述的想法:

方法 #1 是 R 转换中的单行代码:

library(dplyr)
output <- dataset %>% group_by(Category) %>% mutate(row_no_by_category = row_number(desc(Date)))

方法 #2,完全在 PBI 中完成:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Grouped rows" = Table.Group(Source, {"Category"}, {{"NiceTable", each Table.AddIndexColumn(Table.Sort(_,{{"Date", Order.Descending}} ), "Index",1,1), type table}} ),
    #"Expanded NiceTable" = Table.ExpandTableColumn(#"Grouped rows", "NiceTable", {"Value", "Date", "Index"}, {"Value", "Date", "Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded NiceTable", each ([Index] = 1))
in
    #"Filtered Rows"

另一种方法是使用删除重复项功能,但这需要首先对数据进行正确排序,以便出现的第一行是 select 的正确行。

出于技术原因(参见 Whosebug 帖子 , 2, , and this article),我们需要将 table 缓冲到内存中以确保排序“稳定”。

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSiwoyEktVtJRMgRiIwNDC10Dc10DQ6VYHSQ5I2Q5I1Q5Y2Q5Y7BcUmIeEIIkzZElTdAkLZAlTdEkLZElzZRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Value = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Value", Int64.Type}, {"Date", type date}, {"Category", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Category", Order.Ascending}, {"Value", Order.Descending}}),
    #"Removed Duplicates" = Table.Distinct(Table.Buffer(#"Sorted Rows"), {"Category"})
in
    #"Removed Duplicates"

这一切都可以在 GUI 中完成,除了在最后一步中在 Table.Buffer 包装器中进行编辑。