Avoid Excel PowerQuery error when second data page headings are not available

I use PowerQuery to import data from a CSV file that looks like this:

Report Title,,,,,
,Date,Type,USER_ID,PICKED_QTY,No of Hours
,31/10/2021,Type A,User_1,300,3
,31/10/2021,Type A,User_3,250,8
,01/11/2021,Type B,User_1,167,5
,01/11/2021,Type C,User_2,988,2
,02/11/2021,Type A,User_1,1113,4
Date,Type,USER_ID,PICKED_QTY,No of Hours,
03/11/2021,Type C,User_1,1500,5,
04/11/2021,Type A,User_1,200,8,    

Sometimes it looks like this (no second page) - and this is where the problem lies:

Report Title,,,,,
,Date,Type,USER_ID,PICKED_QTY,No of Hours
,31/10/2021,Type A,User_1,300,3
,31/10/2021,Type A,User_3,250,8
,01/11/2021,Type B,User_1,167,5
,01/11/2021,Type C,User_2,988,2
,02/11/2021,Type A,User_1,1113,4

I use this PQ to get the data into a readable format (the real source will be different, but for simplicity it references a table here):

DataSource:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    SplitData = Table.SplitColumn(Source,"Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv))
in
    SplitData 


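I then use two queries to line up the data, so that the dates from both pages end up in the same column, and likewise for the other fields.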

Query1:

let
    Source = DataSource,
    RemoveTopRows = Table.Skip(Source,1),
    PromoteHeaders = Table.PromoteHeaders(RemoveTopRows, [PromoteAllScalars=true]),
    FilterRows = Table.SelectRows(PromoteHeaders, each ([#""] = "")),
    RemoveOtherColumns = Table.SelectColumns(FilterRows,{"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"}),
    ChangeType = Table.TransformColumnTypes(RemoveOtherColumns,{{"Date", type date}, {"Type", type text}, 
                                                                {"USER_ID", type text}, {"PICKED_QTY", Int64.Type}, 
                                                                {"No of Hours", Int64.Type}})
in
    ChangeType  

Query2:

let
    Source = DataSource,
    RemoveTopRows = Table.Skip(Source,1),
    FilterRows = Table.SelectRows(RemoveTopRows, each ([Column1.1] <> "")),
    PromoteHeaders = Table.PromoteHeaders(FilterRows, [PromoteAllScalars=true]),
    RemoveOtherColumns = Table.SelectColumns(PromoteHeaders,{"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"}),
    FilterRows2 = Table.SelectRows(RemoveOtherColumns, each ([Date] <> "Date")),
    ChangeType = Table.TransformColumnTypes(FilterRows2,{{"Date", type date}, {"Type", type text}, {"USER_ID", type text}, 
                                                         {"PICKED_QTY", Int64.Type}, {"No of Hours", Int64.Type}})
in
    ChangeType  


Finally, I append the first two queries and group the result to get the final table.

Query3:

let
    Source = Query1,
    AppendQueries = Table.Combine({Source, Query2}),
    SortRows = Table.Sort(AppendQueries,{{"Date", Order.Ascending}}),
    GroupRows = Table.Group(SortRows, {"Date", "Type"}, {{"Picked Qty", each List.Sum([PICKED_QTY]), type nullable number}, 
                                                         {"Total Hours", each List.Sum([No of Hours]), type nullable number}}),
    AddDivision = Table.AddColumn(GroupRows, "Rate", each [Picked Qty] / [Total Hours], type number)
in
    AddDivision


Problem

Sometimes my raw data doesn't include a second page, so Query2 isn't needed.
When that happens, unless I manually add headers for the second page, I get this error: [Expression Error] The column 'Date' of the table wasn't found.

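How can I avoid this? The error occurs in Query2 at the RemoveOtherColumns step: without the column headers it can't find the right columns, and Query3 then returns a query error because it has nothing to append.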

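There is no need to rewrite everything. Just change the last line of Query2 to

in try ChangeType otherwise Table.FromRecords({[Date = null, Type = null, USER_ID=null, PICKED_QTY=null, No of Hours = null]})

or

in try ChangeType otherwise Table.Skip(Table.FromRecords({[Date = null, Type = null, USER_ID=null, PICKED_QTY=null, No of Hours = null]}),1)

The first form falls back to a single row of nulls. The second skips that row, so the fallback is an empty table with the same five columns and Query3 does not pick up an extra all-null group.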
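As an alternative to catching the error, here is a minimal sketch that keeps the error from being raised at all by passing MissingField.UseNull as the optional third argument of Table.SelectColumns. The inline #table is a hypothetical stand-in for DataSource in the one-page case, not the asker's actual data. The other steps are copied from Query2.

```powerquery
let
    // Hypothetical stand-in for DataSource: a one-page report, already split into columns
    Source = #table(
        {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6"},
        {
            {"Report Title", "", "", "", "", ""},
            {"", "Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"},
            {"", "31/10/2021", "Type A", "User_1", "300", "3"}
        }
    ),
    RemoveTopRows = Table.Skip(Source, 1),
    // With no second page, no row has a non-empty first field, so this is empty
    FilterRows = Table.SelectRows(RemoveTopRows, each [Column1.1] <> ""),
    PromoteHeaders = Table.PromoteHeaders(FilterRows, [PromoteAllScalars = true]),
    // MissingField.UseNull adds any absent column (filled with null) instead of raising an error
    RemoveOtherColumns = Table.SelectColumns(
        PromoteHeaders,
        {"Date", "Type", "USER_ID", "PICKED_QTY", "No of Hours"},
        MissingField.UseNull
    ),
    FilterRows2 = Table.SelectRows(RemoveOtherColumns, each [Date] <> "Date"),
    ChangeType = Table.TransformColumnTypes(FilterRows2, {{"Date", type date}, {"Type", type text}, {"USER_ID", type text},
                                                         {"PICKED_QTY", Int64.Type}, {"No of Hours", Int64.Type}})
in
    ChangeType
```

In the real workbook, Source would stay as DataSource and only the third argument would need adding to Query2. When a second page is present, the columns exist and the step should behave as before.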


Or just do everything in a single query:

let
    Source = Csv.Document(File.Contents("C:\temp2\data.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
    // Drop the "Report Title" row
    #"Removed Top Rows" = Table.Skip(Source,1),
    // Page 1 rows have an empty first field; page 2 rows do not
    #"Filtered Rows" = Table.PromoteHeaders(Table.SelectRows(#"Removed Top Rows", each [Column1] = ""), [PromoteAllScalars=true]),
    #"Filtered Rows2" = Table.PromoteHeaders(Table.SelectRows(#"Removed Top Rows", each [Column1] <> ""), [PromoteAllScalars=true]),
    // Table.Combine matches columns by name, so both pages line up under the same headers
    AppendQueries = Table.Combine({#"Filtered Rows",#"Filtered Rows2"}),
    SortRows = Table.Sort(AppendQueries,{{"Date", Order.Ascending}}),
    #"Changed Type1" = Table.TransformColumnTypes(SortRows,{{"PICKED_QTY", type number}, {"No of Hours", type number}}),
    GroupRows = Table.Group(#"Changed Type1", {"Date", "Type"}, {{"Picked Qty", each List.Sum([PICKED_QTY]), type number}, {"Total Hours", each List.Sum([No of Hours]), type number}}),
    AddDivision = Table.AddColumn(GroupRows, "Rate", each [Picked Qty] / [Total Hours], type number)
in
    AddDivision
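One thing to watch in the single query: Date is never converted, so it is sorted as text, and "01/11/2021" sorts ahead of "31/10/2021". The sketch below shows one way to type the column before sorting. The two-row sample table is hypothetical, and "en-GB" is an assumed culture chosen because the sample dates look like dd/MM/yyyy. Adjust it to match the actual file.

```powerquery
let
    // Hypothetical sample: dates still stored as dd/MM/yyyy text, as they come out of the CSV
    Sample = #table(
        {"Date", "PICKED_QTY"},
        {{"31/10/2021", "300"}, {"01/11/2021", "167"}}
    ),
    // The optional third argument is the culture used to parse the text values ("en-GB" is an assumption)
    Typed = Table.TransformColumnTypes(Sample, {{"Date", type date}, {"PICKED_QTY", Int64.Type}}, "en-GB"),
    // With real date values the sort is chronological rather than alphabetical
    Sorted = Table.Sort(Typed, {{"Date", Order.Ascending}})
in
    Sorted
```

In the single query, the same idea would mean adding {"Date", type date} to the type-change step and moving that step ahead of SortRows.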