Conditional Split if Delimiter is missing

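Background: The table below contains sample data from the relevant documents (named MSDS, RMQ, COA and Technical Data Sheet) for Materials A, B, C, etc. The information in these documents includes a date, plus heavy metal impurities and residual solvent impurities with their amounts in ppm (parts per million).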

I used Power Query to sort this data and produce the 2 tables shown in the image below.

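These contain the highest amount of each heavy metal (blue) and residual solvent (yellow) found across all the documents, along with the source document containing that value. To let you replicate the spreadsheet, I have provided the (rather extensive) M code at the bottom. The premise is quite simple: "Heavy Metals" and "Residual Solvents" are phrases used as delimiters to split the data accordingly.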

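Minor issue: Although I am happy with how the tables function, I do not think the 'splitting of a split column' (see the M code) is an entirely satisfactory way of separating the data. I later realised that if a cell accidentally does not include "Heavy Metals" as a delimiter, the logic causes that cell's residual solvent data to be lost (as is the case for cell 4E (Material C, Technical Data Sheet)).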
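A minimal sketch of why the data goes missing, using made-up cell text (the dates, substances and amounts here are hypothetical, not taken from the workbook). When the delimiter is absent, the splitter returns a single item, so the second output column of `Table.SplitColumn` is null and there is nothing left to split by "Residual Solvents":

```powerquery
let
    // The same splitter the query applies to each document column
    SplitByHeavyMetals = Splitter.SplitTextByDelimiter("Heavy Metals", QuoteStyle.Csv),

    // Hypothetical cell text that contains the "Heavy Metals" delimiter
    WithDelimiter = SplitByHeavyMetals(
        "01/01/2020#(lf)Heavy Metals#(lf)Lead 5#(lf)Residual Solvents#(lf)Ethanol 10"),

    // Hypothetical cell text where "Heavy Metals" was left out
    WithoutDelimiter = SplitByHeavyMetals(
        "01/01/2020#(lf)Residual Solvents#(lf)Ethanol 10"),

    // Two pieces when the delimiter is present, one piece when it is missing
    Result = [
        PiecesWithDelimiter = List.Count(WithDelimiter),       // 2
        PiecesWithoutDelimiter = List.Count(WithoutDelimiter)  // 1
    ]
in
    Result
```

In the second case the whole text, solvents included, stays in the first piece, which the query treats as the date column and later removes.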

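I could simply insist that those who use this spreadsheet make sure these phrases are always present, but I thought I would ask here whether anyone has a clever alternative to the M code provided, so that even though the heavy metals may be lost if the delimiter is missing (or misspelled), the residual solvents would still be pulled out.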

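I appreciate that this is quite a big task for someone; fortunately it is a relatively minor issue, so any suggestions would just be a bonus. I also thought it made an interesting demonstration of how Power Query can be used to split seemingly complex data within a cell. Also note that the data in the table is deliberately 'messy' to test whether that causes any problems.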

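M code: This is the code for the Residual Solvents table only. Power Query splits the data into heavy metals and residual solvents, then removes the corresponding columns depending on the table.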

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"MSDS", type text}, {"RMQ", type text}, {"COA", type text}, {"Technical Data Sheet", type text}}),
    
    //Clean Up user Input (removed additional Spaces)
    #"Trimmed Text" = Table.TransformColumns(#"Changed Type",{{"Material", Text.Trim, type text}, {"MSDS", Text.Trim, type text}, {"RMQ", Text.Trim, type text}, {"COA", Text.Trim, type text}, {"Technical Data Sheet", Text.Trim, type text}}),
   
    //Split Data into Date, Heavy Metals and Residual Solvents
    //MSDS
    #"Split MSDS by Heavy Metals" = Table.SplitColumn(#"Trimmed Text", "MSDS", Splitter.SplitTextByDelimiter("Heavy Metals", QuoteStyle.Csv), {"MSDS Date", "MSDS Heavy Metals"}),
    #"Split MSDS by Residual Solvents" = Table.SplitColumn(#"Split MSDS by Heavy Metals", "MSDS Heavy Metals", Splitter.SplitTextByDelimiter("Residual Solvents", QuoteStyle.Csv), {"MSDS Heavy Metals", "MSDS Residual Solvents"}),
    
    // RMQ
    #"Split RMQ by Heavy Metals" = Table.SplitColumn(#"Split MSDS by Residual Solvents", "RMQ", Splitter.SplitTextByDelimiter("Heavy Metals", QuoteStyle.Csv), {"RMQ Date", "RMQ Heavy Metals"}),
    #"Split Column by Residual Solvents" = Table.SplitColumn(#"Split RMQ by Heavy Metals", "RMQ Heavy Metals", Splitter.SplitTextByDelimiter("Residual Solvents", QuoteStyle.Csv), {"RMQ Heavy Metals", "RMQ Residual Solvents"}),
    
    //COA
    #"Split COA by Heavy Metals" = Table.SplitColumn(#"Split Column by Residual Solvents", "COA", Splitter.SplitTextByDelimiter("Heavy Metals", QuoteStyle.Csv), {"COA Date", "COA Heavy Metals"}),
    #"Split COA by Residual Solvents" = Table.SplitColumn(#"Split COA by Heavy Metals", "COA Heavy Metals", Splitter.SplitTextByDelimiter("Residual Solvents", QuoteStyle.Csv), {"COA Heavy Metals", "COA Residual Solvents"}),
    
    //Technical Data Sheet
    #"Split Technical Data Sheet by Heavy Metals" = Table.SplitColumn(#"Split COA by Residual Solvents", "Technical Data Sheet", Splitter.SplitTextByDelimiter("Heavy Metals", QuoteStyle.Csv), {"Technical Data Sheet Date", "Technical Data Sheet Heavy Metals"}),
    #"Split Technical Data Sheet by Residual Solvents" = Table.SplitColumn(#"Split Technical Data Sheet by Heavy Metals", "Technical Data Sheet Heavy Metals", Splitter.SplitTextByDelimiter("Residual Solvents", QuoteStyle.Csv), {"Technical Data Sheet Heavy Metals", "Technical Data Sheet Residual Solvents"}),
    
    //Changes Data to date type
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Technical Data Sheet by Residual Solvents",{{"MSDS Date", type date}, {"RMQ Date", type date}, {"COA Date", type date}, {"Technical Data Sheet Date", type date}}),
   
    //Remove Date (add // to add date back into data)
    #"Removed Date" = Table.RemoveColumns(#"Changed Type1",{"MSDS Date", "RMQ Date", "COA Date", "Technical Data Sheet Date"}),
    
    //Clean up unnecessary line breaks
    #"Trimmed Text1" = Table.TransformColumns(#"Removed Date",{{"Material", Text.Trim, type text}, {"MSDS Heavy Metals", Text.Trim, type text}, {"MSDS Residual Solvents", Text.Trim, type text}, {"RMQ Heavy Metals", Text.Trim, type text}, {"RMQ Residual Solvents", Text.Trim, type text}, {"COA Heavy Metals", Text.Trim, type text}, {"COA Residual Solvents", Text.Trim, type text}, {"Technical Data Sheet Heavy Metals", Text.Trim, type text}, {"Technical Data Sheet Residual Solvents", Text.Trim, type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Trimmed Text1",{"MSDS Heavy Metals", "RMQ Heavy Metals", "COA Heavy Metals", "Technical Data Sheet Heavy Metals"}),
    
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"MSDS Residual Solvents", "MSDS"}, {"RMQ Residual Solvents", "RMQ"}, {"COA Residual Solvents", "COA"}, {"Technical Data Sheet Residual Solvents", "Technical Data Sheet"}}),
    
    //Unpivot data into columns, split and clean up as necessary
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Renamed Columns", {"Material"}, "Source", "Amount"),
    
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Unpivoted Other Columns", {{"Amount", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Amount"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Amount", type text}}),
    #"Trimmed Text2" = Table.TransformColumns(#"Changed Type2",{{"Amount", Text.Trim, type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Trimmed Text2", "Amount", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Heavy Metal", "Amount"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Heavy Metal", type text}, {"Amount", Int64.Type}}),

    //If date is to be included add this line
    //{"Date", (t) => t[Date]{List.PositionOf(t[Amount],List.Max(t[Amount]))}, type date}

    //Group rows by Material and Metal
    //Extract the highest amount and corresponding source
    #"Grouped Rows" = Table.Group(#"Changed Type3", {"Material", "Heavy Metal"}, {
        {"Amount", each List.Max([Amount]),type number},
        {"Source", (t) => t[Source]{List.PositionOf(t[Amount],List.Max(t[Amount]))}, type text}
       
        })
in
    #"Grouped Rows"

Link to file:

https://1drv.ms/u/s!AsrLaUgt0KCLhXtP-jYDd4Z0ujKQ?e=Ba8Htx

I would split between solvents and metals differently, so that it does not matter if one category or the other is missing.

If there might be misspellings of "Residual Solvents" or "Heavy Metals", you could even do some fuzzy matching instead of the equality match I use in the code.
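As a rough illustration of a more tolerant match (a case-insensitive keyword check, which is a simpler stand-in for true fuzzy matching), something like the following could classify the header lines. `ClassifyHeader` and the sample strings are hypothetical names and values introduced only for this sketch:

```powerquery
let
    // Hypothetical helper: treat any line containing "metal" or "solvent"
    // (ignoring case) as the corresponding section header
    ClassifyHeader = (value as text) as nullable text =>
        if Text.Contains(value, "metal", Comparer.OrdinalIgnoreCase) then "Heavy Metals"
        else if Text.Contains(value, "solvent", Comparer.OrdinalIgnoreCase) then "Residual Solvents"
        else null,

    // Made-up inputs: a correct header, two slightly wrong headers, and a data line
    Examples = List.Transform(
        {"Heavy Metals", "heavy metal", "Residul solvents", "Lead 5"},
        ClassifyHeader)
in
    // {"Heavy Metals", "Heavy Metals", "Residual Solvents", null}
    Examples
```

This only catches misspellings that leave the keyword intact, and it would misfire on a substance whose name happens to contain "metal" or "solvent", so the keywords would need choosing with the real data in mind.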

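  • Unpivot the columns other than the Material column to create three columns
  • Split the Value column into rows by line feed
  • Trim the Value column, then filter out the blanks
  • Add a custom column based on the Value column that copies over only the dates or the strings "Heavy Metals" / "Residual Solvents"
  • Fill down so that every row has an entry
  • Filter out the dates (by selecting only the metal and solvent entries)
  • Filter the Value and Custom columns (see the comments in the code)
  • Split the Value column between substance and amount
  • This leaves you with a table of five columns
    • You can filter the fifth column for either metals or solvents
  • Then group by Material and extract what you want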

M code for the solvents table

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"MSDS", type text}, {"RMQ", type text}, {"COA", type text}, {"Technical Data Sheet", type text}}),

        //Unpivot to develop a single column of solvent/metals/date data
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Material"}, "Attribute", "Value"),

        //split into rows by line feed
        #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Unpivoted Other Columns", 
            {{"Value", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Value"),
        #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Value", Text.Trim, type text}}),

        //filter out the blank rows
        #"Filtered Rows" = Table.SelectRows(#"Trimmed Text", each ([Value] <> "")),

        //Add custom column for separating the tables
        #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each try Date.FromText([Value]) otherwise 
            if [Value] = "Heavy Metals" or [Value] = "Residual Solvents" then [Value] else null),
        #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", type text}}),
        #"Filled Down" = Table.FillDown(#"Changed Type1",{"Custom"}),

        //Filter the value and custom columns to remove contaminant type from Value column and remove dates from Custom column
        #"Filtered Rows1" = Table.SelectRows(#"Filled Down", 
            each ([Custom] = "Heavy Metals" or [Custom] = "Residual Solvents") and ([Value] <> "Heavy Metals" and [Value] <> "Residual Solvents")),

        //split substance from amount
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows1", "Value", 
            Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Substance", "Amount"}),
        #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Substance", type text}, {"Amount", Int64.Type}}),

        //Filter for Solvents Table
        #"Filtered Rows2" = Table.SelectRows(#"Changed Type2", each ([Custom] = "Residual Solvents")),

        //Group by Material and Substance, then extract the Max contaminant and Source
        #"Grouped Rows" = Table.Group(#"Filtered Rows2", {"Material", "Substance"}, {
            {"Amount", each List.Max([Amount]), type number},
            {"Source", (t) => t[Attribute]{List.PositionOf(t[Amount],List.Max(t[Amount]))}, type text}
            })
    in
        #"Grouped Rows"
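For the metals table, only the final filter needs to change. The sketch below shows that on a small hand-made table standing in for the output of the `#"Changed Type2"` step (the `Sample` rows are hypothetical, not data from the workbook):

```powerquery
let
    // Hypothetical stand-in for the five-column table produced by #"Changed Type2"
    Sample = #table(
        type table [Material = text, Attribute = text, Substance = text, Amount = Int64.Type, Custom = text],
        {
            {"A", "MSDS", "Lead", 5, "Heavy Metals"},
            {"A", "COA", "Lead", 8, "Heavy Metals"},
            {"A", "MSDS", "Ethanol", 10, "Residual Solvents"}
        }),

    //Filter for Metals Table (the only step that differs from the solvents query)
    #"Filtered Rows2" = Table.SelectRows(Sample, each [Custom] = "Heavy Metals"),

    //Group by Material and Substance, then extract the Max contaminant and Source
    #"Grouped Rows" = Table.Group(#"Filtered Rows2", {"Material", "Substance"}, {
        {"Amount", each List.Max([Amount]), type number},
        {"Source", (t) => t[Attribute]{List.PositionOf(t[Amount], List.Max(t[Amount]))}, type text}
        })
in
    // One row for this sample: Material A, Lead, Amount 8, Source COA
    #"Grouped Rows"
```

Because the category is read from the filled-down Custom column rather than from a nested split, a cell that lacks one of the two headers still contributes whatever appears under the header it does have.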

As a learning exercise, I decided to tackle the case where the same maximum amount is cited by two or more sources.

I changed the Source extraction line of the Table.Group function to return these as a semicolon-separated string:

    //Group by Material and Substance
    //Extract Max Amount and Source (or multiple Sources if max amount identical)
    #"Grouped Rows" = Table.Group(#"Filtered Rows2", {"Material", "Substance"}, {
        //{"All", each _, type table [Material=nullable text, Attribute=text, Substance=nullable text, Amount=nullable number, Custom=nullable text]},        
        {"Amount", each List.Max([Amount]), type number},
        //{"Source", (t) => t[Attribute]{List.PositionOf(t[Amount],List.Max(t[Amount]))}, type text},
        {"Source", (t)=> Text.Combine(
            List.Generate(() => [Counter=1, IDX = List.PositionOf(t[Amount],List.Max(t[Amount]))],
                 each [Counter] <= List.Count(List.PositionOf(t[Amount],List.Max(t[Amount]),Occurrence.All)),
                 each [Counter = [Counter]+1, IDX = List.PositionOf(t[Amount],List.Max(t[Amount]),Occurrence.All){[Counter]}],
                 each t[Attribute]{[IDX]}),"; ")}
        })

This use of List.Generate is adapted from a slightly different problem solved on Chris Webb's BI Blog.
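The same semicolon-separated result can probably be reached without List.Generate by asking List.PositionOf for every position of the maximum and mapping those positions to sources. This is a sketch of that alternative, not the approach used above; the table `t` here is a hypothetical single group in which two sources share the maximum:

```powerquery
let
    // Hypothetical group: MSDS and COA both report the maximum amount
    t = #table(
        type table [Attribute = text, Amount = Int64.Type],
        {{"MSDS", 10}, {"RMQ", 7}, {"COA", 10}}),

    MaxAmount = List.Max(t[Amount]),

    // Occurrence.All returns a list of every matching position: {0, 2}
    Positions = List.PositionOf(t[Amount], MaxAmount, Occurrence.All),

    // Look up the source at each position and join them with "; "
    Sources = Text.Combine(List.Transform(Positions, each t[Attribute]{_}), "; ")
in
    // "MSDS; COA"
    Sources
```

Inside Table.Group this would take the form of a single aggregation, `{"Source", (t) => Text.Combine(List.Transform(List.PositionOf(t[Amount], List.Max(t[Amount]), Occurrence.All), each t[Attribute]{_}), "; "), type text}`.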