How to fix inconsistent and disorganised data in columns after using the "Separate by Delimiter" feature in Excel or Power BI?
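!!This is my first question on Stack Overflow, so I apologise in advance for any ambiguous statements!!
Problem: the data in each column is inconsistent and disorganised because information is missing from the input data
Terms I will use:
- Input data: the data in the column before applying the "Separate by Delimiter" feature in Power BI
- Output data: the data in the columns after applying the "Separate by Delimiter" feature in Power BI
Issue
"I am working on a dataset in Power BI to structure it properly"
Input data of the dataset in Power BI
Output data of the dataset in Power BI
"As you can see in the pictures, I have a column in which multiple pieces of information are combined (the result of an auto-generated system) in the format (title:info|title:info|title:info). In my case, I split this data using the delimiter "|". But because some (title:info) pairs are missing from the original input data, I ended up with disorganised data in the separate columns."
The real problem
"Each column now holds values that belong in another column. This happens because (title:info) pairs are missing from some cells of the input data. Because many cells skip ahead to the next (title:info) pair, a column ends up filled with heterogeneous (title:info) pairs."
For example:
- The column named "Product Details.13" now holds several different pairs, such as "Qty Available:12", "Qty Invoiced:2", "Qty Invoiced:10" and "Qty Canceled:15", instead of one homogeneous set like "Qty Invoiced:0"
Code for reference
M Language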
    let
        #"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns1",
            "Product Details", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv),
            {"Product Details.1", "Product Details.2", "Product Details.3",
             "Product Details.4", "Product Details.5", "Product Details.6",
             "Product Details.7", "Product Details.8", "Product Details.9",
             "Product Details.10", "Product Details.11", "Product Details.12",
             "Product Details.13", "Product Details.14", "Product Details.15",
             "Product Details.16", "Product Details.17", "Product Details.18",
             "Product Details.19", "Product Details.20"})
    in
        #"Split Column by Delimiter"
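Solution requested
- Please suggest some ideas for handling this kind of heterogeneous data so that it becomes consistent and homogeneous
- Help me move each misplaced (title:info) pair to its correct column without affecting the valid data, leaving a null or blank in place of the misplaced data
Expected result
- Consistent columns, each holding homogeneous data with the same (title:info) title
Notes:
- I have faced a similar problem in Excel before
- I know this problem has nothing to do with any shortcoming of MS Excel or Power BI
Sample:
As requested by @Mr. Ron: I am unable to provide a proper sample file in csv, xlsx or txt format because Stack Overflow does not allow this, but I'll try to explain my issue with a simple reference.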
Expectation: every (title:info) pair should be aligned throughout its column

| Header-1 | Header-2 | Header-3 | Header-4 | Header-5 |
| --- | --- | --- | --- | --- |
| Name:abc | SKU:1234 | order:a1 | invoice:1a | Shipment:0 |
| Name:eef | SKU:5678 | order:b2 | invoice:2b | Shipment:1 |
| Name:ghi | SKU:1256 | order:c3 | invoice:3c | Shipment:0 |
| Name:jkl | SKU:3478 | order:d4 | invoice:4d | Shipment:1 |

Reality: the (title:info) data in columns 3, 4 and 5 is inconsistent throughout the column

| Header-1 | Header-2 | Header-3 | Header-4 | Header-5 |
| --- | --- | --- | --- | --- |
| Name:abc | SKU:1234 | order:a1 | Shipment:0 | available:N0 |
| Name:eef | SKU:5678 | order:b2 | invoice:2b | Shipment:1 |
| Name:ghi | SKU:1256 | available:N0 | price:2344 | Discount:0.02% |
| Name:jkl | SKU:3478 | order:d4 | invoice:4d | Shipment:1 |
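
I hope everything is clear now.

Not sure exactly what result you want, but from the data you posted, this creates
- a table whose headers are the data types (the titles)
- on the assumption that each row represents a particular product
If this is not what you want, please clarify.
You can:
- Split the data on the pipe
- Remove the blank columns
- Add a column to represent the row number
- Unpivot the other columns
- Split the value column on the colon
- Pivot on the title column, selecting "Don't Aggregate" for the values column
M Code
Read the comments to better understand the algorithm
Change the table name in the Source step to match your own table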
    let
        //just preprocessing to get from what you posted
        // to pipe-separated table
        Source = Excel.CurrentWorkbook(){[Name="Table25"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"| Header-1 | Header-2 | Header-3 | Header-4 | Header-5 |", type text}}),
        #"Demoted Headers" = Table.DemoteHeaders(#"Changed Type"),
        #"Changed Type1" = Table.TransformColumnTypes(#"Demoted Headers",{{"Column1", type text}}),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type1", "Column1", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7"}),
        #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}}),
        //the leading and trailing pipes produce two empty columns; drop them
        #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Column1.1", "Column1.7"}),
        #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Column1.2", Text.Trim, type text}, {"Column1.3", Text.Trim, type text}, {"Column1.4", Text.Trim, type text}, {"Column1.5", Text.Trim, type text}, {"Column1.6", Text.Trim, type text}}),
        #"Promoted Headers" = Table.PromoteHeaders(#"Trimmed Text", [PromoteAllScalars=true]),
        #"Changed Type3" = Table.TransformColumnTypes(#"Promoted Headers",{{"Header-1", type text}, {"Header-2", type text}, {"Header-3", type text}, {"Header-4", type text}, {"Header-5", type text}}),

        //Add index column to retain original row numbers
        rowNums = Table.AddIndexColumn(#"Changed Type3","Row Number",0,1,Int64.Type),

        //Unpivot all columns except the Row Number column
        //Remove the Attribute column
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(rowNums, {"Row Number"}, "Attribute", "Value"),
        #"Removed Columns1" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),

        //Split by the colon delimiter
        //Set data types
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns1", "Value", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Value.1", "Value.2"}),
        #"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Value.1", type text}, {"Value.2", type text}}),

        //Pivot on the Value.1 column (no aggregation)
        //Remove the Row Number column
        #"Pivoted Column" = Table.Pivot(#"Changed Type4", List.Distinct(#"Changed Type4"[Value.1]), "Value.1", "Value.2"),
        #"Removed Columns2" = Table.RemoveColumns(#"Pivoted Column",{"Row Number"})
    in
        #"Removed Columns2"
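
As a possible variation (a sketch only, not part of the method above), the positional split can be skipped altogether: parse each raw cell into a record keyed by its title, then expand the records, so a missing pair simply becomes null in its own column. This assumes the raw column is called `Product Details` as in the question, that no title repeats within a single cell, and that the first colon separates the title from the info. The inline `Source` table and the helper name `ToRecord` are hypothetical stand-ins for illustration.

```powerquery
let
    // Hypothetical sample data; replace with your own previous step
    Source = #table(
        type table [#"Product Details" = text],
        {
            {"Name:abc|SKU:1234|order:a1|Shipment:0|available:N0"},
            {"Name:eef|SKU:5678|order:b2|invoice:2b|Shipment:1"},
            {"Name:ghi|SKU:1256|available:N0|price:2344|Discount:0.02%"}
        }
    ),

    // Hypothetical helper: "title:info|title:info" -> [title = info, ...]
    ToRecord = (cell as nullable text) as record =>
        let
            parts = if cell = null then {} else Text.Split(cell, "|"),
            trimmed = List.Transform(parts, Text.Trim),
            // keep only pieces that actually contain a colon
            pairs = List.Select(trimmed, each Text.Contains(_, ":")),
            // split each pair on its first colon
            titles = List.Transform(pairs, each Text.Trim(Text.BeforeDelimiter(_, ":"))),
            infos = List.Transform(pairs, each Text.Trim(Text.AfterDelimiter(_, ":")))
        in
            // raises an error if a title repeats within one cell
            Record.FromList(infos, titles),

    AddedRecord = Table.AddColumn(Source, "Parsed", each ToRecord([Product Details])),

    // Union of every title seen in any row, in first-seen order
    AllTitles = List.Distinct(
        List.Combine(List.Transform(AddedRecord[Parsed], Record.FieldNames))
    ),

    // One column per title; rows lacking a title get null there
    Expanded = Table.ExpandRecordColumn(AddedRecord, "Parsed", AllTitles),
    Result = Table.RemoveColumns(Expanded, {"Product Details"})
in
    Result
```

With the sample rows, this should give one column per distinct title (Name, SKU, order, Shipment, available, invoice, price, Discount), and the number of output columns no longer has to be hard-coded. `Text.BeforeDelimiter` and `Text.AfterDelimiter` may be unavailable in older Excel builds of Power Query; there, `Text.PositionOf` with `Text.Start` and `Text.Middle` could be substituted.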