Most efficient way to get rid of duplicates in an Excel table that still have unique values behind them?

Question

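I have a table with over 111,000 rows extracted from a separate database. The first column holds a product code and the second column another identifying code, while the third column holds a unique URL. The problem is that most products have several URLs, so the table contains repeated codes, each with a different URL. To illustrate: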

SKU	EAN	URL
`ZA123`	`004998`	https://example.com/A1_Afb_01
`ZA123`	`004998`	https://example.com/A1_Afb_02
`ZA123`	`004998`	https://example.com/A1_Afb_03
`FA156`	`#N/A`	https://example.com/A9_Afb_01
`GD222`	`016847`	https://example.com/Z1_Afb_01
`GD222`	`016847`	https://example.com/Z1_Afb_02

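What I want to do is put every URL belonging to a code in its own column after it, creating URL 1, URL 2, and so on (up to URL 20, since that is the maximum number of images any product has).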

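Doing this manually would take far too long, and my attempts with formulas failed as well because Excel kept crashing (probably because it had to perform too many calculations one after another).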

So does anyone know a more efficient way? One that won't crash Excel?

Answer 1

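You should write a script for this (in Python or PowerShell) instead of using formulas. A script runs in a much more controllable way than Excel formulas, which are not optimized for this kind of task and may well be the worst-performing option.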
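As a rough illustration of that suggestion, here is a minimal Python sketch using only the standard library. It assumes the table has been exported to a CSV file with a SKU,EAN,URL header, that all rows for one product are adjacent (as in the sample above, e.g. after sorting by SKU), and it uses the fixed limit of 20 URL columns mentioned in the question. The function names `spread_urls` and `convert` are hypothetical.

```python
import csv
from itertools import groupby
from operator import itemgetter

MAX_URLS = 20  # the question states no product has more than 20 images


def spread_urls(rows, max_urls=MAX_URLS):
    """Yield [SKU, EAN, URL 1, ..., URL max_urls] for each product.

    Assumption: rows are (SKU, EAN, URL) sequences and all rows for the
    same SKU/EAN pair are adjacent, because groupby only merges neighbours.
    """
    for (sku, ean), group in groupby(rows, key=itemgetter(0, 1)):
        urls = [url for _, _, url in group]
        if len(urls) > max_urls:
            raise ValueError(f"{sku} has {len(urls)} URLs, more than {max_urls}")
        # pad with empty strings so every output row has the same width
        yield [sku, ean, *urls, *([""] * (max_urls - len(urls)))]


def convert(in_path, out_path, max_urls=MAX_URLS):
    """Read a SKU,EAN,URL CSV and write one row per product (hypothetical helper)."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        next(reader, None)  # skip the SKU,EAN,URL header row
        writer = csv.writer(dst)
        writer.writerow(["SKU", "EAN", *[f"URL {i}" for i in range(1, max_urls + 1)]])
        writer.writerows(spread_urls(reader, max_urls))
```

Usage would be something like `convert("products.csv", "products_wide.csv")` (file names are placeholders). The rows are processed one product at a time instead of recalculating lookups cell by cell. If the export is not sorted, one product would be split across several output rows, so sort by SKU first or group with a dictionary. An exported `#N/A` will most likely come through as the literal text "#N/A" and is passed through unchanged.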

Answer 2

You can do this with Power Query, available in Windows Excel 2010+ and Office 365.

Using Power Query

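Select any cell in your data table
Data => Get & Transform => From Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make a note of the table name in line 2
Paste the M code below in place of what you see
Change the table name in the Source line back to the one that was originally generated
Read the comments and explore the Applied Steps to understand the algorithm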

M Code

let

//ensure table name in next line matches real table name in workbook
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

//set all data types to Text
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SKU", type text}, {"EAN", type text}, {"URL", type text}}),

//Replace error cells
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type", {{"EAN", null}}),

//Group by SKU/EAN and create List of URLs
    #"Grouped Rows" = Table.Group(#"Replaced Errors", {"SKU", "EAN"}, {
        {"URL", each [URL], type list}        
        }),

//How many columns for the URL split
    numCols = List.Max(List.Transform(#"Grouped Rows"[URL], each List.Count(_))),

//convert List into delimited string
//then expand into columns
//Note that we use # as the delimiter because it does not occur in these URLs
//(in a URL, # can only mark the start of a fragment)
    expand = Table.SplitColumn(
        Table.TransformColumns(#"Grouped Rows", {"URL", each Text.Combine(_, "#")}),
        "URL",
        Splitter.SplitTextByDelimiter("#"), numCols)
in 
    expand
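For anyone who wants to check the logic outside Excel, the query's steps (replace the #N/A errors, group the URLs into one list per SKU/EAN pair, take the longest list as numCols, then spread each list over that many columns) can be mirrored in Python. This is only an illustrative sketch, not Power Query itself: the function name is hypothetical, the literal string "#N/A" stands in for Excel's error value, and the URL.1, URL.2, ... names imitate the default names Power Query gives split columns. Unlike the M version, it pads the lists directly instead of joining and re-splitting on "#", so a fragment in a URL cannot shift the columns.

```python
def group_like_power_query(rows):
    """Group (SKU, EAN, URL) rows and spread the URLs over URL.1 ... URL.n columns."""
    grouped = {}  # a dict keeps the first-seen order of each (SKU, EAN) pair
    for sku, ean, url in rows:
        if ean == "#N/A":  # stand-in for replacing the EAN error with null
            ean = None
        grouped.setdefault((sku, ean), []).append(url)  # one URL list per group

    # numCols: length of the longest URL list
    num_cols = max((len(urls) for urls in grouped.values()), default=0)

    header = ["SKU", "EAN", *[f"URL.{i}" for i in range(1, num_cols + 1)]]
    body = [
        [sku, ean, *urls, *([None] * (num_cols - len(urls)))]  # pad short lists with None
        for (sku, ean), urls in grouped.items()
    ]
    return header, body


# sample rows from the question
sample = [
    ("ZA123", "004998", "https://example.com/A1_Afb_01"),
    ("ZA123", "004998", "https://example.com/A1_Afb_02"),
    ("ZA123", "004998", "https://example.com/A1_Afb_03"),
    ("FA156", "#N/A", "https://example.com/A9_Afb_01"),
    ("GD222", "016847", "https://example.com/Z1_Afb_01"),
    ("GD222", "016847", "https://example.com/Z1_Afb_02"),
]
header, body = group_like_power_query(sample)
for row in [header, *body]:
    print(row)
```

Because grouping uses a dictionary rather than adjacency, this version does not need the rows to be sorted first.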

[Screenshot: original data]

[Screenshot: result]
