How to remove duplicates in a text string with Power Query
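As the title says, after several steps (group by, filter, combining text, ...) I have run into a problem removing duplicates within a single cell in Power Query.
Example: the "cc_emails" column has many rows, but because of an earlier Text.Combine step, each row contains some duplicated email addresses, something like this:
"Giang.Phan@abc.com, thao.tran@abc.com, Khoa.Vu@abc.com, Vn.Offset@abc.com, Giang.Phan@abc.com, thao.tran@abc.com, Khoa.Vu@abc.com, Vn.Offset@abc.com"
I want each email address to appear only once in the list. Could someone help me with this?
Expected output:
"Giang.Phan@abc.com, thao.tran@abc.com, Khoa.Vu@abc.com, Vn.Offset@abc.com"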
## Update: my query from the query editor
let
    Source = Exchange.Contents("giang.phan@abc.com"),
    Mail1 = Source{[Name="Mail"]}[Data],
    #"Reordered Columns" = Table.ReorderColumns(Mail1,{"DateTimeSent", "DateTimeReceived", "Folder Path", "Subject", "Sender", "DisplayTo", "DisplayCc", "ToRecipients", "CcRecipients", "BccRecipients", "Importance", "Categories", "IsRead", "HasAttachments", "Attachments", "Preview", "Attributes", "Body", "Id"}),
    #"Filtered Rows" = Table.SelectRows(#"Reordered Columns", each [DateTimeReceived] > #datetime(2021, 12, 29, 0, 0, 0) and [DateTimeReceived] < #datetime(2022, 1, 4, 0, 0, 0)),
    #"Expanded ToRecipients" = Table.ExpandTableColumn(#"Filtered Rows", "ToRecipients", {"Address"}, {"ToRecipients.Address"}),
    #"Expanded CcRecipients" = Table.ExpandTableColumn(#"Expanded ToRecipients", "CcRecipients", {"Address"}, {"CcRecipients.Address"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded CcRecipients",{"BccRecipients", "Importance", "Categories", "IsRead", "HasAttachments", "Attachments", "Preview", "Attributes"}),
    #"Reordered Columns1" = Table.ReorderColumns(#"Removed Columns",{"Id", "DateTimeSent", "DateTimeReceived", "Folder Path", "Subject", "Sender", "DisplayTo", "DisplayCc", "ToRecipients.Address", "CcRecipients.Address", "Body"}),
    #"Grouped Rows" = Table.Group(#"Reordered Columns1", {"Id", "DateTimeSent", "DateTimeReceived", "Folder Path", "Subject", "DisplayTo", "DisplayCc"}, {{"cc_address", each Text.Combine([CcRecipients.Address],", "), type text}, {"to_address", each Text.Combine([ToRecipients.Address],", "), type text}}),
    #"Grouped Rows1" = Table.Group(#"Grouped Rows", {"Id", "DateTimeSent", "DateTimeReceived", "Folder Path", "DisplayTo", "DisplayCc", "cc_address", "to_address"}, {{"Last_time receive", each List.Max([DateTimeReceived]), type datetime}, {"Last_subject", each List.Max([Subject]), type nullable text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows1",{"Id", "DateTimeSent", "DateTimeReceived", "Folder Path", "DisplayTo", "DisplayCc", "cc_address", "to_address", "Last_time receive", "Last_subject"})
in
    #"Removed Other Columns"
You can split the text by the delimiter, keep only the distinct list values, and then recombine them into a string:
let
    Source = "Giang.Phan@abc.com, thao.tran@abc.com, Khoa.Vu@abc.com, Vn.Offset@abc.com, Giang.Phan@abc.com, thao.tran@abc.com, Khoa.Vu@abc.com, Vn.Offset@abc.com",
    #"Distinct Values" = Text.Combine(List.Distinct(Text.Split(Source, ", ")),", ")
in
    #"Distinct Values"
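If the combined text already sits in a table column, the same split / distinct / combine idea can be applied to each row with Table.TransformColumns. The following is only a minimal sketch: the sample table and the helper name DedupeText are made up for illustration and are not part of the original query. It splits on a bare comma and trims each piece, so inconsistent spacing around the commas does not produce false "distinct" values.

```powerquery
let
    // Hypothetical sample table standing in for a result that already
    // contains combined, duplicated addresses
    Source = #table(
        {"Id", "cc_address"},
        {
            {1, "Giang.Phan@abc.com, thao.tran@abc.com, Giang.Phan@abc.com"},
            {2, "Khoa.Vu@abc.com,Khoa.Vu@abc.com"}
        }
    ),
    // Hypothetical helper: split on commas, trim whitespace,
    // drop duplicates, then rejoin with ", "
    DedupeText = (value as nullable text) as nullable text =>
        if value = null then
            null
        else
            Text.Combine(
                List.Distinct(List.Transform(Text.Split(value, ","), Text.Trim)),
                ", "
            ),
    // Apply the helper to every row of the cc_address column
    Deduped = Table.TransformColumns(
        Source,
        {{"cc_address", DedupeText, type nullable text}}
    )
in
    Deduped
```

With this sample data, row 1 should become "Giang.Phan@abc.com, thao.tran@abc.com" and row 2 "Khoa.Vu@abc.com".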
Edit after the question was updated:
In your case, you can simply change this line:
#"Grouped Rows" = Table.Group(#"Reordered Columns1", {"Id", "DateTimeSent", "DateTimeReceived", "Folder Path", "Subject", "DisplayTo", "DisplayCc"}, {{"cc_address", each Text.Combine([CcRecipients.Address],", "), type text}, {"to_address", each Text.Combine([ToRecipients.Address],", "), type text}}),
to include the List.Distinct function, which returns only the distinct address values:
#"Grouped Rows" = Table.Group(#"Reordered Columns1", {"Id", "DateTimeSent", "DateTimeReceived", "Folder Path", "Subject", "DisplayTo", "DisplayCc"}, {{"cc_address", each Text.Combine(List.Distinct([CcRecipients.Address]),", "), type text}, {"to_address", each Text.Combine(List.Distinct([ToRecipients.Address]),", "), type text}}),
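The duplicates most likely come from expanding both ToRecipients and CcRecipients before grouping: a message with 2 To addresses and 4 Cc addresses turns into 2 x 4 = 8 rows, so every Cc address is repeated once per To address. The sketch below reproduces that on a small made-up table (the rows, the single grouping key, and the helper name CombineDistinct are assumptions for illustration only). It also skips nulls and compares addresses case-insensitively via Comparer.OrdinalIgnoreCase, which is optional and goes slightly beyond the one-line change above.

```powerquery
let
    // Hypothetical rows mimicking the table after both recipient columns
    // were expanded: 2 To addresses x 2 Cc addresses = 4 rows for one message
    Expanded = #table(
        {"Id", "ToRecipients.Address", "CcRecipients.Address"},
        {
            {"m1", "a@abc.com", "x@abc.com"},
            {"m1", "a@abc.com", "y@abc.com"},
            {"m1", "b@abc.com", "x@abc.com"},
            {"m1", "b@abc.com", "y@abc.com"}
        }
    ),
    // Hypothetical helper: remove nulls, keep distinct addresses
    // (ignoring case), then join them with ", "
    CombineDistinct = (addresses as list) as text =>
        Text.Combine(
            List.Distinct(List.RemoveNulls(addresses), Comparer.OrdinalIgnoreCase),
            ", "
        ),
    // Group back to one row per message, deduplicating each address list
    Grouped = Table.Group(
        Expanded,
        {"Id"},
        {
            {"cc_address", each CombineDistinct([CcRecipients.Address]), type text},
            {"to_address", each CombineDistinct([ToRecipients.Address]), type text}
        }
    )
in
    Grouped
```

For this sample the result should be a single row with cc_address "x@abc.com, y@abc.com" and to_address "a@abc.com, b@abc.com".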