Replace by regular expression in Excel
I have a list in Excel like this:
1 / 6 / 45
123
1546
123 456
1247 /% 456 /
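I want to create a new column in which every run of consecutive non-digit characters is replaced by a single character. In Google Sheets this is easy with =REGEXREPLACE(A1&"/","\D+",","), and the result is: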
1,6,45,
123,
1546,
123,456,
1247,456,
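In that formula, A1&"/" is needed so that REGEXREPLACE can handle cells that contain a number (the concatenation turns the value into text). No big deal; it just adds a comma at the end.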
How can we do this in Excel? Pure Power Query is strongly encouraged (not R, not Python, just M). VBA and other clickable Excel features (such as Find and Replace) are not acceptable.
If you have Excel 365:
In B1:
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),SUBSTITUTE(TRIM(CONCAT(IF(ISNUMBER(--X),X," ")))," ",","))
Or, if the runs of digits are always separated by at least one space:
=TEXTJOIN(",",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.*0=0]"))
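Another option, if you have access to it, is LAMBDA(): create a function that replaces every kind of unwanted character. Without LAMBDA() and TEXTJOIN(), I think your best option is to start nesting SUBSTITUTE() functions.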
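If you have the TEXTJOIN function available, here is another variation. In versions without dynamic arrays it may need to be confirmed with Ctrl+Shift+Enter.
=SUBSTITUTE(TRIM(TEXTJOIN("",TRUE,IFERROR(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)+0," ")))," ",",")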
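Here is a Power Query solution.
It uses the List.Accumulate function to decide whether to append a digit or a comma to the string.
Note that the code reproduces the results you showed. If you would rather avoid the trailing (and/or leading) comma, it can easily be modified.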
let
    Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    // Split the text into single characters and append "," as a terminator
    #"Added Custom" = Table.AddColumn(#"Changed Type", "textToList", each List.Combine({Text.ToList([Column1]),{","}})),
    // Keep digits; collapse each run of non-digits into a single comma
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "commaTerminators", each List.Accumulate(
        [textToList],"", (state,current) =>
            if List.Contains({"0".."9"},current)
                then state & current
            else if Text.EndsWith(state,",")
                then state
            else state & ",")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"textToList"})
in
    #"Removed Columns"
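To see how the fold above builds the string, here is a minimal sketch that records every intermediate state for one sample value. The step names (Chars, Steps, prev, next) are made up for illustration, and the input is typed in directly rather than read from Table5.

```powerquery
let
    // Characters of one sample row: "1", " ", "/", " ", "6"
    Chars = Text.ToList("1 / 6"),
    // Same rule as the query above, but every state is kept in a list
    Steps = List.Accumulate(Chars, {""}, (states, current) =>
        let
            prev = List.Last(states),
            next =
                if List.Contains({"0".."9"}, current) then prev & current
                else if Text.EndsWith(prev, ",") then prev
                else prev & ","
        in
            states & {next})
in
    Steps
    // {"", "1", "1,", "1,", "1,", "1,6"}
```

The three non-digit characters in a row produce only one comma, because the second and third find that the state already ends with one.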
Edit: To eliminate the leading/trailing commas, we add the Text.Trim function, which in Power Query lets you specify the text to trim from the start/end:
let
    Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    // Split the text into single characters and append "," as a terminator
    #"Added Custom" = Table.AddColumn(#"Changed Type", "textToList", each List.Combine({Text.ToList([Column1]),{","}})),
    // Build the comma-separated string, then strip commas from both ends
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "commaTerminators", each
        Text.Trim(
            List.Accumulate(
                [textToList],"", (state,current) =>
                    if List.Contains({"0".."9"},current)
                        then state & current
                    else if Text.EndsWith(state,",")
                        then state
                    else state & ","),
            ",")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"textToList"})
in
    #"Removed Columns"
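In case the second argument of Text.Trim is unfamiliar, here is a minimal sketch on a few made-up strings. The sample values are assumptions, not rows from the workbook.

```powerquery
let
    // Hypothetical inputs with leading and/or trailing commas
    Samples = {",1,6,45,", "123,", ",,1247,456,"},
    // Passing "," as the second argument trims commas instead of whitespace
    Trimmed = List.Transform(Samples, each Text.Trim(_, ","))
in
    Trimmed
    // {"1,6,45", "123", "1247,456"}
```

Only commas at the two ends are removed; the separators between the numbers stay in place.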
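VBA UDF: You mentioned that you don't want VBA, but it isn't clear whether you meant to rule out only the "clickable" features. Here is a user-defined function that you can use directly on the worksheet. It uses the VBA regular expression engine, which makes it easy to extract multiple matches.
You can enter a formula on the worksheet such as =commaSep(cell_ref) to get the same results as shown in my second PQ example above.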
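Option Explicit
Function commaSep(S As String) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As String
    ' Late-bound VBScript regular expression object
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True      ' find every match, not just the first
        .Pattern = "\d+"    ' one or more consecutive digits
        If .Test(S) Then
            Set MC = .Execute(S)
            sTemp = ""
            ' Prefix each digit run with a comma, then drop the first comma
            For Each M In MC
                sTemp = sTemp & "," & M
            Next M
            commaSep = Mid(sTemp, 2)
        Else
            commaSep = "no digits"
        End If
    End With
End Function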
And here is another option in Power Query.
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTQVzADYhNTpVgdINfIGEKbmpjBBIByZgpQjom5gr4qWEBfKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    // x1: split the text into single characters
    x1 = Table.AddColumn(#"Changed Type", "x1", each Text.ToList([Column1])),
    // x2: replace every non-digit character with a space
    x2 = Table.AddColumn(x1, "x2", each List.Transform([x1], each if Text.Contains("0123456789", _) then _ else " " )),
    // x3: rejoin and split on spaces, leaving digit runs and empty strings
    x3 = Table.AddColumn(x2, "x3", each Text.Split(Text.Combine([x2])," ")),
    // x4: append a comma to each digit run and blank out the empty pieces
    x4 = Table.AddColumn(x3, "x4", each List.Transform([x3], each if Text.Contains("0123456789", try Text.At(_,0) otherwise " ") then _&"," else "" )),
    // x5: concatenate the pieces into the final string
    x5 = Table.AddColumn(x4, "x5", each Text.Combine([x4])),
    #"Removed Columns" = Table.RemoveColumns(x5,{"x1", "x2", "x3", "x4"})
in
    #"Removed Columns"
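If you would rather not carry the helper columns around, the same idea could be wrapped in a single function. This is only a sketch: the name DigitRuns and the inline sample list are hypothetical, and unlike the query above it drops the trailing comma instead of reproducing it.

```powerquery
let
    // Hypothetical helper: returns the digit runs of a text, joined by commas
    DigitRuns = (txt as text) as text =>
        let
            Chars = Text.ToList(txt),
            // Turn every non-digit character into a space
            Spaced = List.Transform(Chars, each if List.Contains({"0".."9"}, _) then _ else " "),
            // Split on spaces; consecutive spaces yield empty strings
            Parts = Text.Split(Text.Combine(Spaced), " "),
            NonEmpty = List.Select(Parts, each _ <> "")
        in
            Text.Combine(NonEmpty, ","),
    // Sample rows from the question, typed in directly
    Samples = {"1 / 6 / 45", "123", "1546", "123 456", "1247 /% 456 /"},
    Result = List.Transform(Samples, DigitRuns)
in
    Result
    // {"1,6,45", "123", "1546", "123,456", "1247,456"}
```

On a real table it could be applied with something like Table.AddColumn(Source, "Result", each DigitRuns([Column1])), assuming Column1 has already been typed as text.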