Regex extract, remove duplicates, and join with a pipe in Google Sheets
I have rows of data like the following:
Cat A>Subcat A|Cat A>Subcat B
Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C
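As you can see, each row is essentially a list of parent categories and subcategories separated by a pipe (`|`).
I need to extract data from each row in the following two ways:
- Get all parent categories and separate them with a pipe `|` (removing duplicates).
- Get all subcategory names and separate them with a pipe `|` (removing duplicates).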
Based on the two rows provided, the results should look like this:
String | Parents (Result 1) | Children (Result 2) |
---|---|---|
Cat A>Subcat A\|Cat A>Subcat B | Cat A | Subcat A\|Subcat B |
Cat A>Subcat C\|Cat B>Subcat A\|Cat B>Subcat C\|Cat C | Cat A\|Cat B\|Cat C | Subcat C\|Subcat A |
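To pin down the target behaviour independently of Sheets, here is a minimal Python sketch of the transformation. It is not a Sheets formula. The function names `parents` and `children` are hypothetical, and the sketch assumes duplicates are dropped in first-occurrence order, which is what the expected results above suggest.

```python
def _unique(items):
    """Drop duplicates while keeping first-occurrence order."""
    return list(dict.fromkeys(items))


def parents(row: str) -> str:
    """Join the distinct parent categories of a row with '|'."""
    # The parent is the text before the first '>' (or the whole entry if there is no '>').
    return "|".join(_unique(entry.split(">", 1)[0] for entry in row.split("|")))


def children(row: str) -> str:
    """Join the distinct subcategories of a row with '|'."""
    # Entries without '>' (such as "Cat C") have no subcategory and are skipped.
    subs = [entry.split(">", 1)[1] for entry in row.split("|") if ">" in entry]
    return "|".join(_unique(subs))


rows = [
    "Cat A>Subcat A|Cat A>Subcat B",
    "Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C",
]
for row in rows:
    print(parents(row), "/", children(row))
```

Splitting on the pipe first and extracting from each entry afterwards avoids having to write one pattern that matches every entry in the row at once.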
I have been able to get partial results using `REGEXEXTRACT` and `JOIN`, but the formula either matches only once or returns multiple results. Example:
# Returns the first instance of "Cat A" only
=REGEXEXTRACT(H2,"(.*?)>.*?\|")
I would appreciate help creating two regex patterns that produce the desired "Result 1" and "Result 2".
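To illustrate why a single pattern falls short, the sketch below runs the pattern from the formula above through Python's `re` module. This is only a stand-in: Sheets uses its own regex engine, so treat the output as an approximation of what the pattern can and cannot capture.

```python
import re

PATTERN = r"(.*?)>.*?\|"  # the pattern from the formula above
row1 = "Cat A>Subcat A|Cat A>Subcat B"
row2 = "Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C"

# A single extraction stops at the first match.
first = re.search(PATTERN, row1).group(1)

# Collecting every match still skips entries that lack a trailing '|' or a '>',
# and it keeps duplicates.
all_row1 = re.findall(PATTERN, row1)
all_row2 = re.findall(PATTERN, row2)

print(first)
print(all_row1)
print(all_row2)
```

Even when every match is collected, the last entry of each row is missed because no pipe follows it, and "Cat C" is missed because it contains no `>`. The results also still need deduplicating.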
Try this for the parent categories (Result 1):
=ARRAYFORMULA(REGEXREPLACE(TRIM(SUBSTITUTE(TRANSPOSE(QUERY(QUERY(QUERY(
IFNA(SPLIT(UNIQUE(FLATTEN(ROW(A1:A2)&"×"&
REGEXREPLACE(REGEXEXTRACT(SPLIT(A1:A2&">", "|", 1), "(.*)>"), "(>.*)", )))&"|", "×")),
"select max(Col2)
where Col1 is not null
group by Col2
pivot Col1"),
"offset 1", 0),,9^9)), "| ", "|")), "\|$", ))
And this for the subcategories (Result 2):
=ARRAYFORMULA(REGEXREPLACE(TRIM(SUBSTITUTE(TRANSPOSE(TRIM(QUERY(QUERY(QUERY(
IFNA(SPLIT(UNIQUE(FLATTEN(ROW(A1:A2)&"×"&
REGEXREPLACE(REGEXEXTRACT(SPLIT(A1:A2, "|", 1), ">(.*)"), "(>.*)", )))&"|", "×")),
"select max(Col2)
where Col1 is not null
group by Col2
pivot Col1"),
"offset 1", 0),,9^9))), "| ", "|")), "\|$", ))