Regex extract, remove duplicates, and join with a pipe in Google Sheets

I have rows of data like the following:

Cat A>Subcat A|Cat A>Subcat B
Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C

You'll notice that each row is essentially a pipe-delimited (|) list of parent categories and subcategories.

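I need to extract data from each row in two ways:

  1. Get all parent categories, separated by a pipe | (with duplicates removed).
  2. Get all subcategory names, separated by a pipe | (with duplicates removed).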

Based on the two rows above, the results should look like this:

String                                              Parents (Result 1)   Children (Result 2)
Cat A>Subcat A|Cat A>Subcat B                       Cat A                Subcat A|Subcat B
Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C  Cat A|Cat B|Cat C    Subcat C|Subcat A

I've been able to get partway there using REGEXEXTRACT and JOIN, but it either matches only once or returns multiple matches. For example:

# Returns the first instance of "Cat A" only
=REGEXEXTRACT(H2,"(.*?)>.*?\|")

I'd appreciate help creating two regex patterns that produce the desired Result 1 and Result 2.

Try this for the parent categories:

=ARRAYFORMULA(REGEXREPLACE(TRIM(SUBSTITUTE(TRANSPOSE(QUERY(QUERY(QUERY(
 IFNA(SPLIT(UNIQUE(FLATTEN(ROW(A1:A2)&"×"&
 REGEXREPLACE(REGEXEXTRACT(SPLIT(A1:A2&">", "|", 1), "(.*)>"), "(>.*)", )))&"|", "×")), 
 "select max(Col2) 
  where Col1 is not null 
  group by Col2 
  pivot Col1"), 
 "offset 1", 0),,9^9)), "| ", "|")), "\|$", ))

and this for the subcategories:

=ARRAYFORMULA(REGEXREPLACE(TRIM(SUBSTITUTE(TRANSPOSE(TRIM(QUERY(QUERY(QUERY(
 IFNA(SPLIT(UNIQUE(FLATTEN(ROW(A1:A2)&"×"&
 REGEXREPLACE(REGEXEXTRACT(SPLIT(A1:A2, "|", 1), ">(.*)"), "(>.*)", )))&"|", "×")), 
 "select max(Col2) 
  where Col1 is not null 
  group by Col2 
  pivot Col1"), 
 "offset 1", 0),,9^9))), "| ", "|")), "\|$", ))
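
If a single array formula isn't a requirement, a simpler per-row version may be easier to read and adjust. The following is only a sketch, not a drop-in replacement for the formulas above: it assumes the source string is in A1, that each formula is placed in its own column and filled down, and that a parent name never contains ">". The patterns `^[^>]+` and `>(.*)` are illustrative choices, not taken from the question.

```
Parents (Result 1): split on "|", keep the text before the first ">", dedupe, rejoin
=ARRAYFORMULA(JOIN("|", UNIQUE(TRANSPOSE(REGEXEXTRACT(SPLIT(A1, "|"), "^[^>]+")))))

Children (Result 2): keep the text after ">"; entries without ">" become blank and are skipped
=ARRAYFORMULA(TEXTJOIN("|", TRUE, UNIQUE(TRANSPOSE(IFERROR(REGEXEXTRACT(SPLIT(A1, "|"), ">(.*)"))))))
```

The TRANSPOSE is there because UNIQUE removes duplicate rows, while SPLIT returns its pieces across a single row. TEXTJOIN with its second argument set to TRUE is used for the children so that an entry with no subcategory, such as Cat C, doesn't leave a trailing pipe.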