Removing duplicates - Need to match on business name and multiple addresses

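I need to identify and remove duplicates based on 'businessName' and a complete matching address. Given the XML below, I want the clients with id 1 and 3 to match, because their businessName matches and at least one of their addresses matches (comparing address1, city, state, and postalCode, but not address2). Note that for address matching, 'postalCode' only needs to match on the first 5 digits, not the +4 part of the ZIP code.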

XSLT 2.0 is fine (Saxon Enterprise Edition).

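I assume I would use for-each-group, but I am stuck on how to handle the address matching, since each client can have multiple addresses. I have been playing with following-sibling but have gotten nowhere. Any solution or pointers are appreciated. Thanks.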

<xsl:for-each-group select="Clients/client" group-by="businessName">
</xsl:for-each-group>


<Clients>
    <client>
        <id>1</id>
        <businessName>ABC Tile</businessName>
        <addresses>
            <address>
                <address1>PO Box 1057</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
            <address>
                <address1>PO Box 621188</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
        </addresses>
    </client>
    <client>
        <id>2</id>
        <businessName>123 Tile</businessName>
        <addresses>
            <address>
                <address1>567 Main Street</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
        </addresses>
    </client>
    <client>
        <id>3</id>
        <businessName>ABC Tile</businessName>
        <addresses>
            <address>
                <address1>123 Main Street</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
            <address>
                <address1>PO Box 1057</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801235555</postalCode>
            </address>
        </addresses>
    </client>
</Clients>

Here is the expected result, with client ID 1 listing the IDs of all matching clients.

<Clients>
    <client>
        <id>1</id>
        <clientMatch>3</clientMatch>
        <businessName>ABC Tile</businessName>
        <addresses>
            <address>
                <address1>PO Box 1057</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
            <address>
                <address1>PO Box 621188</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
        </addresses>
    </client>
    <client>
        <id>2</id>
        <businessName>123 Tile</businessName>
        <addresses>
            <address>
                <address1>567 Main Street</address1>
                <address2/>
                <city>Denver</city>
                <state>CO</state>
                <postalCode>801230000</postalCode>
            </address>
        </addresses>
    </client>
</Clients>

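I think you can use for-each-group on businessName, but going further with that construct is difficult because you want to compare whether at least one address matches. So I came up with http://xsltransform.net/gWvjQeP/1, which does: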

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf">

<xsl:output indent="yes"/>

<xsl:function name="mf:key" as="xs:string">
    <xsl:param name="address" as="element(address)"/>
    <xsl:sequence select="concat($address/address1, '|', $address/city, '|', $address/state, '|', substring($address/postalCode, 1, 5))"/>
</xsl:function>

<xsl:template match="Clients">
    <xsl:copy>
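        <xsl:for-each-group select="client" group-by="businessName">
            <xsl:for-each select="current-group()">
                <xsl:variable name="pos" as="xs:integer" select="position()"/>
                <!-- Address keys of the client currently being processed -->
                <xsl:variable name="keys" as="xs:string*" select="addresses/address/mf:key(.)"/>
                <!-- Skip this client if an earlier client in the group shares an address with it -->
                <xsl:if test="not(current-group()[position() lt $pos][addresses/address/mf:key(.) = $keys])">
                    <!-- Later clients in the group that share at least one address -->
                    <xsl:variable name="matches" as="element(client)*" select="current-group()[position() gt $pos][addresses/address/mf:key(.) = $keys]"/>
                    <xsl:copy>
                        <xsl:copy-of select="id"/>
                        <!-- Emit clientMatch only when something matched, as in the expected result -->
                        <xsl:if test="$matches">
                            <clientMatch>
                                <xsl:value-of select="$matches/id" separator=", "/>
                            </clientMatch>
                        </xsl:if>
                        <xsl:copy-of select="* except id"/>
                    </xsl:copy>
                </xsl:if>
            </xsl:for-each>
        </xsl:for-each-group>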
    </xsl:copy>
</xsl:template>
</xsl:transform>

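I am not sure whether you want to output the address elements of all matching clients or only those of the first one. Your question shows only those of the first, so that is what the sample currently does.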
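If the addresses of all matching clients were wanted instead, one possible variant is sketched below. It reuses the same mf:key function and merges the addresses of the kept client and its matches, keeping the first address seen for each key. That merge rule is an assumption on my part, not something the question specifies, and this sketch also leaves out clientMatch when nothing matched.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf">

<xsl:output indent="yes"/>

<!-- Same key as in the stylesheet above: address1, city, state, first five postalCode digits -->
<xsl:function name="mf:key" as="xs:string">
    <xsl:param name="address" as="element(address)"/>
    <xsl:sequence select="concat($address/address1, '|', $address/city, '|', $address/state, '|', substring($address/postalCode, 1, 5))"/>
</xsl:function>

<xsl:template match="Clients">
    <xsl:copy>
        <xsl:for-each-group select="client" group-by="businessName">
            <xsl:for-each select="current-group()">
                <xsl:variable name="pos" as="xs:integer" select="position()"/>
                <xsl:variable name="keys" as="xs:string*" select="addresses/address/mf:key(.)"/>
                <!-- Keep only clients that no earlier client in the group already matches -->
                <xsl:if test="not(current-group()[position() lt $pos][addresses/address/mf:key(.) = $keys])">
                    <xsl:variable name="matches" as="element(client)*"
                        select="current-group()[position() gt $pos][addresses/address/mf:key(.) = $keys]"/>
                    <xsl:copy>
                        <xsl:copy-of select="id"/>
                        <xsl:if test="$matches">
                            <clientMatch>
                                <xsl:value-of select="$matches/id" separator=", "/>
                            </clientMatch>
                        </xsl:if>
                        <xsl:copy-of select="businessName"/>
                        <addresses>
                            <!-- Assumed merge rule: one address per key, the first in document order -->
                            <xsl:for-each-group select="(., $matches)/addresses/address" group-by="mf:key(.)">
                                <xsl:copy-of select="."/>
                            </xsl:for-each-group>
                        </addresses>
                    </xsl:copy>
                </xsl:if>
            </xsl:for-each>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>
</xsl:transform>
```

With the sample input, client 1 should then carry three addresses: its own two plus "123 Main Street" from client 3. The second "PO Box 1057" entry is dropped because its key is identical. Note that the matching in both versions is not transitive: a client that shares an address only with an already-skipped duplicate is itself skipped without being listed in any clientMatch, so chained duplicates may need extra handling.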