Removing duplicates - Need to match on business name and multiple addresses
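I need to identify and remove duplicates based on 'businessName' and a complete matching address. Given the XML below, I would expect the clients with id 1 and 3 to match, because the businessName matches and at least one of their addresses matches (on address1, city, state, postalCode ... excluding address2). Note that for the address match, 'postalCode' only needs to match on the first 5 digits ... not the +4 zip code.
XSLT 2.0 is fine (Saxon Enterprise Edition).
I assume I would use for-each-group, but I'm stuck on how to handle the address matching, since each client can have multiple addresses. I've been playing around with following-sibling but have gotten nowhere. Any solutions or pointers are appreciated. Thanks.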
<xsl:for-each-group select="Clients/client" group-by="businessName">
</xsl:for-each-group>
<Clients>
  <client>
    <id>1</id>
    <businessName>ABC Tile</businessName>
    <addresses>
      <address>
        <address1>PO Box 1057</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
      <address>
        <address1>PO Box 621188</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
    </addresses>
  </client>
  <client>
    <id>2</id>
    <businessName>123 Tile</businessName>
    <addresses>
      <address>
        <address1>567 Main Street</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
    </addresses>
  </client>
  <client>
    <id>3</id>
    <businessName>ABC Tile</businessName>
    <addresses>
      <address>
        <address1>123 Main Street</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
      <address>
        <address1>PO Box 1057</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801235555</postalCode>
      </address>
    </addresses>
  </client>
</Clients>
Here is the expected result, with client id 1 listing all matching client ids.
<Clients>
  <client>
    <id>1</id>
    <clientMatch>3</clientMatch>
    <businessName>ABC Tile</businessName>
    <addresses>
      <address>
        <address1>PO Box 1057</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
      <address>
        <address1>PO Box 621188</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
    </addresses>
  </client>
  <client>
    <id>2</id>
    <businessName>123 Tile</businessName>
    <addresses>
      <address>
        <address1>567 Main Street</address1>
        <address2/>
        <city>Denver</city>
        <state>CO</state>
        <postalCode>801230000</postalCode>
      </address>
    </addresses>
  </client>
</Clients>
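I think you can use for-each-group on businessName, but using that construct any further is difficult because you want to compare whether at least one address matches. So I came up with http://xsltransform.net/gWvjQeP/1, which does the following: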
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf">

  <xsl:output indent="yes"/>

  <!-- Comparison key for an address: address1, city, state and the first
       5 digits of postalCode. address2 is deliberately left out. -->
  <xsl:function name="mf:key" as="xs:string">
    <xsl:param name="address" as="element(address)"/>
    <xsl:sequence select="concat($address/address1, '|', $address/city, '|', $address/state, '|', substring($address/postalCode, 1, 5))"/>
  </xsl:function>

  <xsl:template match="Clients">
    <xsl:copy>
      <!-- Only clients with the same businessName can be duplicates. -->
      <xsl:for-each-group select="client" group-by="businessName">
        <xsl:for-each select="current-group()">
          <xsl:variable name="pos" as="xs:integer" select="position()"/>
          <!-- Skip this client if an earlier client in the group shares at
               least one address key with it. -->
          <xsl:if test="not(current-group()[position() lt $pos][addresses/address/mf:key(.) = current()/addresses/address/mf:key(.)])">
            <xsl:copy>
              <xsl:copy-of select="id"/>
              <!-- Ids of later clients in the group that share at least one
                   address key; the element is empty when there are none. -->
              <clientMatch>
                <xsl:value-of select="current-group()[position() gt $pos][addresses/address/mf:key(.) = current()/addresses/address/mf:key(.)]/id" separator=", "/>
              </clientMatch>
              <xsl:copy-of select="* except id"/>
            </xsl:copy>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:transform>
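I am not sure whether you want to output all address elements of all matching elements or only those of the first element; your question only shows those of the first element, so for now the sample does that.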
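If you do want the addresses of all matching clients on the surviving client, one possible variant is sketched below. It is only a sketch under assumptions that are not in the question: the variable names $keys and $matches are made up for illustration, addresses are considered the same when their mf:key values are equal (the first occurrence in document order is kept), and clientMatch is emitted only when there is at least one match.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf">

  <xsl:output indent="yes"/>

  <!-- Same comparison key as in the stylesheet above. -->
  <xsl:function name="mf:key" as="xs:string">
    <xsl:param name="address" as="element(address)"/>
    <xsl:sequence select="concat($address/address1, '|', $address/city, '|', $address/state, '|', substring($address/postalCode, 1, 5))"/>
  </xsl:function>

  <xsl:template match="Clients">
    <xsl:copy>
      <xsl:for-each-group select="client" group-by="businessName">
        <xsl:for-each select="current-group()">
          <xsl:variable name="pos" as="xs:integer" select="position()"/>
          <!-- Hypothetical helper variable: all address keys of this client. -->
          <xsl:variable name="keys" as="xs:string*" select="addresses/address/mf:key(.)"/>
          <!-- Skip clients already matched by an earlier client in the group. -->
          <xsl:if test="not(current-group()[position() lt $pos][addresses/address/mf:key(.) = $keys])">
            <!-- Hypothetical helper variable: later clients sharing a key. -->
            <xsl:variable name="matches" as="element(client)*"
                select="current-group()[position() gt $pos][addresses/address/mf:key(.) = $keys]"/>
            <xsl:copy>
              <xsl:copy-of select="id"/>
              <!-- Assumption: leave clientMatch out when nothing matched. -->
              <xsl:if test="exists($matches)">
                <clientMatch>
                  <xsl:value-of select="$matches/id" separator=", "/>
                </clientMatch>
              </xsl:if>
              <xsl:copy-of select="* except (id, addresses)"/>
              <addresses>
                <!-- Merge the addresses of this client and its matches,
                     keeping one address per key. -->
                <xsl:for-each-group select="(., $matches)/addresses/address" group-by="mf:key(.)">
                  <xsl:copy-of select="."/>
                </xsl:for-each-group>
              </addresses>
            </xsl:copy>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:transform>
```

With the sample input, client 1 should then carry the two PO Box addresses plus 123 Main Street from client 3, and client 2 should come out without a clientMatch element. Like the stylesheet above, this only compares each client directly against the others in its group, so chains of matches (A matches B, B matches C, but A does not match C) are not merged into one result.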