Split and route large XML with namespace using WSO2 ESB and smooks

I need to process an XML file of about 500MB that declares a namespace on its root element and contains more than 133K child elements. To do this I am using WSO2 ESB 5 and the Smooks mediator.

Basically, I want to split the input file into small chunks with a predefined structure and send each chunk to a queue for later processing.
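For reference, the split described above can be sketched outside Smooks with the JDK's StAX pull parser. This is a swapped-in technique, not the Smooks pipeline used in this question, and its details are assumptions: the class and method names (ProductSplitter, split) are hypothetical, the chunk format simply mirrors the test FreeMarker template shown further down, and the Consumer stands in for the JMS send. What it illustrates is that a namespace-aware streaming parser matches elements by namespace URI plus local name and keeps only the current product's fields in memory.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical streaming splitter: one small XML chunk per top-level <product>. */
class ProductSplitter {
    static final String CATALOG_NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31";

    static void split(Reader in, Consumer<String> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(in);
        int depth = 0;           // 1 = <catalog>, 2 = its children, 3 = grandchildren
        String productId = null; // non-null only while inside a <product>
        String ean = "";
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    // Match on namespace URI + local name, never on the raw tag text.
                    boolean catalogNs = CATALOG_NS.equals(r.getNamespaceURI());
                    if (catalogNs && depth == 2 && "product".equals(r.getLocalName())) {
                        String id = r.getAttributeValue(null, "product-id");
                        productId = (id == null) ? "" : id;
                        ean = "";
                    } else if (productId != null && catalogNs && depth == 3
                            && "ean".equals(r.getLocalName())) {
                        ean = r.getElementText(); // leaves the reader on </ean> ...
                        depth--;                  // ... so account for that end tag here
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (productId != null && depth == 2 && "product".equals(r.getLocalName())) {
                        sink.accept("<product id=\"" + escape(productId) + "\"><ean>"
                                + escape(ean) + "</ean></product>");
                        productId = null; // drop this product's state before the next one
                    }
                    depth--;
                }
            }
        } finally {
            r.close();
        }
    }

    /** Minimal escaping for text and double-quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Hypothetical usage: ProductSplitter.split(new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8), chunk -> sendToQueue(chunk)), where sendToQueue is whatever JMS producer is already available.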

My first attempt was an XSLT transformation to remove the namespace from the input file, but I got an OutOfMemoryError like this:

TID: [-1234] [] [2017-03-02 03:04:43,900] ERROR {org.apache.axis2.transport.base.threads.NativeWorkerPool} -  Uncaught exception {org.apache.axis2.transport.base.threads.NativeWorkerPool}
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.axiom.om.impl.llom.factory.OMLinkedListImplFactory.createOMText(OMLinkedListImplFactory.java:192)
    at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:294)
    at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:250)
    at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:252)
    at org.apache.axiom.om.impl.llom.OMSerializableImpl.build(OMSerializableImpl.java:78)
    at org.apache.axiom.om.impl.llom.OMElementImpl.build(OMElementImpl.java:722)
    at org.apache.axiom.om.impl.llom.OMElementImpl.detach(OMElementImpl.java:700)
    at org.apache.axiom.om.impl.llom.OMNodeImpl.setParent(OMNodeImpl.java:105)
    at org.apache.axiom.om.impl.llom.OMNodeImpl.insertSiblingAfter(OMNodeImpl.java:203)
    at org.apache.synapse.mediators.transform.XSLTMediator.performXSLT(XSLTMediator.java:366)
    at org.apache.synapse.mediators.transform.XSLTMediator.mediate(XSLTMediator.java:202)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:97)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:59)
    at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
    at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:210)
    at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
    at org.apache.axis2.transport.base.AbstractTransportListener.handleIncomingMessage(AbstractTransportListener.java:328)
    at org.apache.synapse.transport.vfs.VFSTransportListener.processFile(VFSTransportListener.java:824)
    at org.apache.synapse.transport.vfs.VFSTransportListener.scanFileOrDirectory(VFSTransportListener.java:472)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:188)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:134)
    at org.apache.axis2.transport.base.AbstractPollingTransportListener.run(AbstractPollingTransportListener.java:67)
    at org.apache.axis2.transport.base.threads.NativeWorkerPool.run(NativeWorkerPool.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I don't understand why this happens, since my JVM is configured with -Xms4096m -Xmx6144m.
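The stack trace above hints at the cause: XSLTMediator.performXSLT calls into OMElementImpl.build, so the whole 500MB payload is being materialised as an Axiom object tree before the stylesheet runs, and such a tree can take several times the size of the raw file. If the namespace really has to be removed, a streaming copy avoids building that tree. Below is a minimal sketch with the JDK's StAX event API; it assumes the document uses only the default namespace (plus the built-in xml: prefix, as in xml:lang), NamespaceStripper and strip are hypothetical names, and whether it could run inside the ESB or only as a pre-processing step before the VFS poll is not covered here.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.Collections;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: copies XML event by event, moving every element into no namespace. */
class NamespaceStripper {
    static void strip(Reader in, Writer out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isStartDocument()) {
                    continue; // omit the XML declaration; without one, parsers assume UTF-8
                } else if (e.isStartElement()) {
                    StartElement s = e.asStartElement();
                    // Same local name and attributes, but empty prefix/URI and no xmlns
                    // declarations (those come from getNamespaces(), which is not copied).
                    writer.add(events.createStartElement("", "", s.getName().getLocalPart(),
                            s.getAttributes(), Collections.<Namespace>emptyIterator()));
                } else if (e.isEndElement()) {
                    writer.add(events.createEndElement("", "",
                            e.asEndElement().getName().getLocalPart()));
                } else {
                    writer.add(e); // text, comments, PIs and end of document pass through
                }
            }
            writer.flush();
        } finally {
            writer.close(); // frees the StAX writer; the underlying Writer stays open
            reader.close();
        }
    }
}
```

Only one event is held at a time, so memory use does not grow with the file size; the output Writer should encode UTF-8, since no XML declaration is written.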

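Because of that error, I decided to implement a streaming solution with Smooks. I defined a VFS proxy service that polls a folder and feeds each file to the Smooks mediator, but I keep getting errors that seem to be related to the namespace declaration on the root element of the input file. I say this because whenever I edit the input file and remove the namespace declaration, everything I have defined and deployed on WSO2 ESB works perfectly. The catch is that I receive these large files from a black-box backend system, so I have to deal with the namespace myself.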

Here are my definitions on the ESB:

Proxy service

<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://ws.apache.org/ns/synapse"
       name="Tryzens_ProductProxy"
       startOnLoad="true"
       statistics="disable"
       trace="disable"
       transports="vfs">
   <target>
      <inSequence>
         <log level="custom">
            <property name="Tryzens_ProductProxy__tracing" value="before smooks"/>
         </log>
         <property name="DISABLE_SMOOKS_RESULT_PAYLOAD" value="true"/>
         <smooks config-key="ProductSplitJMS_Smook">
            <input type="xml"/>
            <output type="xml"/>
         </smooks>
         <log level="custom">
            <property name="Tryzens_ProductProxy__tracing" value="after smooks"/>
         </log>
      </inSequence>
   </target>
   <parameter name="transport.vfs.Streaming">true</parameter>
   <parameter name="transport.PollInterval">15</parameter>
   <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
   <parameter name="transport.vfs.FileURI">vfs:file:///home/jairof/wso2/00_test/working/tryzens/smook_product/</parameter>
   <parameter name="transport.vfs.MoveAfterProcess">vfs:file:///home/jairof/wso2/00_test/working/tryzens/output/</parameter>
   <parameter name="transport.vfs.MoveAfterFailure">vfs:file:///home/jairof/wso2/00_test/working/tryzens/fails/</parameter>
   <parameter name="transport.vfs.FileNamePattern">.*.xml</parameter>
   <parameter name="transport.vfs.ContentType">application/xml</parameter>
   <parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
   <description/>
</proxy>

Smooks configuration

<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd" xmlns:xsl="http://www.milyn.org/xsd/smooks/xsl-1.1.xsd" xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:jms="http://www.milyn.org/xsd/smooks/jms-routing-1.2.xsd">
      <params>
         <param name="stream.filter.type">SAX</param>
         <param name="default.serialization.on">false</param>
      </params>
      <resource-config selector="product">
         <resource>org.milyn.delivery.DomModelCreator</resource>
      </resource-config>
      <jms:router routeOnElement="product" beanId="productItem_xml" destination="dynamicQueues/TestFL">
         <jms:connection factory="QueueConnectionFactory"/>
         <jms:jndi contextFactory="org.apache.activemq.jndi.ActiveMQInitialContextFactory" providerUrl="tcp://localhost:61616"/>
         <jms:highWaterMark mark="-1"/>
      </jms:router>
      <ftl:freemarker applyOnElement="product">
         <ftl:template>/repository/resources/smooks/product.ftl</ftl:template>
         <ftl:use>
            <ftl:bindTo id="productItem_xml"/>
         </ftl:use>
      </ftl:freemarker>
</smooks-resource-list>

Smooks template

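This template is only for testing. The real one maps the full structure of the product element, but this one is enough to reproduce the error: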

<#ftl ns_prefixes={"ns1": "http://www.demandware.com/xml/impex/catalog/2006-10-31"}>
<product id='${.vars["product"]["@product-id"]}'>
    <ean>${product.ean}</ean>        
</product>

Sample input file

Note that the real file contains more than 133K products; in this sample I cut most of the file and kept only two products.

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.demandware.com/xml/impex/catalog/2006-10-31" catalog-id="tml-catalog-en">
    <header>
        <image-settings>
            <internal-location base-path="/images"/>
            <view-types>
                <view-type>original</view-type>
                <view-type>portrait</view-type>
                <view-type>badge_GBP</view-type>
                <view-type>badge_EUR</view-type>
                <view-type>badge_USD</view-type>
                <view-type>badge_AUD</view-type>
                <view-type>badge_CZH</view-type>
                <view-type>ctlimage</view-type>
                <view-type>badge_FRA</view-type>
                <view-type>badge_GER</view-type>
                <view-type>landscape</view-type>
            </view-types>
            <alt-pattern>${productname}, ${variationvalue}, ${viewtype}</alt-pattern>
            <title-pattern>${productname}, ${variationvalue}</title-pattern>
        </image-settings>
    </header>

    <category category-id="MensShoes">
        <display-name xml:lang="de-DE">Schuhe</display-name>
        <display-name xml:lang="x-default">Shoes</display-name>
        <display-name xml:lang="fr-FR">Chaussures</display-name>
        <online-flag>true</online-flag>
        <parent>MENSWEAR</parent>
        <position>12.0</position>
        <image>images/slot/landing/men_menlanding_H1_GBP.jpg</image>
        <template/>
        <page-attributes/>
        <custom-attributes>
            <custom-attribute attribute-id="categoryRecommendationsEnable">false</custom-attribute>
            <custom-attribute attribute-id="enableCompare">false</custom-attribute>
            <custom-attribute attribute-id="enableGridItemButtonStrip">false</custom-attribute>
            <custom-attribute attribute-id="enableGridItemMobileButtonStrip">false</custom-attribute>
            <custom-attribute attribute-id="enableUserJourney">false</custom-attribute>
            <custom-attribute attribute-id="enableWishlist">false</custom-attribute>
            <custom-attribute attribute-id="fitsme_enabled">false</custom-attribute>
            <custom-attribute attribute-id="rrGenere">false</custom-attribute>
            <custom-attribute attribute-id="rsCategoryEnabled">false</custom-attribute>
            <custom-attribute attribute-id="shopAllButton">false</custom-attribute>
            <custom-attribute attribute-id="showInMenu">true</custom-attribute>
            <custom-attribute attribute-id="showInMobileMenu">false</custom-attribute>
            <custom-attribute attribute-id="show_alternate_image_on_plp">false</custom-attribute>
            <custom-attribute attribute-id="slotBannerImage">images/slot/landing/men_menlanding_H1_GBP.jpg</custom-attribute>
        </custom-attributes>
    </category>

    <category category-id="P50 SUIT">
        <display-name xml:lang="de-DE">Hosen</display-name>
        <display-name xml:lang="x-default">Trousers</display-name>
        <display-name xml:lang="fr-FR">Pantalons</display-name>
        <online-flag>true</online-flag>
        <parent>WomensTailoring</parent>
        <position>0.0</position>
        <template/>
        <page-attributes/>
    </category>

    <product product-id="0">
        <ean/>
        <upc/>
        <unit/>
        <min-order-quantity>1</min-order-quantity>
        <step-quantity>1</step-quantity>
        <store-force-price-flag>false</store-force-price-flag>
        <store-non-inventory-flag>false</store-non-inventory-flag>
        <store-non-revenue-flag>false</store-non-revenue-flag>
        <store-non-discountable-flag>false</store-non-discountable-flag>
        <online-flag>false</online-flag>
        <available-flag>true</available-flag>
        <searchable-flag>true</searchable-flag>
        <images>
            <image-group view-type="badge_EUR">
                <image path="badge/blank.png"/>
            </image-group>
            <image-group view-type="badge_GBP">
                <image path="badge/blank.png"/>
            </image-group>
            <image-group view-type="badge_GER">
                <image path="badge/blank.png"/>
            </image-group>
            <image-group view-type="badge_USD">
                <image path="badge/blank.png"/>
            </image-group>
        </images>
        <page-attributes/>
        <pinterest-enabled-flag>false</pinterest-enabled-flag>
        <facebook-enabled-flag>false</facebook-enabled-flag>
        <store-attributes>
            <force-price-flag>false</force-price-flag>
            <non-inventory-flag>false</non-inventory-flag>
            <non-revenue-flag>false</non-revenue-flag>
            <non-discountable-flag>false</non-discountable-flag>
        </store-attributes>
    </product>

    <product product-id="12024">
        <ean/>
        <upc/>
        <unit/>
        <min-order-quantity>1</min-order-quantity>
        <step-quantity>1</step-quantity>
        <store-force-price-flag>false</store-force-price-flag>
        <store-non-inventory-flag>false</store-non-inventory-flag>
        <store-non-revenue-flag>false</store-non-revenue-flag>
        <store-non-discountable-flag>false</store-non-discountable-flag>
        <online-flag>false</online-flag>
        <available-flag>true</available-flag>
        <searchable-flag>true</searchable-flag>
        <images>
            <image-group view-type="original">
                <image path="original/12024_original_original.jpg"/>
            </image-group>
        </images>
        <brand>J FRANCOMB</brand>
        <page-attributes/>
        <custom-attributes>
            <custom-attribute attribute-id="allocGroup">X</custom-attribute>
            <custom-attribute attribute-id="colour">
                <value>3PNK-PINK</value>
            </custom-attribute>
            <custom-attribute attribute-id="cuffType">
                <value>SINGLE CUFF</value>
            </custom-attribute>
            <custom-attribute attribute-id="enable_pdp_asset_footer_layout">false</custom-attribute>
            <custom-attribute attribute-id="fabric">
                <value>LEWIN 100 PD</value>
            </custom-attribute>
            <custom-attribute attribute-id="fit">SEMI FIT</custom-attribute>
            <custom-attribute attribute-id="gender">
                <value>M</value>
            </custom-attribute>
            <custom-attribute attribute-id="look">PTRN447</custom-attribute>
            <custom-attribute attribute-id="pattern">
                <value>PATTERN</value>
            </custom-attribute>
            <custom-attribute attribute-id="productIDCIMS">12024</custom-attribute>
            <custom-attribute attribute-id="retailTypeCIMS">M FORMAL</custom-attribute>
            <custom-attribute attribute-id="seasonCIMS">307B</custom-attribute>
            <custom-attribute attribute-id="styleName">MILSC PATTERN DOOM AND BLOOM</custom-attribute>
            <custom-attribute attribute-id="styleNameCIMS">MILSC PATTERN DOOM AND BLOOM</custom-attribute>
            <custom-attribute attribute-id="styleNumberCIMS">MS17</custom-attribute>
            <custom-attribute attribute-id="typeDesc">MS SHIRTS</custom-attribute>
            <custom-attribute attribute-id="weight">0.3</custom-attribute>
        </custom-attributes>
        <options>
            <shared-option option-id="sleeveLengthAlteration"/>
            <shared-option option-id="giftBox"/>
        </options>
        <variations>
            <attributes>
                <shared-variation-attribute attribute-id="collarSize" variation-attribute-id="collarSize"/>
                <shared-variation-attribute attribute-id="sleeveLength" variation-attribute-id="sleeveLength"/>
            </attributes>
        </variations>
        <classification-category>S17 MILAN</classification-category>
        <pinterest-enabled-flag>false</pinterest-enabled-flag>
        <facebook-enabled-flag>false</facebook-enabled-flag>
        <store-attributes>
            <force-price-flag>false</force-price-flag>
            <non-inventory-flag>false</non-inventory-flag>
            <non-revenue-flag>false</non-revenue-flag>
            <non-discountable-flag>false</non-discountable-flag>
        </store-attributes>
    </product>

    <category-assignment category-id="T43 HERITAGE" product-id="505158991125">
        <primary-flag>true</primary-flag>
    </category-assignment>
    <category-assignment category-id="U30 BOXERS" product-id="505158774834"/>
    <recommendation source-id="58462" source-type="product" target-id="505158886294" type="4"/>
</catalog>

Errors in the wso2carbon.log file

TID: [-1234] [] [2017-03-02 12:15:27,793]  INFO {org.apache.synapse.mediators.builtin.LogMediator} -  Tryzens_ProductProxy__tracing = before smooks {org.apache.synapse.mediators.builtin.LogMediator}
TID: [-1234] [] [2017-03-02 12:15:28,376] ERROR {freemarker.runtime} -   {freemarker.runtime}

Error on line 3, column 12 in repository/resources/smooks/product.ftl
Expecting a string, date or number here, Expression product.ean is instead a freemarker.ext.dom.NodeListModel
The problematic instruction:
----------
==> ${product.ean} [on line 3, column 10 in repository/resources/smooks/product.ftl]
----------

Java backtrace for programmers:
----------
freemarker.core.NonStringException: Error on line 3, column 12 in repository/resources/smooks/product.ftl
Expecting a string, date or number here, Expression product.ean is instead a freemarker.ext.dom.NodeListModel
    at freemarker.core.Expression.getStringValue(Expression.java:126)
    at freemarker.core.Expression.getStringValue(Expression.java:93)
    at freemarker.core.DollarVariable.accept(DollarVariable.java:76)
    at freemarker.core.Environment.visit(Environment.java:209)
    at freemarker.core.MixedContent.accept(MixedContent.java:92)
    at freemarker.core.Environment.visit(Environment.java:209)
    at freemarker.core.Environment.process(Environment.java:189)
    at freemarker.template.Template.process(Template.java:237)
    at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.applyTemplate(FreeMarkerTemplateProcessor.java:358)
    at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.applyTemplate(FreeMarkerTemplateProcessor.java:346)
    at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.visitAfter(FreeMarkerTemplateProcessor.java:333)
    at org.milyn.delivery.sax.SAXHandler.visitAfter(SAXHandler.java:389)
    at org.milyn.delivery.sax.SAXHandler.endElement(SAXHandler.java:204)
    at org.milyn.delivery.SmooksContentHandler.endElement(SmooksContentHandler.java:96)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
    at org.milyn.Smooks._filter(Smooks.java:526)
    at org.milyn.Smooks.filterSource(Smooks.java:482)
    at org.wso2.carbon.mediator.transform.SmooksMediator.mediate(SmooksMediator.java:146)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:97)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:59)
    at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
    at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:210)
    at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
    at org.apache.axis2.transport.base.AbstractTransportListener.handleIncomingMessage(AbstractTransportListener.java:328)
    at org.apache.synapse.transport.vfs.VFSTransportListener.processFile(VFSTransportListener.java:824)
    at org.apache.synapse.transport.vfs.VFSTransportListener.scanFileOrDirectory(VFSTransportListener.java:472)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:188)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:134)
    at org.apache.axis2.transport.base.AbstractPollingTransportListener.run(AbstractPollingTransportListener.java:67)
    at org.apache.axis2.transport.base.threads.NativeWorkerPool.run(NativeWorkerPool.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Please help; I would appreciate any input on how to solve this. Thanks in advance.

In the Smooks template (.ftl file), if you want to use something like ${product.ean}, you have to define the "product" variable:

<#assign product = .vars["product"]>

In your XML input file, all elements belong to the same default namespace, "http://www.demandware.com/xml/impex/catalog/2006-10-31".

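You can bind this default namespace in FTL with the reserved prefix "D": <#ftl ns_prefixes={"D":"http://www.demandware.com/xml/impex/catalog/2006-10-31"}>. With that declaration, unprefixed element names in the template, such as product.ean, refer to elements in that namespace.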
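The same rule holds for any namespace-aware lookup, which may make the FreeMarker behaviour easier to see: once a default namespace is declared, an unprefixed name only matches elements in no namespace, so product.ean finds nothing, which is consistent with the NodeListModel error above. Unprefixed attributes such as product-id, on the other hand, are in no namespace and need no prefix. A minimal JDK-only sketch with javax.xml.xpath follows; DefaultNamespaceDemo and its members are hypothetical, and "D" is used only to mirror FTL, since in XPath any bound prefix would work.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical demo: unprefixed vs. prefixed lookups against a default namespace. */
class DefaultNamespaceDemo {
    static final String CATALOG_NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31";

    /** Binds prefix "D" to the catalog namespace, like ns_prefixes={"D": ...} in FTL. */
    static final NamespaceContext CATALOG_CTX = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "D".equals(prefix) ? CATALOG_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return CATALOG_NS.equals(uri) ? "D" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return CATALOG_NS.equals(uri)
                    ? Collections.singletonList("D").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // off by default; needed for namespace-aware XPath
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Counts the nodes matched by expr; ctx may be null (no prefixes bound). */
    static int count(Document doc, String expr, NamespaceContext ctx) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        if (ctx != null) {
            xp.setNamespaceContext(ctx);
        }
        NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```

On the sample catalog, "/catalog/product/ean" matches nothing, while "/D:catalog/D:product/D:ean" with CATALOG_CTX matches each ean element, mirroring the FTL fix.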