MLCP aggregated XML
I'm trying to load an aggregated XML file into ML8 using MLCP.
This is my data:
<?xml version="1.0" encoding="UTF-8"?>
<export:batch xmlns:export="http://schemas.dikw.nl/exporter/1.0" xmlns="http://schemas.dikw.nl/export/1.0">
<cdm:BerichtInhoud xmlns:cdm="http://schemas.dikw.nl/data/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.dikw.nl/data.xsd">
<cdm:berichtMetaData>
<cdm:Bericht>first message</cdm:Bericht>
</cdm:berichtMetaData>
</cdm:BerichtInhoud>
<cdm:BerichtInhoud xmlns:cdm="http://schemas.dikw.nl/data/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schemas.dikw.nl/data.xsd">
<cdm:berichtMetaData>
<cdm:Bericht>second message</cdm:Bericht>
</cdm:berichtMetaData>
</cdm:BerichtInhoud>
</export:batch>
This is the mlcp command I use:
mlcp.sh import \
-host localhost \
-port 27041 \
-username admin \
-password admin \
-input_file_path ../sampledata/thijstest \
-input_file_type aggregates \
-aggregate_record_element BerichtInhoud \
-aggregate_uri_id berichtId \
-output_uri_prefix /sample/thijstest/ \
-mode local
The result on the command line is this:
15/09/10 10:23:51 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
15/09/10 10:23:51 INFO contentpump.LocalJobRunner: Content type: XML
15/09/10 10:23:51 INFO input.FileInputFormat: Total input paths to process : 1
15/09/10 10:23:51 INFO contentpump.LocalJobRunner: completed 100%
15/09/10 10:23:51 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats:
15/09/10 10:23:51 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 0
15/09/10 10:23:51 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
15/09/10 10:23:51 INFO contentpump.LocalJobRunner: Total execution time: 0 sec
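So I conclude that the element 'BerichtInhoud' is not found. I also tried including the namespace prefix, as in -aggregate_record_element cdm:BerichtInhoud.
It might be related to this 'bug', although that one dates from January: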
Loading data with mlcp - namespace issue
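You have to pass the namespace through a separate option. The value must be the namespace URI bound to the cdm prefix in the data, which in the sample above is http://schemas.dikw.nl/data/1.2:
-aggregate_record_namespace "http://schemas.dikw.nl/data/1.2" \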
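For reference, here is a sketch of how the full command from the question might look with that option added. It assumes the namespace URI is the one declared by xmlns:cdm in the sample data (http://schemas.dikw.nl/data/1.2); the RECORD_NS variable is only an illustrative name, and every other value is copied from the question. The record element is given by its local name, without a prefix. The command is echoed as a dry run so it can be reviewed first.

```shell
#!/bin/sh
# Namespace URI bound to the cdm prefix on <cdm:BerichtInhoud> in the sample data.
# (Assumption: the real input file declares the same URI.)
RECORD_NS="http://schemas.dikw.nl/data/1.2"

# Dry run: print the command for review. Remove the leading 'echo' to run the import.
# -aggregate_record_element takes the local name only, no prefix.
# -aggregate_uri_id is kept from the question; the excerpt shows no berichtId
# element, so this assumes the full records contain one.
echo mlcp.sh import \
  -host localhost \
  -port 27041 \
  -username admin \
  -password admin \
  -input_file_path ../sampledata/thijstest \
  -input_file_type aggregates \
  -aggregate_record_element BerichtInhoud \
  -aggregate_record_namespace "$RECORD_NS" \
  -aggregate_uri_id berichtId \
  -output_uri_prefix /sample/thijstest/ \
  -mode local
```

If the import still reports ATTEMPTED_INPUT_RECORD_COUNT: 0, it is worth comparing the URI passed here character by character with the xmlns:cdm declaration in the input file, since the match is on the URI and not on the prefix.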