I can't extract the node text with an XPath
I have an XML file like this (test.xml):
<?xml version="1.0" encoding="ISO-8859-1"?>
<s2xResponse>
<s2xData>
<Name>This is the name</Name>
<InfocomData>
<DateOfUpdate day="07" month="02" year="2018">20180207</DateOfUpdate>
<CompanyName>MY COMPANY</CompanyName>
<TaxCode FlagCheck="0">XXXYYYWWWZZZ</TaxCode>
</InfocomData>
<AssessmentSummary>
<Rating Code="2">Rating Description for Code 2</Rating>
</AssessmentSummary>
<AssessmentData>
<SectorialDistribution>
<CompaniesNumber>11650</CompaniesNumber>
<ScoreDistribution />
<CervedScoreDistribution>
<DistributionData>
<Rating Code="1">SICUREZZA</Rating>
<Percentage>1.91</Percentage>
</DistributionData>
<DistributionData>
<Rating Code="2">SOLVIBILITA' ELEVATA</Rating>
<Percentage>35.56</Percentage>
</DistributionData>
</CervedScoreDistribution>
</SectorialDistribution>
</AssessmentData>
</s2xData>
</s2xResponse>
I am trying to get the text of the "Name" node ("This is the name") with a U-SQL script, using the XmlExtractor. This is the code I am using:
USE TestXML; // It contains the registered assembly
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@xml = EXTRACT xml_text string
FROM "textxpath/test.xml"
USING Extractors.Text(rowDelimiter: "^", quoting: false);
@xml_cleaned =
SELECT
xml_text.Replace("\r\n", "").Replace("\t", " ") AS xml_text
FROM @xml;
@values =
SELECT Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text, "s2xResponse/s2xData/Name")[1] AS value
FROM @xml_cleaned;
OUTPUT @values TO @"outputs/test_xpath.txt" USING Outputters.Text(quoting: false);
But I get this runtime error:
Execution failed with error '1_SV1_Extract Error :
'{"diagnosticCode":195887116,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXPRESSIONEVALUATION","message":"Error
while evaluating expression
Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text.Replace(\"\r\n\",
\"\").Replace(\"\t\", \" \"),
\"s2xResponse/s2xData/Name\")[1]","description":"Inner exception from
user expression: Index was out of range. Must be non-negative and less
than the size of the collection.
I get the same error even if I use a zero index ([0]) on the result of Evaluate.
What is wrong with my query?
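The problem here is that you are applying the subscript [1] to the result of XPath.Evaluate, which I believe returns the Name nodes. Because you apply [1] in code rather than inside the XPath, the subscript is probably zero-based instead of one-based as it is in XPath, hence the Index out of range error.
Here is a solution: apply the subscript operator inside the XPath (where it is still 1-based), and then select text() there: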
.Evaluate("s2xResponse/s2xData/Name[1]/text()")
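To make the difference between the two kinds of subscript concrete, here is a minimal sketch in Python rather than U-SQL, using the standard library's xml.etree.ElementTree on a trimmed copy of the XML from the question. It is only an illustration of the indexing rules, not of how XPath.Evaluate is implemented. ElementTree supports only a subset of XPath and has no text() step, so the sketch reads the .text attribute instead; the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of test.xml (XML declaration omitted for brevity).
XML = """<s2xResponse>
  <s2xData>
    <Name>This is the name</Name>
  </s2xData>
</s2xResponse>"""

root = ET.fromstring(XML)  # root is the <s2xResponse> element

# Subscript inside the path expression: XPath positions count from 1.
first_by_xpath = root.find("s2xData/Name[1]")

# Subscript applied in host code: Python lists count from 0.
matches = root.findall("s2xData/Name")
first_by_index = matches[0]

name_text = first_by_xpath.text
same_node = first_by_xpath is first_by_index

# With a single match, index 1 in host code is already out of range.
try:
    matches[1]
    index_one_ok = True
except IndexError:
    index_one_ok = False
```

Both lookups reach the same node, while a host-side index of 1 fails because only one Name element matches. An empty result list would make index 0 fail as well, so checking the number of matches first can help narrow things down.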
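Is there a particular reason you want to use the Evaluate method? I got this working with the XmlDomExtractor, which lets you extract multiple values from the XML, for example: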
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @inputFile string = "/input/input100.xml";
@input =
EXTRACT Name string
FROM @inputFile
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath : "/s2xResponse",
columnPaths : new SQL.MAP<string, string>{
{ "s2xData/Name", "Name" },
}
);
@output =
SELECT *
FROM @input;
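The idea behind a row path plus a map of column paths can be sketched outside U-SQL as well. The Python snippet below is a rough analogy built on xml.etree.ElementTree, not the XmlDomExtractor implementation: extract_rows is a hypothetical helper, the row path "." (the root element itself) stands in for "/s2xResponse" because ElementTree paths are relative to the element being searched, and the second column (CompanyName) is added here only to show more than one value being extracted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of test.xml (XML declaration omitted for brevity).
XML = """<s2xResponse>
  <s2xData>
    <Name>This is the name</Name>
    <InfocomData>
      <CompanyName>MY COMPANY</CompanyName>
    </InfocomData>
  </s2xData>
</s2xResponse>"""


def extract_rows(xml_text, row_path, column_paths):
    """Hypothetical helper: build one dict per element matching row_path.

    column_paths maps a path (relative to the row element) to a column name.
    A path that matches nothing yields None for that column.
    """
    root = ET.fromstring(xml_text)
    return [
        {column: row.findtext(path) for path, column in column_paths.items()}
        for row in root.findall(row_path)
    ]


rows = extract_rows(
    XML,
    ".",  # the root element plays the role of the row
    {
        "s2xData/Name": "Name",
        "s2xData/InfocomData/CompanyName": "CompanyName",
    },
)
```

Each matched row element becomes one record, and each entry in the map becomes one column, which is why this style scales to several values more easily than repeated single-path lookups.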