Parsing HTML tags into XML
I am trying to parse the XML embedded in the HTML file below. Here are the details of one of the tags:
<tr class="iris_table_row">
<td style=" width:37.50%; text-align:left; " class="ta_10"><span class="ta_10">Tangible assets</span></td>
<td style=" width:2.50%; text-align:right; " class="ta_10"><span class="ta_10">2</span></td>
<td style=" width:30.00%; text-align:right; " class="ta_61"><ix:nonFraction contextRef="cfwd_31_03_2014" name="ns5:TangibleFixedAssets" unitRef="GBP" decimals="0" format="ixt2:numdotdecimal" scale="0" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">7,956</ix:nonFraction></td>
<td style=" width:1.25%; " class="ta_61" />
<td style=" width:26.25%; text-align:right; " class="ta_60"><ix:nonFraction contextRef="cfwd_31_03_2013" name="ns5:TangibleFixedAssets" unitRef="GBP" decimals="0" format="ixt2:numdotdecimal" scale="0" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">5,402</ix:nonFraction></td>
<td style=" width:1.25%; " class="ta_60" />
<td style=" width:1.25%; " class="ta_10" />
</tr>
I tried to do this in Java with a DOM parser, but it does not recognize the XML tags.
In the code below, the value of db.parse(fXmlFile) is "null".
File fXmlFile = new File("Prod223_1254_04903825_20140331 copy.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setIgnoringComments(false);
dbf.setIgnoringElementContentWhitespace(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
System.out.println(db.parse(fXmlFile));
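How can I get all of the tags and their information into Java? Ideally, I would load them into a bean.
Here is an example of the type of file I am trying to parse.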
<?xml version="1.0" encoding="utf-8"?><html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL" xmlns:ixt="http://www.xbrl.org/inlineXBRL/transformation/2010-04-20" xmlns:ixt2="http://www.xbrl.org/inlineXBRL/transformation/2011-07-31" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:xl="http://www.xbrl.org/2003/XLink" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:iris="http://www.iris.co.uk/ixbrl" xmlns:ns0="http://www.xbrl.org/uk/gaap/core-full/2009-09-01" xmlns:ns5="http://www.xbrl.org/uk/gaap/core/2009-09-01" xmlns:ns6="http://www.xbrl.org/uk/reports/direp/2009-09-01" xmlns:ns7="http://www.xbrl.org/uk/cd/business/2009-09-01" xmlns:ns8="http://www.xbrl.org/uk/all/types/2009-09-01" xmlns:ns9="http://xbrl.org/2005/xbrldt" xmlns:ns10="http://www.xbrl.org/uk/all/common/2009-09-01" xmlns:ns11="http://www.xbrl.org/2006/ref" xmlns:ns12="http://www.xbrl.org/uk/cd/countries/2009-09-01" xmlns:ns13="http://www.xbrl.org/uk/all/ref/2009-09-01" xmlns:ns14="http://www.xbrl.org/uk/cd/currencies/2009-09-01" xmlns:ns15="http://www.xbrl.org/uk/cd/exchanges/2009-09-01" xmlns:ns16="http://www.xbrl.org/uk/cd/languages/2009-09-01" xmlns:ns17="http://www.xbrl.org/2004/ref" xmlns:ns18="http://www.xbrl.org/uk/all/gaap-ref/2009-09-01" xmlns:ns19="http://www.xbrl.org/uk/reports/aurep/2009-09-01" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:ns20="http://www.govtalk.gov.uk/uk/fr/tax/full-gaap-dpl/2013-10-01" xmlns:ns21="http://www.govtalk.gov.uk/uk/fr/tax/dpl-gaap-main/2013-10-01" xmlns:ns22="http://www.govtalk.gov.uk/uk/fr/tax/dpl-gaap/2013-10-01" xmlns:ns23="http://www.govtalk.gov.uk/uk/fr/tax/dpl-core/2013-10-01">
<head>
<meta name="PostingEntryNumber" content="4" />
<meta name="PeriodRecordNumber" content="2341" />
<meta content="application/xhtml+xml; charset=UTF-8" http-equiv="Content-Type" />
<meta name="description" content="iXBRL report production" />
<meta name="Mode" content="CH" />
<meta http-equiv="X-UA-Compatible" content="IE=8" />
<title>Shortt Orthopaedics Limited - Limited company - abbreviated - 11.6</title>
<style type="text/css">
@media print
{
hr { display:none; }
.portraitpage
{
min-height:273mm;
max-width:170mm;
}
.landscapepage
{
min-height:170mm;
max-width:273mm;
}
}
@media screen
{
.portraitpage
{
max-width:170mm;
min-height:273mm;
margin:12mm 20mm 12mm 20mm;
}
.landscapepage
{
max-width:273mm;
min-height:170mm;
margin:12mm 20mm 12mm 20mm;
}
}
body{ margin:0px; font-size:1.3em; }
td{ padding:0px; }
div.portraitpage{ page-break-after:always; position:relative; }
div.landscapepage{ page-break-after:always; position:relative; }
div.header{ position:relative; }
div.footer{ left:0px; right:0px; bottom:0px; text-align:center; position:absolute; }
div.container{ position:relative; }
div.maintext{ width:100.00%; position:relative; }
div.tagged_blob{ width:100.00%; position:relative; }
table.iris_table{ width:100.00%; border-collapse:collapse; }
table.iris_table_header{ width:100.00%; border-collapse:collapse; }
table.iris_table_footer{ width:100.00%; border-collapse:collapse; }
div.hr.iris_hr{ width:100.00%; }
td.total_single{ border-top:thin solid black; }
td.total_double{ border-top:double black; }
.ta_10{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_11{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_12{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_13{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_20{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_21{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_22{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_23{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_30{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_31{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_32{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_33{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_40{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_41{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_42{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_43{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_50{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_51{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_52{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_53{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_60{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_61{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_62{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_63{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_70{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_71{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_72{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_73{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_80{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_81{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_82{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_83{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_90{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_91{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_92{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_93{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_100{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_101{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_102{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_103{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_110{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_111{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_112{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_113{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_120{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_121{ color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_122{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:700; }
.ta_123{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Times New Roman"; font-size:13px; font-weight:400; }
.ta_130{ color:rgb(0, 0, 0); font-family:"Courier New"; font-size:13px; font-weight:400; }
.ta_131{ color:rgb(0, 0, 0); font-family:"Courier New"; font-size:13px; font-weight:700; }
.ta_132{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Courier New"; font-size:13px; font-weight:700; }
.ta_133{ text-decoration:underline; color:rgb(0, 0, 0); font-family:"Courier New"; font-size:13px; font-weight:400; }
.ta_140{ color:rgb(0, 0, 0); font-family:"Arial"; font-size:13px; font-weight:400; }
.ta_141{ color:rgb(0, 0, 0); font-family:"Arial"; font-size:13px; font-weight:400; }
.ta_142{ color:rgb(0, 0, 0); font-family:"Arial"; font-size:13px; font-weight:400; }
.ta_143{ color:rgb(0, 0, 0); font-family:"Arial"; font-size:13px; font-weight:400; }
</style>
</head>
<body xml:lang="en">
<div style="display:none">
<ix:header>
<ix:hidden>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:NameAuthor" order="1" tupleRef="XBRLDocumentAuthorGrouping_Group45" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL"></ix:nonNumeric>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:DescriptionOrTitleAuthor" order="2" tupleRef="XBRLDocumentAuthorGrouping_Group45" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL"></ix:nonNumeric>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:UKCompaniesHouseRegisteredNumber" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">07189486</ix:nonNumeric>
<ix:nonNumeric contextRef="CountriesHypercube_FY_31_03_2014_Set1" name="ns7:CountryFormationOrIncorporation" format="ixt2:nocontent" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL" />
<ix:nonNumeric contextRef="CurrenciesHypercube_FY_31_03_2014_Set2" name="ns7:PrincipalCurrencyUsedInBusinessReport" format="ixt2:nocontent" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL" />
<ix:nonNumeric contextRef="EntityOfficersHypercube_FY_31_03_2014_Set3" name="ns5:NameDirectorSigningAccounts" format="ixt2:nocontent" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL" />
<ix:nonNumeric contextRef="cfwd_31_03_2014" name="ns7:StartDateForPeriodCoveredByReport" format="ixt2:datedaymonthyear" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">1.4.13</ix:nonNumeric>
<ix:nonNumeric contextRef="cfwd_31_03_2014" name="ns7:EndDateForPeriodCoveredByReport" format="ixt2:datedaymonthyear" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">31.3.14</ix:nonNumeric>
<ix:nonNumeric contextRef="cfwd_31_03_2014" name="ns7:BalanceSheetDate" format="ixt2:datedaymonthyear" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">31.3.14</ix:nonNumeric>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:EntityAccountsType" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">Company accounts</ix:nonNumeric>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:LegalFormOfEntity" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">Private Limited Company</ix:nonNumeric>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:DescriptionPeriodCoveredByReport" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">FY</ix:nonNumeric>
<ix:nonNumeric contextRef="FY_31_03_2014" name="ns7:EntityTrading" format="ixt2:booleantrue" xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">true</ix:nonNumeric>
[Truncated because of the Stack Overflow body limit]
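I think you need to do this in two steps.
- Use an HTML parser to get at the embedded XML in question
- ...then use a DOM parser on that content
HTML is not always XML-compliant (unless you are using XHTML, which has become less popular). Browsers let a lot of things slide, such as missing tags, mixed single and double quotes, and attributes without values, which is probably why your page fails to parse.
There are plenty of HTML parsers available.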
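For the second step, here is a minimal sketch of reading the `ix:` elements with a namespace-aware DOM parser and copying them into a bean. It assumes the input is well-formed XML, which the `<?xml ...?>` prolog in the sample suggests. The `Fact` class, the `InlineXbrlFacts` class name, and the cut-down `SAMPLE` string are made up for illustration. Also note that with the JDK's built-in DOM implementation, printing a `Document` usually shows `[#document: null]`; that is just the node's name and value, not necessarily a sign that parsing failed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class InlineXbrlFacts {
    // Namespace URI bound to the ix: prefix in the sample file.
    static final String IX_NS = "http://www.xbrl.org/2008/inlineXBRL";

    // Cut-down stand-in for the real file (illustration only).
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:ix=\"" + IX_NS + "\">"
        + "<body><table><tr><td>"
        + "<ix:nonFraction contextRef=\"cfwd_31_03_2014\""
        + " name=\"ns5:TangibleFixedAssets\" unitRef=\"GBP\">7,956</ix:nonFraction>"
        + "</td></tr></table></body></html>";

    // Hypothetical bean holding one tagged value.
    public static final class Fact {
        public final String name;
        public final String contextRef;
        public final String value;

        Fact(String name, String contextRef, String value) {
            this.name = name;
            this.contextRef = contextRef;
            this.value = value;
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for getElementsByTagNameNS
        return dbf.newDocumentBuilder()
                  .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Collects every ix:<localName> element, wherever it sits in the tree.
    static List<Fact> extract(Document doc, String localName) {
        List<Fact> facts = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagNameNS(IX_NS, localName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            facts.add(new Fact(e.getAttribute("name"),
                               e.getAttribute("contextRef"),
                               e.getTextContent().trim()));
        }
        return facts;
    }

    public static void main(String[] args) throws Exception {
        for (Fact f : extract(parse(SAMPLE), "nonFraction")) {
            System.out.println(f.name + " [" + f.contextRef + "] = " + f.value);
        }
    }
}
```

Calling `extract(doc, "nonNumeric")` on the same document would pick up the `ix:nonNumeric` entries from the hidden header in the same way.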
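According to the documentation, DTD validation always takes place, even if you tell it not to!
What you want to do is create a new DTD that adds your namespaces to the standard XHTML DTD. The W3 site discusses how to achieve this, and the example they give is for MathML: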
First, define a content model module that instantiates the MathML DTD and connects it to the content model:
<!-- File: mathml-model.mod -->
<!ENTITY % XHTML1-math
PUBLIC "-//W3C//DTD MathML 2.0//EN"
"http://www.w3.org/TR/MathML2/dtd/mathml2.dtd" >
%XHTML1-math;
<!ENTITY % Inlspecial.extra
"%a.qname; | %img.qname; | %object.qname; | %map.qname;
| %Mathml.Math.qname;" >
Next, define a DTD driver that identifies our new content model module as the content model for the DTD, and hands off processing to the XHTML 1.1 driver (for example):
<!-- File: xhtml-mathml.dtd -->
<!ENTITY % xhtml-model.mod
SYSTEM "mathml-model.mod" >
<!ENTITY % xhtml11.dtd
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
%xhtml11.dtd;
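If the goal is only to stop the parser from fetching an external DTD, a lighter alternative that is sometimes used is an `EntityResolver` that returns an empty source. The following is a sketch under assumptions, not a replacement for the DTD approach above: it assumes the document carries a DOCTYPE and uses no DTD-defined entities such as `&nbsp;` (those would no longer resolve). The class name `DtdSkippingParser` and the `SAMPLE` string are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DtdSkippingParser {
    // Small XHTML document with an external DOCTYPE (illustration only).
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\""
        + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>t</title></head><body><p>ok</p></body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Answer every external DTD/entity lookup with an empty stream,
        // so nothing is downloaded while parsing.
        db.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```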