+ sign being dropped from XML when validation occurs

I asked a question here previously.

The answer worked for that question, but a new problem has come up. I am parsing the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<hml xmlns="http://schemas.nmdp.org/spec/hml/1.0.1"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://schemas.nmdp.org/spec/hml/1.0.1  http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd"
 version="1.0.1" >

 <!-- 
  MIRING Element 1.1 requires the inclusion of an hmlid.
  hmlid can be reported in the form of an ISO Object Identifier (OID)
  "root" represents a unique publically registered organization
  "extension" is a unique document id managed by the reporting organization.
 -->

 <hmlid root="2.34.48.32" extension="HML.3245662"/>

 <!-- 
  MIRING Element 1.2 requires the inclusion of a reporting-center.
  reporting-center identifies the organization sending the HML message.
  "reporting-center-id" is a unique identifier of the sender.
  "reporting-center-context" reports the context/naming authority of the identifier.
 -->

 <reporting-center reporting-center-id="567"/>
 <sample id="4555-6677-8">
  <typing gene-family="HLA" date="2015-01-13">

   <!-- 
    MIRING Element 3 requires the inclusion of Genotyping information.
    The Genotype should include all pertinent Loci, as well as a Genotype in a standard format.  
    GLStrings can be included either as plain text, or as a reference to a publicly
    available service, such as GL Service (gl.nmdp.org)
   -->

   <allele-assignment date="2015-07-28" allele-db="IMGT/HLA" allele-version="3.17.0">
    <haploid locus="HLA-A" method="DNA" type="02:20:01"/>
    <glstring>
     HLA-A*02:20:01
    </glstring>
   </allele-assignment>
   <typing-method>

   <!-- 
    MIRING Element 6 requires platform documentation.  This could be a peer-reviewed publication,
    or an identifier of a procedure on a publicly available resource, such as NCBI GTR
   -->

    <sbt-ngs locus="HLA-A"
     test-id="HLA-A.Test.1234"
     test-id-source="AcmeGenLabs">
     <raw-reads uri="rawreads/read1.fastq.gz"
      availability="public"
      format="fastq"
      paired="1"
      pooled="1"
      adapter-trimmed="1"
      quality-trimmed="0"/>
    </sbt-ngs>
   </typing-method>
   <consensus-sequence date="2015-01-13">

    <!-- 
     MIRING Element 2 requires the inclusion of Reference Context.
     The location and identifiers of the reference sequence should be specified. 
     start and end attributes are 0-based, and refer to positions on the reference sequence.
    --> 

    <reference-database availability="public" curated="true">
     <reference-sequence
      name="HLA-A reference"
      id="Ref111"
      start="945000"
      end="946000"
      accession="GL000123.4"
      uri="http://AcmeGenReference/RefDB/GL000123.4"/>
    </reference-database>

    <!-- 
     MIRING Element 4 requires the inclusion of a consensus sequence.
     The start and end positions are 0-based, and refer to positions on the reference sequence (reference-sequence-id)
     Multiple consensus-sequence-block elements can be included sequentially.
    -->

    <consensus-sequence-block reference-sequence-id="Ref111"
     start="945532"
     end="945832"
     strand="+"
     phase-set="1"
     expected-copy-number="1"
     continuity="true"
     description="HLA-A Consensus Sequence 4.5.67">

     <!-- 
      A sequence can be reported as plain text, or as a pointer to an external reference,
      or as variants from a reference sequence.
     -->

     <sequence>
      CCCAGTTCTCACTCCCATTGGGTGTCGGGTTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGCCCGCACGCACCCACCGGGACTCAGATTCTCCCCAGACGCCGAGGATGGCCGTCATGGCGCCCCGAACCCTCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACCCAGACCTGGGCGGGTGAGTGCGGGGTCGGGAGGGAAACCGCCTCTGCGGGGAGAAGCAAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGACGAGGGTCGGGCAGGT
     </sequence>

     <!-- 
      MIRING Element 5 requires the inclusion of any relevant sequence polymorphisms.  
      These represent variants from the reference sequence.
      start and end attributes are 0-based, and refer to positions on the reference sequence.
      You can see this variant at positions 10 - 15 on the sequence. (945542 - 945532 = 10)
     -->

     <variant id="0"
      reference-bases="GTCATG"
      alternate-bases="ACTCCC"
      start="945542"
      end="945548"
      filter="pass"
      quality-score="95">

      <!-- 
       The functional effects of variants can be reported using variant-effect.  
       They should use Sequence Ontology (SO) variant effect terms.
      -->

      <variant-effect term="missense_variant"/>
     </variant>
    </consensus-sequence-block>
   </consensus-sequence>
  </typing>
 </sample>

 <!-- 
  Multiple samples can be included in a single message.  
  Each sample should have its own reference-database(s) even if they are identical to other samples' references. 
 -->

 <sample id="4555-6677-9">
  <typing gene-family="HLA" date="2015-01-13">
   <allele-assignment date="2015-07-28" allele-db="IMGT/HLA" allele-version="3.17.0">
    <haploid locus="HLA-A" method="DNA" type="02:20:01"/>
    <glstring>
     HLA-A*02:01:01:01
    </glstring>
   </allele-assignment>
   <typing-method>
    <sbt-ngs locus="HLA-A"
     test-id="HLA-A.Test.1234"
     test-id-source="AcmeGenLabs">
     <raw-reads uri="rawreads/read2.fastq.gz"
      availability="public"
      format="fastq"
      paired="1"
      pooled="1"
      adapter-trimmed="1"
      quality-trimmed="0"/>
    </sbt-ngs>
   </typing-method>
   <consensus-sequence date="2015-01-13">
    <reference-database availability="public" curated="true">
     <reference-sequence
      name="HLA-A reference"
      id="Ref112"
      start="945000"
      end="946000"
      accession="GL000123.4"
      uri="http://AcmeGenReference/RefDB/GL000123.4"/>
    </reference-database>
    <consensus-sequence-block 
     reference-sequence-id="Ref112"
     start="945532"
     end="945832"
     strand="+"
     phase-set="1"
     expected-copy-number="1"
     continuity="true"
     description="HLA-A Consensus Sequence 4.5.89">
     <sequence>
      CCCAGTTCTCGTCATGATTGGGTGTCGGGTTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGCCCGCACGCACCCACCGGGACTCAGATTCTCCCCAGACGCCGAGGATGGCCGTCATGGCGCCCCGAACCCTCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACCCAGACCTGGGCGGGTGAGTGCGGGGTCGGGAGGGAAACCGCCTCTGCGGGGAGAAGCAAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGACGAGGGTCGGGCAGGT
     </sequence>
    </consensus-sequence-block>
   </consensus-sequence>
  </typing>
 </sample>

</hml>

This is the sample provided for the validator, so I know it is valid. However, when I pass it through my RESTful POST code:

@POST
@Path("/Validate")
@Produces("application/xml")
public String validate(@FormParam("xml") String xml)
{
    System.out.println(xml);
    try {
        Client client = Client.create();

        WebResource webResource = client.resource("http://miring.b12x.org/validator/ValidateMiring/");

        // POST method
        ClientResponse response = webResource.accept("application/xml").post(ClientResponse.class, "xml=" + xml);

        // check response status code
        if (response.getStatus() != 200) {
            throw new RuntimeException("Failed : HTTP error code : " + response.getStatus());
        }

        // display response
        String output = response.getEntity(String.class);
        System.out.println("Output from Server .... ");
        System.out.println(output + "\n");
        return output;
    } catch (Exception e) {
        e.printStackTrace();
    }

    return "Oops";
}

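Everything passes through fine except strand="+". For some reason the + is dropped, and I get the error message "The value '' of attribute 'strand' on element 'consensus-sequence-block' is not valid with respect to its ...".

I tried it with all of the strand enumeration values (+, -, -1, 1), and all of them work except +.

Using the web UI (miring.b12x.org) works fine.

Is there any reason that parsing with SAX would cause the + to be dropped, or one of the enumeration values to be removed?

Thanks

EDIT: Here is the output received: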

Output from Server .... 
<?xml version="1.0" encoding="UTF-8"?>
<miring-report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               timestamp="07/19/2016 15:07:31"
               xsi:noNamespaceSchemaLocation="http://schemas.nmdp.org/spec/miringreport/1.0/miringreport.xsd">
   <hml-compliant>reject</hml-compliant>
   <miring-compliant>reject</miring-compliant>
   <hmlid extension="HML.3245662" root="2.34.48.32"/>
   <samples compliant-sample-count="4"
            noncompliant-sample-count="0"
            sample-count="2">
      <sample hml-compliant="true" id="4555-6677-8" miring-compliant="true"/>
      <sample hml-compliant="true" id="4555-6677-9" miring-compliant="true"/>
   </samples>
   <fatal-validation-errors>
      <miring-result miring-rule-id="reject" severity="fatal">
         <description>[cvc-attribute.3:, The, value, ', ', of, attribute, 'strand', on, element, 'consensus-sequence-block', is, not, valid, with, respect, to, its, type,, 'null'.]</description>
         <solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
      </miring-result>
      <miring-result miring-rule-id="reject" severity="fatal">
         <description>[cvc-attribute.3:, The, value, ', ', of, attribute, 'strand', on, element, 'consensus-sequence-block', is, not, valid, with, respect, to, its, type,, 'null'.]</description>
         <solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
      </miring-result>
      <miring-result miring-rule-id="reject" severity="fatal">
         <description>[cvc-enumeration-valid:, Value, ', ', is, not, facet-valid, with, respect, to, enumeration, '[-1,, 1,, +,, -]'., It, must, be, a, value, from, the, enumeration.]</description>
         <solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
      </miring-result>
      <miring-result miring-rule-id="reject" severity="fatal">
         <description>[cvc-enumeration-valid:, Value, ', ', is, not, facet-valid, with, respect, to, enumeration, '[-1,, 1,, +,, -]'., It, must, be, a, value, from, the, enumeration.]</description>
         <solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
      </miring-result>
   </fatal-validation-errors>
   <validation-warnings>
      <miring-result miring-rule-id="1.2.b" severity="warning">
         <description>The node reporting-center is missing a reporting-center-context attribute.</description>
         <solution>Please add a reporting-center-context attribute to the reporting-center node. You can use reporting-center-context to specify the naming authority of the reporting center identifier.  Reporting-center-context is not explicitly required.</solution>
         <xpath>/hml[1]/reporting-center[1]</xpath>
      </miring-result>
   </validation-warnings>
</miring-report>

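You aren't setting the type of your WebResource. I don't know what the default Content-Type of the request is, but I suspect it is application/x-www-form-urlencoded, which means + is treated as a space. If that is the case, changing "xml="+xml to "xml=" + URLEncoder.encode(xml, "UTF-8") may solve the problem.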
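To make the suggested change concrete, here is a minimal sketch that uses only the standard library. The class name FormBody and the method build are hypothetical helpers invented for illustration; they are not part of the question's code or of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (an assumption, not from the original code): builds the
// form body so that a literal '+' in the XML survives form decoding.
final class FormBody {

    static String build(String xml) {
        try {
            // '+' is escaped as %2B; spaces are written as '+'.
            return "xml=" + URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints: xml=strand%3D%22%2B%22
        System.out.println(build("strand=\"+\""));
    }
}
```

In the question's method, the last argument of post(...) would then be FormBody.build(xml) instead of "xml="+xml. If the Content-Type guess is right, it is probably also worth setting it explicitly, for example by calling type("application/x-www-form-urlencoded") on the WebResource before post; I have not tested that against this particular validator.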

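The application/x-www-form-urlencoded format is the default format for HTML form submissions, as described in the HTML 4.01 specification. The documentation for the URLEncoder class also describes this format.

In that format, a + character represents a space, so the strand attribute ends up containing a single space. In addition, the Attribute-Value Normalization section of the XML 1.0 specification states: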

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters …

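The single space is therefore normalized to an empty string once all leading and trailing spaces are removed. The empty string, strand='', does not conform to the XML schema you referenced, http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd.

URLEncoder.encode escapes all "reserved" characters, including +, as percent-escapes, and writes each space as +. The server expects this format (almost certainly because a Content-Type: application/x-www-form-urlencoded header is present in the HTTP request) and decodes the + signs and percent-escapes back into the original XML.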
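The effect can be reproduced locally without the validator. The sketch below assumes the server decodes the form body the way java.net.URLDecoder does, which is a guess about its implementation. The class FormRoundTrip and its methods are hypothetical names used only for this demonstration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demonstration class (an assumption, not from the original code).
final class FormRoundTrip {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String attr = "strand=\"+\"";

        // Sent unencoded: the '+' is decoded as a space.
        System.out.println(decode(attr));         // prints: strand=" "

        // Encoded first: the round trip restores the original text.
        System.out.println(decode(encode(attr))); // prints: strand="+"
    }
}
```

The first line of output shows the attribute value the validator would see when the body is sent as-is; the second shows that encoding before sending preserves the + sign.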