.NET XML deserialization: uppercase and lowercase exception

I am having trouble deserializing XML in .NET. This is the error I get:

The opening tag 'A' on line 72 position 56 does not match the end tag of 'a'. Line 72, position 118.

As you can see, it is the same tag, only uppercase in one place and lowercase in the other. My XML has this structure:

<?xml version="1.0"?>
<translationfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <translationtext>
       <es_text>Spanish text</es_text>
       <en_text>English text</en_text>
       <developer_comment>Plain text</developer_comment>
    </translationtext>
    ....
</translationfile>

This is my VB class:

Option Strict Off
Option Explicit On

Imports System.Xml.Serialization

'
'This source code was auto-generated by xsd, Version=2.0.50727.3038.
'

'''<remarks/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True), _
 System.Xml.Serialization.XmlRootAttribute([Namespace]:="", IsNullable:=False)> _
Partial Public Class translationfile

    Private itemsField As List(Of translationfileTranslationtext)

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute("translationtext", _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property Items As List(Of translationfileTranslationtext)
        Get
            Return Me.itemsField
        End Get
        Set(value As List(Of translationfileTranslationtext))
            Me.itemsField = value
        End Set
    End Property
End Class

'''<remarks/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True)> _
Partial Public Class translationfileTranslationtext

    Private es_textField As String

    Private en_textField As String

    Private developer_commentField As String

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute( _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property es_text() As String
        Get
            Return Me.es_textField
        End Get
        Set(value As String)
            Me.es_textField = value
        End Set
    End Property

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute( _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property en_text() As String
        Get
            Return Me.en_textField
        End Get
        Set(value As String)
            Me.en_textField = value
        End Set
    End Property

    '''<remarks/>
    <System.Xml.Serialization.XmlElementAttribute( _
        Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property developer_comment() As String
        Get
            Return Me.developer_commentField
        End Get
        Set(value As String)
            Me.developer_commentField = value
        End Set
    End Property
End Class

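The problem is that both texts may contain HTML code. The XML is written by hand by the customer, and I cannot change the text inside these tags. They can also define their own tags, such as <client27tagname>...</client27tagname>. Here is a real case: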

<translationtext>
    <es_text><p>Nombre</P></es_text>
    <en_text><p>Name</P></en_text>
    <developer_comment>irrelevant text</developer_comment>
</translationtext>

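When I try to deserialize the XML file, I get the error above, because <p> is lowercase and </P> is uppercase. How can I deserialize it correctly without changing the text? Is it possible to treat all the text inside these tags as a plain string?

This is the code I use for deserialization: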

Dim stream As New IO.StreamReader(path)
Dim ser As New Xml.Serialization.XmlSerializer(GetType(translationfile))
Dim myperfil As New translationfile

myperfil = CType(ser.Deserialize(stream), translationfile) 'This line throws the exception
stream.Close()

Update

After making the changes Olivier suggested, this is my class:

Option Strict Off
Option Explicit On

Imports System.Xml.Serialization

<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True), _
 System.Xml.Serialization.XmlRootAttribute([Namespace]:="", IsNullable:=False)> _
Partial Public Class translationfile

    Private itemsField As List(Of translationfileTranslationtext)

    <System.Xml.Serialization.XmlElementAttribute("translationtext", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property Items As List(Of translationfileTranslationtext)
        Get
            Return Me.itemsField
        End Get
        Set(value As List(Of translationfileTranslationtext))
            Me.itemsField = value
        End Set
    End Property
End Class

<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
 System.SerializableAttribute(), _
 System.Diagnostics.DebuggerStepThroughAttribute(), _
 System.ComponentModel.DesignerCategoryAttribute("code"), _
 System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True)> _
Partial Public Class translationfileTranslationtext

    Private es_textField As String

    Private en_textField As String

    Private developer_commentField As String

    <XmlIgnore()>
    Public Property es_text() As String
        Get
            Return Me.es_textField
        End Get
        Set(value As String)
            Me.es_textField = value
        End Set
    End Property

    <XmlElement(ElementName:="es_text", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property es_HtmlText() As String
        Get
            Return System.Web.HttpUtility.HtmlEncode(Me.es_textField)
        End Get
        Set(value As String)
            Me.es_textField = System.Web.HttpUtility.HtmlDecode(value)
        End Set
    End Property

    <XmlIgnore()>
    Public Property en_text() As String
        Get
            Return Me.en_textField
        End Get
        Set(value As String)
            Me.en_textField = value
        End Set
    End Property

    <XmlElement(ElementName:="en_text", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property en_HtmlText() As String
        Get
            Return System.Web.HttpUtility.HtmlEncode(Me.en_textField)
        End Get
        Set(value As String)
            Me.en_textField = System.Web.HttpUtility.HtmlDecode(value)
        End Set
    End Property

    <System.Xml.Serialization.XmlElementAttribute(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
    Public Property developer_comment() As String
        Get
            Return Me.developer_commentField
        End Get
        Set(value As String)
            Me.developer_commentField = value
        End Set
    End Property
End Class

Encode your text with HttpUtility.HtmlEncode and decode it with HttpUtility.HtmlDecode.

You can create an additional property for this and exclude the original property from serialization.

'Exclude the original property from serialization
<XmlIgnore()> _
Public Property en_text() As String
    Get
        Return Me.en_textField
    End Get
    Set(value As String)
        Me.en_textField = value
    End Set
End Property

'Name the encoding/decoding property element like the original property
<XmlElement(ElementName := "en_text", Form:=XmlSchemaForm.Unqualified)> _
Public Property en_HtmlEncodedText() As String
    Get
        Return HttpUtility.HtmlEncode(Me.en_textField)
    End Get
    Set(value As String)
        Me.en_textField = HttpUtility.HtmlDecode(value)
    End Set
End Property

HTML encoding converts "<" and ">" to "&lt;" and "&gt;", which makes the inner tags invisible to the XML parser.
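To make the round trip concrete, here is a minimal sketch of what the encode and decode calls do to one of the strings from the question. The module and function names (HtmlEncodeDemo, Encode, Decode) are made up for illustration; only HttpUtility.HtmlEncode and HttpUtility.HtmlDecode come from the answer. On the .NET Framework this assumes the project references System.Web.dll.

```vb
Imports System
Imports System.Web ' HttpUtility lives here; requires a reference to System.Web.dll

Module HtmlEncodeDemo
    ' Hypothetical helper: encode raw HTML so it contains no "<" or ">".
    Function Encode(raw As String) As String
        Return HttpUtility.HtmlEncode(raw)
    End Function

    ' Hypothetical helper: restore the original HTML from the encoded form.
    Function Decode(encoded As String) As String
        Return HttpUtility.HtmlDecode(encoded)
    End Function

    Sub Main()
        Dim raw As String = "<p>Name</P>"
        Dim encoded As String = Encode(raw)

        Console.WriteLine(encoded)               ' &lt;p&gt;Name&lt;/P&gt;
        Console.WriteLine(Decode(encoded) = raw) ' True
    End Sub
End Module
```

Because the encoded string no longer contains angle brackets, the mismatched <p> and </P> are just text as far as the XML parser is concerned.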


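Update

My solution works. I have tested it now. You probably tested it with an XML file that still contains the HTML tags as plain text ("<p>Name</P>"). What my code does is write the HTML as "&amp;lt;p&amp;gt;Name&amp;lt;/P&amp;gt;": HttpUtility.HtmlEncode turns "<" into "&lt;", and the serializer then escapes the "&" as "&amp;". Therefore, you must first write the XML file using my method. Only then will reading succeed.

Here is my write test: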

Public Sub WriteTest()
    Dim myperfil As New translationfile With {
        .Items = New List(Of translationfileTranslationtext) From {
            New translationfileTranslationtext With {.en_text = "en test", .es_text = "spanish"},
            New translationfileTranslationtext With {.en_text = "<p>Name</P>", .es_text = "<p>Nombre</P>"}
        }
    }

    Dim writer As New IO.StreamWriter(path)
    Dim ser As New XmlSerializer(GetType(translationfile))
    ser.Serialize(writer, myperfil)
    writer.Close()
End Sub

It creates the following XML:

<?xml version="1.0" encoding="utf-8"?>
<translationfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <translationtext>
    <es_text>spanish</es_text>
    <en_text>en test</en_text>
  </translationtext>
  <translationtext>
    <es_text>&amp;lt;p&amp;gt;Nombre&amp;lt;/P&amp;gt;</es_text>
    <en_text>&amp;lt;p&amp;gt;Name&amp;lt;/P&amp;gt;</en_text>
  </translationtext>
</translationfile>

Here is my read test, which throws no exception:

Public Sub ReadTest()
    Dim myperfil As translationfile
    Dim reader As New IO.StreamReader(path)
    Dim ser As New XmlSerializer(GetType(translationfile))

    myperfil = CType(ser.Deserialize(reader), translationfile)
    reader.Close()

    For Each item As translationfileTranslationtext In myperfil.Items
        Console.WriteLine("EN = {0}, ES = {1}", item.en_text, item.es_text)
    Next
    Console.ReadKey()
End Sub

It writes this to the console:

EN = en test, ES = spanish
EN = <p>Name</P>, ES = <p>Nombre</P>

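After some testing, I found a workaround.

  1. I read the whole file as a plain string
  2. I replace every < character with a placeholder string: #open_key#
  3. I replace every #open_key#es_text> with <es_text>
  4. The same for en_text, developer_comment, etc.
  5. I save the result to a temporary file
  6. I deserialize the temporary file
  7. Before returning the result, I replace every #open_key# back with <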
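
The string-replacement part of these steps (2, 3, 4 and 7) could look roughly like the sketch below. It is only an illustration under some assumptions: the names PlaceholderWorkaround, OpenKey, KnownTags, MaskInnerMarkup and Unmask are made up, the list of known tags is taken from the sample XML in the question, and the placeholder is assumed never to occur in the real data. It does not handle "&" characters, comments or CDATA sections inside the texts.

```vb
Imports System

Module PlaceholderWorkaround
    ' Placeholder from step 2; assumed not to appear in the real texts.
    Private Const OpenKey As String = "#open_key#"

    ' Element names that belong to the file format itself (assumed from the sample XML).
    Private ReadOnly KnownTags As String() = {
        "translationfile", "translationtext", "es_text", "en_text", "developer_comment"
    }

    ' Steps 2 to 4: hide every "<", then restore it only for the known tags.
    Function MaskInnerMarkup(xml As String) As String
        Dim masked As String = xml.Replace("<", OpenKey)
        masked = masked.Replace(OpenKey & "?xml", "<?xml")
        For Each tag As String In KnownTags
            masked = masked.Replace(OpenKey & tag & ">", "<" & tag & ">")
            ' Opening tag followed by attributes, e.g. <translationfile xmlns:xsi=...>
            masked = masked.Replace(OpenKey & tag & " ", "<" & tag & " ")
            masked = masked.Replace(OpenKey & "/" & tag & ">", "</" & tag & ">")
        Next
        Return masked
    End Function

    ' Step 7: put the "<" back into a deserialized property value.
    Function Unmask(value As String) As String
        If value Is Nothing Then Return Nothing
        Return value.Replace(OpenKey, "<")
    End Function

    Sub Main()
        Dim sample As String =
            "<translationtext><es_text><p>Nombre</P></es_text></translationtext>"

        ' The masked text is what would be saved to the temporary file (step 5)
        ' and passed to XmlSerializer.Deserialize (step 6).
        Console.WriteLine(MaskInnerMarkup(sample))

        ' After deserializing, each text property would be unmasked like this.
        Console.WriteLine(Unmask("#open_key#p>Nombre#open_key#/P>"))
    End Sub
End Module
```

A ">" on its own is allowed in XML text content, which is why only the "<" characters need to be masked here.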