.NET XML deserialization: uppercase and lowercase exception
I am having some trouble deserializing XML in .NET. This is the error I get:
The opening tag 'A' on line 72 position 56 does not match the end tag of 'a'. Line 72, position 118.
As you can see, it is the same tag, only one is uppercase and the other is lowercase. My XML has this structure:
<?xml version="1.0"?>
<translationfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<translationtext>
<es_text>Spanish text</es_text>
<en_text>English text</en_text>
<developer_comment>Plain text</developer_comment>
</translationtext>
....
</translationfile>
This is my VB class:
Option Strict Off
Option Explicit On
Imports System.Xml.Serialization
'
'This source code was generated automatically by xsd, Version=2.0.50727.3038.
'
'''<comentarios/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
System.SerializableAttribute(), _
System.Diagnostics.DebuggerStepThroughAttribute(), _
System.ComponentModel.DesignerCategoryAttribute("code"), _
System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True), _
System.Xml.Serialization.XmlRootAttribute([Namespace]:="", IsNullable:=False)> _
Partial Public Class translationfile
Private itemsField As List(Of translationfileTranslationtext)
'''<comentarios/>
<System.Xml.Serialization.XmlElementAttribute("translationtext", _
Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property Items As List(Of translationfileTranslationtext)
Get
Return Me.itemsField
End Get
Set(value As List(Of translationfileTranslationtext))
Me.itemsField = value
End Set
End Property
End Class
'''<comentarios/>
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
System.SerializableAttribute(), _
System.Diagnostics.DebuggerStepThroughAttribute(), _
System.ComponentModel.DesignerCategoryAttribute("code"), _
System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True)> _
Partial Public Class translationfileTranslationtext
Private es_textField As String
Private en_textField As String
Private developer_commentField As String
'''<comentarios/>
<System.Xml.Serialization.XmlElementAttribute _
(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property es_text() As String
Get
Return Me.es_textField
End Get
Set(value As String)
Me.es_textField = value
End Set
End Property
'''<comentarios/>
<System.Xml.Serialization.XmlElementAttribute( _
Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property en_text() As String
Get
Return Me.en_textField
End Get
Set(value As String)
Me.en_textField = value
End Set
End Property
'''<comentarios/>
<System.Xml.Serialization.XmlElementAttribute( _
Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property developer_comment() As String
Get
Return Me.developer_commentField
End Get
Set(value As String)
Me.developer_commentField = value
End Set
End Property
End Class
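The problem is that both texts can contain HTML code. The XML is written by hand by the customer, and I cannot change the text inside these tags. They can also define their own tags, for example <client27tagname>...</client27tagname>. This is a real case: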
<translationtext>
<es_text><p>Nombre</P></es_text>
<en_text><p>Name</P></en_text>
<developer_comment>irrelevant text</developer_comment>
</translationtext>
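When I try to deserialize the XML file I get the error above, because <p> is lowercase and </P> is uppercase. How can I deserialize this correctly without changing the text? Is it possible to treat all the text inside these tags as a simple string?
This is the code I use for deserialization: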
Dim stream As New IO.StreamReader(path)
Dim ser As New Xml.Serialization.XmlSerializer(GetType(translationfile))
Dim myperfil As New translationfile
myperfil = CType(ser.Deserialize(stream), translationfile) 'This line throws the exception
stream.Close()
Update
After making the changes Olivier suggested, this is my class:
Option Strict Off
Option Explicit On
Imports System.Xml.Serialization
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
System.SerializableAttribute(), _
System.Diagnostics.DebuggerStepThroughAttribute(), _
System.ComponentModel.DesignerCategoryAttribute("code"), _
System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True), _
System.Xml.Serialization.XmlRootAttribute([Namespace]:="", IsNullable:=False)> _
Partial Public Class translationfile
Private itemsField As List(Of translationfileTranslationtext)
<System.Xml.Serialization.XmlElementAttribute("translationtext", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property Items As List(Of translationfileTranslationtext)
Get
Return Me.itemsField
End Get
Set(value As List(Of translationfileTranslationtext))
Me.itemsField = value
End Set
End Property
End Class
<System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038"), _
System.SerializableAttribute(), _
System.Diagnostics.DebuggerStepThroughAttribute(), _
System.ComponentModel.DesignerCategoryAttribute("code"), _
System.Xml.Serialization.XmlTypeAttribute(AnonymousType:=True)> _
Partial Public Class translationfileTranslationtext
Private es_textField As String
Private en_textField As String
Private developer_commentField As String
<XmlIgnore()>
Public Property es_text() As String
Get
Return Me.es_textField
End Get
Set(value As String)
Me.es_textField = value
End Set
End Property
<XmlElement(ElementName:="es_text", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property es_HtmlText() As String
Get
Return System.Web.HttpUtility.HtmlEncode(Me.es_textField)
End Get
Set(value As String)
Me.es_textField = HttpUtility.HtmlDecode(value)
End Set
End Property
<XmlIgnore()>
Public Property en_text() As String
Get
Return Me.en_textField
End Get
Set(value As String)
Me.en_textField = value
End Set
End Property
<XmlElement(ElementName:="en_text", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property en_HtmlText() As String
Get
Return System.Web.HttpUtility.HtmlEncode(Me.en_textField)
End Get
Set(value As String)
Me.en_textField = HttpUtility.HtmlDecode(value)
End Set
End Property
<System.Xml.Serialization.XmlElementAttribute(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property developer_comment() As String
Get
Return Me.developer_commentField
End Get
Set(value As String)
Me.developer_commentField = value
End Set
End Property
End Class
Encode your text with HttpUtility.HtmlEncode and decode it with HttpUtility.HtmlDecode.
You can create an additional property for this and exclude the original property from serialization.
'Exclude the original property from serialization
<XmlIgnore()> _
Public Property en_text() As String
Get
Return Me.en_textField
End Get
Set(value As String)
Me.en_textField = value
End Set
End Property
'Name the encoding/decoding property element like the original property
<XmlElement(ElementName := "en_text", Form:=XmlSchemaForm.Unqualified)> _
Public Property en_HtmlEncodedText() As String
Get
Return HttpUtility.HtmlEncode(Me.en_textField)
End Get
Set(value As String)
Me.en_textField = HttpUtility.HtmlDecode(value)
End Set
End Property
HTML encoding converts "<" and ">" to "&lt;" and "&gt;", which makes the inner tags invisible to the XML parser.
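As a minimal sketch of that round trip, the snippet below encodes and decodes the sample string from the question. The module name `EncodeDemo` is made up for illustration, and the project is assumed to reference System.Web.

```vb
Imports System
Imports System.Web

' Illustrative module name; not part of the original code.
Module EncodeDemo
    Sub Main()
        Dim original As String = "<p>Name</P>"

        ' The angle brackets become entities, so an XML parser sees plain text.
        Dim encoded As String = HttpUtility.HtmlEncode(original)
        Console.WriteLine(encoded)   ' &lt;p&gt;Name&lt;/P&gt;

        ' Decoding restores the original markup, including the mismatched case.
        Dim decoded As String = HttpUtility.HtmlDecode(encoded)
        Console.WriteLine(decoded = original)   ' True
    End Sub
End Module
```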
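Update
My solution works. I have tested it now. You probably tested it with XML that still contains the HTML tags as plain text ("<p>Name</P>"). What my code does is write the HTML as "&lt;p&gt;Name&lt;/P&gt;". That is what HttpUtility.HtmlEncode does. So you must first write an XML file using my method; only then will reading succeed.
Here is my write test: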
Public Sub WriteTest()
Dim myperfil As New translationfile With {
.Items = New List(Of translationfileTranslationtext) From {
New translationfileTranslationtext With {.en_text = "en test", .es_text = "spanish"},
New translationfileTranslationtext With {.en_text = "<p>Name</P>", .es_text = "<p>Nombre</P>"}
}
}
Dim writer As New IO.StreamWriter(path)
Dim ser As New XmlSerializer(GetType(translationfile))
ser.Serialize(writer, myperfil)
writer.Close()
End Sub
It creates the following XML:
<?xml version="1.0" encoding="utf-8"?>
<translationfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<translationtext>
<es_text>spanish</es_text>
<en_text>en test</en_text>
</translationtext>
<translationtext>
<es_text>&lt;p&gt;Nombre&lt;/P&gt;</es_text>
<en_text>&lt;p&gt;Name&lt;/P&gt;</en_text>
</translationtext>
</translationfile>
Here is my read test, which throws no exception:
Public Sub ReadTest()
Dim myperfil As translationfile
Dim reader As New IO.StreamReader(path)
Dim ser As New XmlSerializer(GetType(translationfile))
myperfil = CType(ser.Deserialize(reader), translationfile)
reader.Close()
For Each item As translationfileTranslationtext In myperfil.Items
Console.WriteLine("EN = {0}, ES = {1}", item.en_text, item.es_text)
Next
Console.ReadKey()
End Sub
It writes this to the console:
EN = en test, ES = spanish
EN = <p>Name</P>, ES = <p>Nombre</P>
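After some testing, I found a workaround:
- I read the whole file as a plain string.
- I replace every < character with a placeholder string: #open_key#
- I replace every #open_key#es_text> with <es_text>
- I do the same for en_text, developer_comment, and so on.
- I save the result to a temporary file.
- I deserialize the temporary file.
- Before returning the result, I replace every #open_key# with <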
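The steps above could look roughly like the sketch below. It is not the code actually used: the module name `PlaceholderWorkaround`, the function `LoadTranslations`, and the regular expression are assumptions, the two classes are compact stand-ins for the xsd-generated ones, and a StringReader replaces the temporary file.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Xml.Serialization

' Compact stand-ins for the xsd-generated classes shown in the question.
<XmlRoot("translationfile")>
Public Class translationfile
    <XmlElement("translationtext")>
    Public Property Items As List(Of translationfileTranslationtext)
End Class

Public Class translationfileTranslationtext
    Public Property es_text As String
    Public Property en_text As String
    Public Property developer_comment As String
End Class

' Hypothetical helper module; all names here are illustrative.
Public Module PlaceholderWorkaround
    Private Const Placeholder As String = "#open_key#"

    ' Matches the XML declaration and the known structural tags (opening or closing).
    Private Const KnownTags As String =
        "(/?(?:translationfile|translationtext|es_text|en_text|developer_comment)(?=[\s/>])|\?xml(?=\s))"

    Public Function LoadTranslations(path As String) As translationfile
        ' 1. Read the whole file as a plain string and mask every "<".
        Dim masked As String = File.ReadAllText(path).Replace("<", Placeholder)

        ' 2. Restore "<" only in front of the known structural tags.
        masked = Regex.Replace(masked, Placeholder & KnownTags, "<$1")

        ' 3. Deserialize the masked text (from memory rather than a temporary file).
        Dim ser As New XmlSerializer(GetType(translationfile))
        Dim result As translationfile
        Using reader As New StringReader(masked)
            result = CType(ser.Deserialize(reader), translationfile)
        End Using

        ' 4. Put the original "<" characters back into the values.
        If result.Items IsNot Nothing Then
            For Each item As translationfileTranslationtext In result.Items
                item.es_text = Unmask(item.es_text)
                item.en_text = Unmask(item.en_text)
                item.developer_comment = Unmask(item.developer_comment)
            Next
        End If
        Return result
    End Function

    Private Function Unmask(value As String) As String
        If value Is Nothing Then Return Nothing
        Return value.Replace(Placeholder, "<")
    End Function
End Module
```

Calling `PlaceholderWorkaround.LoadTranslations(path)` on the real-case file should return items whose es_text and en_text still hold the customer's markup unchanged. This relies on ">" being legal in XML text; customer text containing a raw "&" or an HTML entity such as &nbsp; would still make the parser fail and would need separate handling.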