Unexpected node type element error when deserialising

I am trying to parse a large Japanese-to-English dictionary written in XML. A typical entry looks like this:

<entry>
<ent_seq>1486440</ent_seq>
<k_ele>
<keb>美術</keb>
<ke_pri>ichi1</ke_pri>
<ke_pri>news1</ke_pri>
<ke_pri>nf02</ke_pri>
</k_ele>
<r_ele>
<reb>びじゅつ</reb>
<re_pri>ichi1</re_pri>
<re_pri>news1</re_pri>
<re_pri>nf02</re_pri>
</r_ele>
<sense>
<pos>&n;</pos>
<pos>&adj-no;</pos>
<gloss>art</gloss>
<gloss>fine arts</gloss>
</sense>
<sense>
<gloss xml:lang="dut">kunst</gloss>
<gloss xml:lang="dut">schone kunsten</gloss>
</sense>
<sense>
<gloss xml:lang="fre">art</gloss>
<gloss xml:lang="fre">beaux-arts</gloss>
</sense>
<sense>
<gloss xml:lang="ger">Kunst</gloss>
<gloss xml:lang="ger">die schönen Künste</gloss>
<gloss xml:lang="ger">bildende Kunst</gloss>
</sense>
<sense>
<gloss xml:lang="ger">Produktionsdesign</gloss>
<gloss xml:lang="ger">Szenographie</gloss>
</sense>
<sense>
<gloss xml:lang="hun">művészet</gloss>
<gloss xml:lang="hun">művészeti</gloss>
<gloss xml:lang="hun">művészi</gloss>
<gloss xml:lang="hun">rajzóra</gloss>
<gloss xml:lang="hun">szépművészet</gloss>
</sense>
<sense>
<gloss xml:lang="rus">изящные искусства; искусство</gloss>
<gloss xml:lang="rus">{~{的}} художественный, артистический</gloss>
</sense>
<sense>
<gloss xml:lang="slv">umetnost</gloss>
<gloss xml:lang="slv">likovna umetnost</gloss>
</sense>
<sense>
<gloss xml:lang="spa">bellas artes</gloss>
</sense>
</entry>

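I have written a deserialiser based on code provided by djv, and it does deserialise the whole dictionary into a series of class objects. This is the code I have so far: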

ReadOnly jmdictpath As String = "JMdict"

<XmlRoot>
Public Class JMdict
    <XmlElement("entry")>
    Public Property entrylist As List(Of entry)
End Class

<Serializable()>
Public Class entry
    Public Property ent_seq As Integer
    Public Property k_ele As k_ele
    Public Property r_ele As r_ele
    <XmlElement("sense")>
    Public Property senselist As List(Of sense)
End Class

<Serializable()>
Public Class k_ele
    Public Property keb As String
    Public Property ke_pri As List(Of String)
    Public Property ke_inf As List(Of String)
End Class

<Serializable()>
Public Class r_ele
    Public Property reb As String
    Public Property re_pri As List(Of String)
    Public Property re_inf As List(Of String)
End Class

<Serializable()>
Public Class sense
    <XmlElement("pos")>
    Public Property pos As List(Of String)
    <XmlElement("gloss")>
    Public Property gloss As List(Of gloss)
End Class

<Serializable()>
Public Class gloss
    <XmlAttribute("xml:lang")>
    Public Property lang As String
    <XmlAttribute("g_type")>
    Public Property g_type As String
    <XmlText>
    Public Property Text As String
    Public Overrides Function ToString() As String
        Return Text
    End Function
End Class

Dim dict As JMdict

Sub Deserialise()
    Dim serialiser As New XmlSerializer(GetType(JMdict))
    Using sr As New StreamReader(jmdictpath)
        dict = CType(serialiser.Deserialize(sr), JMdict)
    End Using
End Sub

However, when I run the code, I get the following error:

System.InvalidOperationException: 'There is an error in XML document (415, 7).'

XmlException: Unexpected node type EntityReference. ReadElementString method can only be called on elements with simple or empty content. Line 415, position 7.

I looked at the XML, and line 415 is this line:

 <pos>&unc;</pos>

So the deserialiser has trouble reading the <pos> tag. I tried a couple of things.

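First, I tried removing the <XmlElement> attribute from the pos property in the sense class. With that change there is no error, but the deserialiser reads no pos data at all for any entry.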

Second, I searched Stack Overflow and found this related question where the OP had the same problem. The accepted answer there suggested splitting the data into further classes, so I tried that as well and created a new pos class:

<Serializable()>
Public Class sense
    <XmlElement("pos")>
    Public Property pos As List(Of pos)
    <XmlElement("gloss")>
    Public Property gloss As List(Of gloss)
End Class

<Serializable()>
Public Class pos
    <XmlText>
    Public Property Text As String
    Public Overrides Function ToString() As String
        Return Text
    End Function
End Class

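Again, although this did not cause any errors, the pos elements were blank in every entry. Each pos tag contains only a single value (although each sense tag can have several pos tags), so I don't think it needs its own class. In any case, that answer did not solve my problem, which is why I am asking this question.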

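I am completely new to XML deserialisation and don't really understand what I am doing under the hood. I have tried to work out how it works, but I am clearly doing something wrong here. Any advice would be appreciated.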

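You just need to give your XmlSerializer an XmlReader and configure its XmlReaderSettings properly. The only thing you need to configure in the settings is the DtdProcessing property, which should be set to DtdProcessing.Parse. References such as &unc; are entities declared in the document's DTD; once DTD processing is enabled, the reader expands them into their replacement text, so the serializer no longer encounters an EntityReference node inside <pos>.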

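' Requires: Imports System.IO, System.Xml, System.Xml.Serialization
' Enable DTD processing so that entity references such as &unc; are expanded.
Dim settings As New XmlReaderSettings()
settings.DtdProcessing = DtdProcessing.Parse

' Adjust the file name so that it points at your dictionary file.
Dim xmlPath As String = Path.Combine(Application.StartupPath, "yourfilename.xml")

Dim ser As New XmlSerializer(GetType(JMdict))

Dim JMdictInstance As JMdict
' Pass the configured XmlReader (rather than a StreamReader) to the serializer.
Using rdr As XmlReader = XmlReader.Create(xmlPath, settings)
    JMdictInstance = CType(ser.Deserialize(rdr), JMdict)
End Using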
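
To try the idea without the full dictionary file, here is a minimal, self-contained sketch. It is only an illustration under some assumptions: MiniDict, MiniEntry, EntityDemo and LoadSample are hypothetical names invented for this demo, and the inline DTD with its two entity declarations is a tiny stand-in for the real file's DOCTYPE section. It keeps the <XmlElement("pos")> attribute from the question and reads the XML through an XmlReader with DtdProcessing.Parse.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical, trimmed-down classes for this demo only (not the full model).
<XmlRoot("JMdict")>
Public Class MiniDict
    <XmlElement("entry")>
    Public Property Entries As List(Of MiniEntry)
End Class

Public Class MiniEntry
    <XmlElement("ent_seq")>
    Public Property EntSeq As Integer

    ' XmlElement maps each repeated <pos> element directly onto one list item.
    <XmlElement("pos")>
    Public Property Pos As List(Of String)
End Class

Public Module EntityDemo
    ' Stand-in document: the internal DTD subset declares the two entities used below.
    Private Const SampleXml As String =
        "<!DOCTYPE JMdict [" &
        "<!ENTITY n ""noun (common) (futsuumeishi)"">" &
        "<!ENTITY unc ""unclassified"">" &
        "]>" &
        "<JMdict><entry><ent_seq>1</ent_seq>" &
        "<pos>&n;</pos><pos>&unc;</pos>" &
        "</entry></JMdict>"

    Public Function LoadSample() As MiniDict
        ' Allow the reader to process the DTD so that entity references are expanded.
        Dim settings As New XmlReaderSettings()
        settings.DtdProcessing = DtdProcessing.Parse

        Dim serialiser As New XmlSerializer(GetType(MiniDict))
        Using text As New StringReader(SampleXml)
            Using reader As XmlReader = XmlReader.Create(text, settings)
                Return CType(serialiser.Deserialize(reader), MiniDict)
            End Using
        End Using
    End Function

    Public Sub Main()
        Dim dict As MiniDict = LoadSample()
        ' Print every pos value of the first entry.
        For Each p As String In dict.Entries(0).Pos
            Console.WriteLine(p)
        Next
    End Sub
End Module
```

With this setup the Pos list should contain the entities' replacement text rather than the raw &n; and &unc; references, so a plain List(Of String) is enough and no separate pos class is needed.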