kanjidic2 XPath for getting nanori in Python

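I'm currently working on a Django project that uses the kanjidic2 XML file (http://nihongo.monash.edu/kanjidic2/index.html). I'm using xml.etree.ElementTree to map the XML information. However, I'm having trouble with one level of the entry. Here is a sample entry from kanjidic2: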

<character id="9">
<literal>&#36898;</literal>
<codepoint>
<cp_value cp_type="ucs">9022</cp_value>
<cp_value cp_type="jis208">16-9</cp_value>
</codepoint>
<radical>
<rad_value rad_type="classical">162</rad_value>
</radical>
<misc>
<grade>9</grade>
<stroke_count>10</stroke_count>
<stroke_count>9</stroke_count>
<freq>2116</freq>
</misc>
<dic_number>
<dic_ref dr_type="nelson_c">4694</dic_ref>
<dic_ref dr_type="nelson_n">6054</dic_ref>
<dic_ref dr_type="halpern_kkd">4002</dic_ref>
<dic_ref dr_type="halpern_kkld_2ed">2774</dic_ref>
<dic_ref dr_type="heisig">2417</dic_ref>
<dic_ref dr_type="heisig6">2497</dic_ref>
<dic_ref dr_type="oneill_names">1516</dic_ref>
<dic_ref dr_type="moro" m_vol="11" m_page="0075">38901X</dic_ref>
</dic_number>
<query_code>
<q_code qc_type="skip">3-3-7</q_code>
<q_code qc_type="sh_desc">2q7.15</q_code>
<q_code qc_type="four_corner">3730.4</q_code>
<q_code qc_type="deroo">2555</q_code>
<q_code qc_type="skip" skip_misclass="stroke_diff">3-4-7</q_code>
</query_code>
<reading_meaning>
<rmgroup>
<reading r_type="pinyin">feng2</reading>
<reading r_type="korean_r">bong</reading>
<reading r_type="korean_h">&#48393;</reading>
<reading r_type="ja_on">&#12507;&#12454;</reading>
<reading r_type="ja_kun">&#12354;.&#12358;</reading>
<reading r_type="ja_kun">&#12416;&#12363;.&#12360;&#12427;</reading>
<meaning>meeting</meaning>
<meaning>tryst</meaning>
<meaning>date</meaning>
<meaning>rendezvous</meaning>
<meaning m_lang="es">encuentro</meaning>
<meaning m_lang="es">cita</meaning>
<meaning m_lang="es">encuentro casual</meaning>
<meaning m_lang="es">encontrarse</meaning>
<meaning m_lang="es">reunirse</meaning>
<meaning m_lang="es">citarse</meaning>
<meaning m_lang="es">verse por casualidad</meaning>
</rmgroup>
<nanori>&#12354;&#12356;</nanori>
<nanori>&#12362;&#12358;</nanori>
</reading_meaning>
</character>

I can get the data from the other levels into Python dictionaries with the following code:

for i in character:
    if i.tag == 'dic_number':
        dictionariesDict = {}
        dictionaries = root.find(".//character[@id='" + id + "']//dic_number")
        for dictionary in dictionaries:
            dictionariesDict[dictionary.get('dr_type')] = dictionary.text

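However, when it comes to the reading_meaning tag, I have no idea how to get the nanori tags into one dictionary, the readings with r_type="ja_on" into another, the readings with r_type="ja_kun" into another, and the meanings in the other languages into their own (ideally one dictionary per language). I have tried all kinds of XPath expressions. When I print the result of root.find I get the tag, but when I loop over it to build the dictionary, all I get is empty dictionaries.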

Thanks in advance for your help and patience.

This XPath gets the reading[@r_type="ja_kun"] elements, the second meaning element, and all nanori elements in one go:
(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)

xmllint --xpath '(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)' test.xml | sed -e 's/></>\n</g'
<reading r_type="ja_kun">&#x3042;.&#x3046;</reading>
<reading r_type="ja_kun">&#x3080;&#x304B;.&#x3048;&#x308B;</reading>
<meaning>tryst</meaning>
<nanori>&#x3042;&#x3044;</nanori>
<nanori>&#x304A;&#x3046;</nanori>

Tested in bash and Python:

>>> from lxml import etree
>>> doc = etree.parse('test.xml')
>>> arr = doc.xpath('(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)')
>>> for e in arr:
...     print(e.text)
... 
あ.う
むか.える
tryst
あい
おう

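xml.etree.ElementTree only supports a limited subset of XPath and has no union operator (|), so with it you should run each part of the "OR" expression one at a time, using relative paths such as .//nanori.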
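
As a rough sketch of what querying one part at a time could look like with the standard library only: the snippet below runs a separate findall for nanori, ja_on, ja_kun and meaning, and groups the meanings by their m_lang attribute. The helper name collect_readings and the trimmed SAMPLE string are made up for illustration, and treating a meaning without m_lang as English ("en") is an assumption. Values are kept in lists because one character can have several readings of the same type.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the sample entry from the question. Assumption: the real
# file wraps <character> elements like this one in a single root element.
SAMPLE = """<kanjidic2>
<character id="9">
<literal>&#36898;</literal>
<reading_meaning>
<rmgroup>
<reading r_type="pinyin">feng2</reading>
<reading r_type="ja_on">&#12507;&#12454;</reading>
<reading r_type="ja_kun">&#12354;.&#12358;</reading>
<reading r_type="ja_kun">&#12416;&#12363;.&#12360;&#12427;</reading>
<meaning>meeting</meaning>
<meaning>tryst</meaning>
<meaning m_lang="es">encuentro</meaning>
<meaning m_lang="es">cita</meaning>
</rmgroup>
<nanori>&#12354;&#12356;</nanori>
<nanori>&#12362;&#12358;</nanori>
</reading_meaning>
</character>
</kanjidic2>"""


def collect_readings(character):
    """Hypothetical helper: gather nanori, on/kun readings and meanings."""
    rm = character.find("reading_meaning")
    if rm is None:
        # Entry without a reading_meaning block: return empty containers.
        return {"nanori": [], "ja_on": [], "ja_kun": [], "meanings": {}}

    # One query per part of the union, each with a relative path.
    result = {
        "nanori": [n.text for n in rm.findall("nanori")],
        "ja_on": [r.text for r in rm.findall(".//reading[@r_type='ja_on']")],
        "ja_kun": [r.text for r in rm.findall(".//reading[@r_type='ja_kun']")],
    }

    # Group meanings by language. Assumption: no m_lang attribute means "en".
    meanings = defaultdict(list)
    for m in rm.findall(".//meaning"):
        meanings[m.get("m_lang", "en")].append(m.text)
    result["meanings"] = dict(meanings)
    return result


root = ET.fromstring(SAMPLE)
character = root.find(".//character[@id='9']")
info = collect_readings(character)
for key, value in info.items():
    print(key, value)
```

If a flat dictionary per language is preferred, info["meanings"]["es"] and the other language keys can be pulled out into separate variables afterwards.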