kanjidic 2 xpath for getting nanori on Python
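I'm currently working on a Django project that uses the kanjidic2 XML file (http://nihongo.monash.edu/kanjidic2/index.html). I'm mapping the XML data with xml.etree.ElementTree. However, I'm having trouble with the reading_meaning level. Here is a sample entry from kanjidic2: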
<character id="9">
<literal>逢</literal>
<codepoint>
<cp_value cp_type="ucs">9022</cp_value>
<cp_value cp_type="jis208">16-9</cp_value>
</codepoint>
<radical>
<rad_value rad_type="classical">162</rad_value>
</radical>
<misc>
<grade>9</grade>
<stroke_count>10</stroke_count>
<stroke_count>9</stroke_count>
<freq>2116</freq>
</misc>
<dic_number>
<dic_ref dr_type="nelson_c">4694</dic_ref>
<dic_ref dr_type="nelson_n">6054</dic_ref>
<dic_ref dr_type="halpern_kkd">4002</dic_ref>
<dic_ref dr_type="halpern_kkld_2ed">2774</dic_ref>
<dic_ref dr_type="heisig">2417</dic_ref>
<dic_ref dr_type="heisig6">2497</dic_ref>
<dic_ref dr_type="oneill_names">1516</dic_ref>
<dic_ref dr_type="moro" m_vol="11" m_page="0075">38901X</dic_ref>
</dic_number>
<query_code>
<q_code qc_type="skip">3-3-7</q_code>
<q_code qc_type="sh_desc">2q7.15</q_code>
<q_code qc_type="four_corner">3730.4</q_code>
<q_code qc_type="deroo">2555</q_code>
<q_code qc_type="skip" skip_misclass="stroke_diff">3-4-7</q_code>
</query_code>
<reading_meaning>
<rmgroup>
<reading r_type="pinyin">feng2</reading>
<reading r_type="korean_r">bong</reading>
<reading r_type="korean_h">봉</reading>
<reading r_type="ja_on">ホウ</reading>
<reading r_type="ja_kun">あ.う</reading>
<reading r_type="ja_kun">むか.える</reading>
<meaning>meeting</meaning>
<meaning>tryst</meaning>
<meaning>date</meaning>
<meaning>rendezvous</meaning>
<meaning m_lang="es">encuentro</meaning>
<meaning m_lang="es">cita</meaning>
<meaning m_lang="es">encuentro casual</meaning>
<meaning m_lang="es">encontrarse</meaning>
<meaning m_lang="es">reunirse</meaning>
<meaning m_lang="es">citarse</meaning>
<meaning m_lang="es">verse por casualidad</meaning>
</rmgroup>
<nanori>あい</nanori>
<nanori>おう</nanori>
</reading_meaning>
</character>
I can get the data from the other levels into Python dictionaries with the following code:
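for i in character:
    if i.tag == 'dic_number':
        dictionariesDict = {}
        # Look up the dic_number element of the character with this id
        dictionaries = root.find(".//character[@id='" + id + "']//dic_number")
        # Iterating over an element yields its children (the dic_ref elements)
        for dictionary in dictionaries:
            dictionariesDict[dictionary.get('dr_type')] = dictionary.text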
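However, when it comes to the reading_meaning tag, I have no idea how to get the nanori tags into one dictionary, the readings with r_type="ja_on" into another, the readings with r_type="ja_kun" into another, and the meanings in other languages as well (preferably one dictionary per language).
I've tried all kinds of xpath. When I print root.find I get the tags, but when I loop over the result to build the dictionaries, all I get is empty dictionaries.
Thanks in advance for your help and patience.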
This xpath gets the reading[@r_type="ja_kun"] elements, the second meaning element, and all nanori elements in one go:
(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)
xmllint --xpath '(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)' test.xml | sed -e 's/></>\n</g'
<reading r_type="ja_kun">あ.う</reading>
<reading r_type="ja_kun">むか.える</reading>
<meaning>tryst</meaning>
<nanori>あい</nanori>
<nanori>おう</nanori>
Tested in bash (above) and in Python:
>>> from lxml import etree
>>> doc = etree.parse('test.xml')
>>> doc.xpath('(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)')
>>> arr = doc.xpath('(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)')
>>> for e in arr:
... print(e.text)
...
あ.う
むか.える
tryst
あい
おう
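With xml.etree.ElementTree, whose limited XPath subset does not support the | union operator, you should run the parts of the "OR" one at a time, each as its own query.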
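As a rough sketch of that one-query-at-a-time approach with only the standard library: the snippet below runs a separate findall for readings, meanings, and nanori, and groups the results into dictionaries. One likely cause of the empty dictionaries in the question is that find returns a single element, and looping over an element iterates its children, of which a nanori element has none. The function name collect_readings, the trimmed SAMPLE string, the kanjidic2 wrapper element, and the choice to file meanings without m_lang under "en" are assumptions made for illustration, not part of the original post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the entry from the question, wrapped in an assumed root element.
SAMPLE = """<kanjidic2>
<character id="9">
<literal>逢</literal>
<reading_meaning>
<rmgroup>
<reading r_type="pinyin">feng2</reading>
<reading r_type="ja_on">ホウ</reading>
<reading r_type="ja_kun">あ.う</reading>
<reading r_type="ja_kun">むか.える</reading>
<meaning>meeting</meaning>
<meaning>tryst</meaning>
<meaning m_lang="es">encuentro</meaning>
<meaning m_lang="es">cita</meaning>
</rmgroup>
<nanori>あい</nanori>
<nanori>おう</nanori>
</reading_meaning>
</character>
</kanjidic2>"""


def collect_readings(character):
    """Group readings by r_type and meanings by language, and list the nanori."""
    readings = defaultdict(list)
    meanings = defaultdict(list)
    # findall returns every matching element; find would return only the first.
    for reading in character.findall("./reading_meaning/rmgroup/reading"):
        readings[reading.get("r_type")].append(reading.text)
    for meaning in character.findall("./reading_meaning/rmgroup/meaning"):
        # Assumption: a meaning without an m_lang attribute is English.
        meanings[meaning.get("m_lang", "en")].append(meaning.text)
    nanori = [n.text for n in character.findall("./reading_meaning/nanori")]
    return dict(readings), dict(meanings), nanori


root = ET.fromstring(SAMPLE)
character = root.find(".//character[@id='9']")
readings, meanings, nanori = collect_readings(character)
print(readings["ja_kun"])
print(meanings["es"])
print(nanori)
```

Lists are used as the dictionary values because a kanji can have several readings of the same r_type; keying a plain dict on r_type alone, as in the dic_number loop, would keep only the last one.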