Python returns an error when writing data to a file (Python 2.7)

Question

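I'm parsing an XML file with Python's minidom module. When I write the data to a file, I get this error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128). Printing the same data on the command line works perfectly, though. How can I fix this?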

My XML file is:

<?xml version="1.0"?>
<Feature>
    <Word Root="ਨੌਕਰ-ਚਾਕਰ">
        <info Inflection="ਨੌਕਰਾਂ-ਚਾਕਰਾਂ">
            <posinfo gender="Masculine" number="Plural" case="Oblique" />
        </info>
    </Word>
</Feature>

My Python code is:

import sys

from xml.dom import minidom

file=open("npu.txt","w+")
doc = minidom.parse("NPU.xml")
word = doc.getElementsByTagName("Word")
for each in word:
    # print "root"+each.getAttribute("Root")
    file.write(each.getAttribute("Root")+"\n")
    hh=each.getElementsByTagName("info")

    for each1 in hh:
        # print "inflection"+each1.getAttribute("Inflection")
        file.write(each1.getAttribute("Inflection")+"\t")

        vv=each1.getElementsByTagName("posinfo")
        for each2 in vv:
            # print each2.getAttribute("gender")
            # print each2.getAttribute("number")
            # print each2.getAttribute("case")
            file.write( each2.getAttribute("gender")+",")
            file.write( each2.getAttribute("number")+",")
            file.write(each2.getAttribute("case"))
        file.write("\n")
    file.write("--------\n")

Answer 1

The problem is not the way you parse the XML; it is an encoding problem.

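Your text is Unicode (the XML file is UTF-8), and minidom returns attribute values as unicode objects. When you write them to a plain file opened with open(), Python 2 encodes them with its default ASCII codec, which cannot represent the characters you are using.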
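A minimal reproduction using only the standard library: encoding a unicode string that contains Gurmukhi characters with the ASCII codec raises the same UnicodeEncodeError as in the question. In Python 2, `print` to a terminal encodes with the terminal's encoding (`sys.stdout.encoding`), which is why the command-line output looked fine. The variable names `root` and `error` are illustrative only.

```python
# -*- coding: utf-8 -*-
# Reproduce the question's error without minidom or files.
root = u"ਨੌਕਰ-ਚਾਕਰ"  # the same kind of value getAttribute("Root") returns

error = None
try:
    root.encode("ascii")  # what writing to a plain Python 2 file does implicitly
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; Python 3 clears `exc` after the block

# 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
print(error)
```

Positions 0-3 are the four Gurmukhi code points before the ASCII hyphen, which matches the position reported in the question.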

Try opening the file through the codecs module with a UTF-8 encoding:

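# -*- coding: utf-8 -*-
import codecs

# the writer returned by codecs.open() encodes unicode strings as UTF-8
file = codecs.open("npu.txt", "w+", "utf-8")
file.write(u"ਨੌਕਰ-ਚਾਕਰ")  # a unicode literal, like the values getAttribute() returns
file.close()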
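Applied to the question's loop, the codecs approach might look like the sketch below. It assumes the element and attribute names from the question's XML; the function name `dump_features` is hypothetical. Every string passed to `write()` is unicode, so the codecs writer encodes it as UTF-8 instead of ASCII, and the output format matches the original script.

```python
# -*- coding: utf-8 -*-
import codecs
from xml.dom import minidom


def dump_features(xml_path, out_path):
    """Write each Word's Root, Inflection and posinfo values to out_path."""
    doc = minidom.parse(xml_path)
    # codecs.open() returns a writer that encodes every unicode string passed
    # to write() as UTF-8, so nothing goes through the ASCII codec.
    with codecs.open(out_path, "w", "utf-8") as out:
        for word in doc.getElementsByTagName("Word"):
            out.write(word.getAttribute("Root") + u"\n")
            for info in word.getElementsByTagName("info"):
                out.write(info.getAttribute("Inflection") + u"\t")
                for pos in info.getElementsByTagName("posinfo"):
                    out.write(pos.getAttribute("gender") + u",")
                    out.write(pos.getAttribute("number") + u",")
                    out.write(pos.getAttribute("case"))
                out.write(u"\n")
            out.write(u"--------\n")
```

Call it as `dump_features("NPU.xml", "npu.txt")`. The `with` block also closes the file, which the original script never did. On Python 3, the built-in `open(path, "w", encoding="utf-8")` does the same job.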

Edit:

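You can also declare your source file's encoding as UTF-8 by adding the special comment # -*- coding: UTF-8 -*- at the top of the Python source file; without it, Python 2 assumes the source is ASCII (7-bit). This declaration only controls how non-ASCII literals in your script are read; it does not change the codec used when unicode text is written to a file. Note that Python identifiers are still limited to ASCII characters.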

Answer 2

Encode the data while writing:
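#!/usr/bin/env python
# -*- coding: utf-8 -*-
file = open("npu.txt", "w+")
# encode unicode text (such as minidom's getAttribute() values) to UTF-8 bytes first
file.write(u"ਨੌਕਰ-ਚਾਕਰ".encode("utf-8"))
file.close()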
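The same idea applied to values that come from minidom, as a minimal sketch: attribute values are unicode, so each one is encoded to UTF-8 bytes explicitly and written to a file opened in binary mode. The function name `dump_roots` and the one-element sample document are hypothetical, used only to keep the example self-contained.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom


def dump_roots(doc, out_path):
    """Write each Word's Root attribute to out_path, one per line, as UTF-8."""
    # "wb": the file receives bytes, so nothing is implicitly encoded as ASCII
    with open(out_path, "wb") as out:
        for word in doc.getElementsByTagName("Word"):
            root = word.getAttribute("Root")            # unicode text
            out.write((root + u"\n").encode("utf-8"))  # encode explicitly


# Hypothetical sample document; parseString() is given UTF-8 bytes,
# which behaves the same on Python 2.7 and Python 3.
SAMPLE = u'<Feature><Word Root="ਨੌਕਰ-ਚਾਕਰ"/></Feature>'.encode("utf-8")
doc = minidom.parseString(SAMPLE)
```

For the question's file, parse it with `minidom.parse("NPU.xml")` and call `dump_roots(doc, "npu.txt")`. Opening in binary mode keeps the encoding step visible and works unchanged on Python 3, where a text-mode file would reject bytes.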

python

xml

minidom