Python LXML: method to check whether a variable's value contains non-ASCII characters and, if so, return the unicode value

Question

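I am trying to create XML in Python using LXML. Variable values from an external data source are used to fill in the values in my XML file. If a variable's value contains a non-ASCII character such as €, the result is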

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters.

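Question: I want a method in Python that checks whether the value in a variable contains non-ASCII characters and, if so, returns the corresponding unicode value so that I can use it in my XML as well. I am not looking for input_string = u'string €'. As I said, the variable gets its value from an external data source. Please help.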

Answer 1

It seems you are looking for this:
(assuming Python 2.7 and input data of <type 'str'>)

# Convert input_string from 'str' to 'unicode',
# but only if input_string contains non-ASCII bytes.

def decode_if_no_ascii(input_string):
    try:
        # Pure-ASCII input decodes cleanly and is returned unchanged as 'str'.
        input_string.decode('ascii')
    except UnicodeDecodeError:
        # 'utf-8' must match the actual encoding of input_string;
        # in a particular case it could be 'latin_1' or 'cp1252'.
        input_string = input_string.decode('utf-8')
    return input_string
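
For readers on Python 3, where the Python 2 str/unicode split became bytes/str, the same idea might look like the sketch below. The helper name to_text and the default list of candidate encodings are assumptions made for illustration, not part of the original answer; the right encodings depend on what the external data source actually produces.

```python
def to_text(value, encodings=("utf-8", "cp1252")):
    """Return value as str, decoding bytes with the first encoding that works.

    Hypothetical helper: the name and the default encodings are assumptions.
    """
    if isinstance(value, str):
        # Already text (Python 3 str is Unicode), nothing to decode.
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next candidate.
            continue
    raise ValueError("could not decode %r with any of %r" % (value, encodings))


print(repr(to_text(b"string")))
print(repr(to_text("string \u20ac".encode("utf-8"))))
```

The returned str can then be assigned to an element's text or to an attribute value. Trying several encodings in order is only a guess at the true encoding; if the data source documents its encoding, passing that single encoding is the safer choice.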

Let's test the function:

# Python 2 note: if this is run from a file, save the file as UTF-8 and put
# "# -*- coding: utf-8 -*-" on its first line; otherwise the '€' literal
# below is a SyntaxError.

# 1. ASCII str
input_string = 'string'
input_string = decode_if_no_ascii(input_string)
print type(input_string), repr(input_string), input_string
# <type 'str'> 'string' string
# ==> still 'str', no changes

# 2. non-ASCII str
input_string = 'string €'
input_string = decode_if_no_ascii(input_string)
print type(input_string), repr(input_string), input_string
# <type 'unicode'> u'string \u20ac' string €
# ==> converted to 'unicode'
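
The error message in the question also mentions NULL bytes and control characters, which decoding alone does not remove. If the external data may contain them, one possible follow-up step is to strip every character outside the range XML 1.0 allows. This is a Python 3 sketch; the helper name strip_invalid_xml_chars is an assumption for illustration, and silently dropping characters may not suit every data source.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
# The pattern matches everything outside those ranges.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that XML 1.0 does not permit (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


print(repr(strip_invalid_xml_chars("string \u20ac\x00\x0b")))
```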

Is this what you were looking for?
