Strings in Python not equal due to German Umlaut

Question

I'm trying to compare two strings that both contain the German umlaut "ü". They look exactly the same, and there is no trailing \n or anything similar.

One of them is read from an XML file, the other from the file system. Comparing them character by character, they differ at the umlaut.

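The odd-looking umlaut (made of two characters: a normal "u" plus the two dots above it) comes from the file system. I'm using macOS High Sierra and running Python 3.7. The file names are read with os.listdir().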
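To see what is going on, the two spellings of "ü" can be built explicitly. This sketch assumes, as the question describes, that the file-system name uses "u" followed by a combining diaeresis (U+0308), while the XML text uses the single precomposed code point U+00FC:

```python
import unicodedata

composed = "\u00fc"     # "ü" as one code point: LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"  # "u" followed by U+0308 COMBINING DIAERESIS

print(composed, decomposed)            # both usually render as ü
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 1 2
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']
```

The strings are canonically equivalent in Unicode, but Python's == compares code point sequences, so they are reported as different.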

I'd appreciate suggestions for dealing with this strange behavior (getting rid of the "ü" is not an option).

Answer 1

Instead of comparing the strings directly, compare the results of unicodedata.normalize, using the same form argument for both strings.

From the documentation (Comparing Strings):

A second tool is the unicodedata module’s normalize() function that converts strings to one of several normal forms, where letters followed by a combining character are replaced with single characters. normalize() can be used to perform string comparisons that won’t falsely report inequality if two strings use combining characters differently
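A quick check of what the quoted passage describes: NFC composes a base letter and its combining mark into one code point, and NFD splits it back apart. The sample word below is an illustrative value chosen for this sketch:

```python
import unicodedata

composed = "M\u00fcller"     # "Müller" with precomposed ü
decomposed = "Mu\u0308ller"  # "Müller" with u + combining diaeresis

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(len(composed), len(decomposed))                        # 6 7
```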

import unicodedata

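def compare_strs(s1, s2):
    # Normalize both strings to NFD (fully decomposed) so that a precomposed
    # "ü" and "u" + combining diaeresis become the same code point sequence.
    def NFD(s):
        return unicodedata.normalize('NFD', s)

    return NFD(s1) == NFD(s2)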
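Applied to the situation in the question, one possible sketch normalizes each name returned by os.listdir() before comparing it with the name read from the XML file. `find_matching_name` and the sample file name are hypothetical. NFC is used here instead of NFD; for an equality test it makes no difference which form you pick, as long as both sides are normalized to the same one.

```python
import os
import tempfile
import unicodedata


def nfc(s):
    """Return the NFC (composed) normal form of s."""
    return unicodedata.normalize("NFC", s)


def find_matching_name(xml_name, directory="."):
    """Hypothetical helper: return the entry of os.listdir(directory) that
    equals xml_name once both are normalized to NFC, or None if none does."""
    target = nfc(xml_name)
    for entry in os.listdir(directory):
        if nfc(entry) == target:
            return entry  # the name exactly as the file system reported it
    return None


# Usage: create a file whose name uses the decomposed "u" + U+0308, then
# look it up with the composed spelling (as it might appear in the XML).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "Mu\u0308ller.txt"), "w").close()
    match = find_matching_name("M\u00fcller.txt", tmp)
    print(match is not None)  # True
```

Returning the entry as listed by os.listdir(), rather than the normalized string, keeps the spelling the file system actually reported, which is presumably the safer one to pass back to open().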


Tags: python, string, unicode-normalization