Python

Question

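Python has some good libraries for converting accented Unicode characters to their closest ASCII characters, as well as libraries for converting a code point to its Unicode character.

However, what options are there for checking whether a string contains Unicode code points or HTML escapes? For example, this string: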

Rialta te Venice&#199

contains &#199, which converts to Ç (LATIN CAPITAL LETTER C WITH CEDILLA). Is there a Python library that detects code points/escapes in a string and outputs the Unicode equivalent?

Answer 1

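I'm not entirely sure what you're asking, but here is my best attempt:

&#199 is an HTML escape (a numeric character reference), and you can unescape it like this: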

>>> s = 'Rialta te Venice&#199'
>>> import html
>>> s2 = html.unescape(s); s2
'Rialta te VeniceÇ'
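
If you only need to detect whether a string contains such escapes, one simple approach is to compare the string with its unescaped form. This is a minimal sketch; the helper name contains_html_escape is made up for illustration and is not part of any library:

```python
import html

def contains_html_escape(text: str) -> bool:
    """Return True if html.unescape() would change the text.

    Hypothetical helper: it treats anything html.unescape() decodes
    as an escape, including references without a trailing semicolon.
    """
    return html.unescape(text) != text

print(contains_html_escape('Rialta te Venice&#199'))  # True
print(contains_html_escape('Rialta te Venice'))       # False
```

Note that html.unescape follows the HTML5 rules, so some named references written without a trailing semicolon (for example &copy) are decoded as well, which may or may not be what you want.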

So the complete solution is to use unidecode.unidecode(html.unescape(s)), where unidecode is the third-party Unidecode package.
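
If you would rather not install a third-party package, a rough standard-library approximation is to decompose the unescaped text with unicodedata and drop the non-ASCII parts. This is a different technique from unidecode, not a drop-in replacement, and the function name to_ascii is only illustrative:

```python
import html
import unicodedata

def to_ascii(text: str) -> str:
    """Unescape HTML references, then strip accents (stdlib-only sketch)."""
    # Step 1: turn references such as &#199 into real characters.
    unescaped = html.unescape(text)
    # Step 2: NFKD splits 'Ç' into 'C' plus a combining cedilla.
    decomposed = unicodedata.normalize('NFKD', unescaped)
    # Step 3: dropping non-ASCII code points removes the combining marks.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Rialta te Venice&#199'))  # Rialta te VeniceC
```

The trade-off is that characters with no decomposition (such as ß or ø) are silently dropped here, whereas unidecode transliterates them.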

Python - Best way to detect accented HTML escapes in a string?