Replace all ASCII symbols (other than letters) with HTML numbers in Python

Question

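I need to replace every ASCII symbol other than letters with its HTML number (http://www.ascii.cl/htmlcodes.htm). Following this post (Convert HTML entities to Unicode and vice versa), I can use the code below, but I still cannot get * (or many other characters) to work.

Is there a solution? Or is a plain replace the only option?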

>>> from BeautifulSoup import BeautifulStoneSoup as bs
>>> import cgi
>>> cgi.escape("<*>").encode('ascii', 'xmlcharrefreplace')

'&lt;*&gt;'

Answer 1

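Your question is a bit vague. I assume "alphabets" means the characters a-z and their uppercase variants. In that case a regular expression gives the result you want: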

>>> import re
>>> f = lambda s: re.sub(r'([^a-zA-Z])', lambda x: '&#{};'.format(ord(x.group(0))), s)
>>> f("<hi>")
'&#60;hi&#62;'
>>> f("<*>")
'&#60;&#42;&#62;'
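
The pattern `[^a-zA-Z]` matches anything that is not an ASCII letter, so it also rewrites non-ASCII characters. If only ASCII symbols should be converted, a narrower character class is one option. This is a minimal sketch assuming Python 3; the name `to_html_numbers` is made up for illustration, and encoding digits and whitespace as well follows the "everything except letters" reading of the question.

```python
import re

# ASCII code points 0x00-0x7F, minus A-Z (0x41-0x5A) and a-z (0x61-0x7A).
_NON_LETTER_ASCII = re.compile(r'[\x00-\x40\x5b-\x60\x7b-\x7f]')

def to_html_numbers(text):
    """Replace each non-letter ASCII character with its &#N; reference."""
    return _NON_LETTER_ASCII.sub(
        lambda m: '&#{};'.format(ord(m.group(0))), text)

print(to_html_numbers("<*>"))   # &#60;&#42;&#62;
print(to_html_numbers("a b"))   # a&#32;b
```

Characters outside the ASCII range pass through unchanged with this version.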

Note that, without knowing your particular application, this looks odd. There may be a better way to solve the real underlying problem.
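
As for why the attempt in the question leaves * untouched: the 'xmlcharrefreplace' error handler only applies to characters the target codec cannot encode, and * is plain ASCII, so it is passed through as is. The escape function itself only handles &, < and > (plus quotes, depending on its arguments). A short sketch assuming Python 3, where `html.escape` stands in for `cgi.escape` (which was removed in Python 3.8):

```python
import html

# html.escape only touches &, <, > and (by default) quotes.
escaped = html.escape("<*>\u00e9")

# 'xmlcharrefreplace' only rewrites characters that ASCII cannot represent,
# so the non-ASCII letter is converted while * stays as it is.
encoded = escaped.encode('ascii', 'xmlcharrefreplace')
print(encoded)  # b'&lt;*&gt;&#233;'
```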

html

python

ascii