How can I fix output showing xa017?

Question

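I scraped some data from the web and it all looks fine. However, once I process the data and do some string operations on it, the final output shows some characters as Unicode escape codes. How can I fix this?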

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.fed.cuhk.edu.hk/cri/faculty/prof-lee-kit-bing-icy/')
soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly

# last entry of the first ordered list after the "Selected Publications" heading
ref = soup.select('h5:-soup-contains("Selected Publications") ~ ol:nth-of-type(1) li')[-1]
publication_dict = {}
periodical = None  # assumption: set earlier in the full script; placeholder so this snippet runs

# journal pages and periodical: the text that follows the <em> title
if ref.text[ref.text.find(ref.em.text)+len(ref.em.text)+2:-1] == "":
    publication_dict['remamin_information'] = None

else:
    if periodical is not None:
        publication_dict['remamin_information'] = (periodical+ref.text[ref.text.find(ref.em.text)+len(ref.em.text):-1])
    else:
        publication_dict['remamin_information'] = (ref.text[ref.text.find(ref.em.text)+len(ref.em.text):-1])

publication_dict

Answer 1

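When you display a list or dict, Python shows its elements using their debug representation (the output of repr()), which makes non-printable characters easy to identify. If you actually print the string, you will see its display representation: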

>>> d = {'remamin_information':',\xa017(2), 69-85.\r\n '}
>>> d     # display the dict.  Elements use debug representation.
{'remamin_information': ',\xa017(2), 69-85.\r\n '}
>>> d['remamin_information']  # The REPL uses a debug representation
',\xa017(2), 69-85.\r\n '
>>> print(d['remamin_information'])   # the \xa0 is actually a NO-BREAK SPACE
, 17(2), 69-85.                       # and the \r\n becomes a line break

There is nothing to "restore to normal". Just make sure to print() the strings to see their display representation.
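The difference between the two representations can be checked directly. The following is a minimal sketch, not part of the original answer. It reuses the string from the example above. The final cleanup step is an assumption about what the asker may want (a regular space instead of the NO-BREAK SPACE, and no trailing line break); it is optional, because the stored string is already correct.

```python
import unicodedata

value = ',\xa017(2), 69-85.\r\n '

# repr() is what a dict, a list, or the REPL uses to show a string:
# non-printable characters appear as escape sequences.
debug_form = repr(value)
print(debug_form)        # ',\xa017(2), 69-85.\r\n '

# The second character really is a single character, not the text "\xa0".
char_name = unicodedata.name(value[1])
print(char_name)         # NO-BREAK SPACE

# Optional cleanup (assumption: a plain space is preferred in the stored value).
# NFKC normalization maps U+00A0 to a regular space, and strip() removes
# the trailing "\r\n ".
cleaned = unicodedata.normalize('NFKC', value).strip()
print(repr(cleaned))     # ', 17(2), 69-85.'
```

Note that NFKC normalization also rewrites other compatibility characters (for example, full-width letters), so value.replace('\xa0', ' ') is the narrower choice if only the NO-BREAK SPACE should change.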
