How can I fix output showing \xa017?
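I scraped some data from the web and it all looked fine. However, once I process the data and do some string manipulation on it, the final output shows some characters as Unicode escape codes. How can I fix this?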
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.fed.cuhk.edu.hk/cri/faculty/prof-lee-kit-bing-icy/')
soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly
ref = soup.select('h5:-soup-contains("Selected Publications") ~ ol:nth-of-type(1) li')[-1]
publication_dict = {}
periodical = None  # assumption: set earlier in the full script, which is not shown here

# journal pages and periodical
tail_start = ref.text.find(ref.em.text) + len(ref.em.text)  # index just past the <em> title
if ref.text[tail_start + 2:-1] == "":
    publication_dict['remamin_information'] = None
elif periodical is not None:
    publication_dict['remamin_information'] = periodical + ref.text[tail_start:-1]
else:
    publication_dict['remamin_information'] = ref.text[tail_start:-1]
publication_dict
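When you display a list or dict, Python shows its elements using their debug representation (repr()) to help identify non-printable characters. If you actually print the string, you will see its display representation (str()):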
>>> d = {'remamin_information': ',\xa017(2), 69-85.\r\n '}
>>> d  # display the dict; elements use the debug representation
{'remamin_information': ',\xa017(2), 69-85.\r\n '}
>>> d['remamin_information']  # the REPL also uses the debug representation
',\xa017(2), 69-85.\r\n '
>>> print(d['remamin_information'])  # \xa0 is actually a NO-BREAK SPACE, and \r\n becomes a line break
, 17(2), 69-85.
There is nothing to "fix". Just make sure to print() the strings themselves to see their display representation.
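If the NO-BREAK SPACE and the trailing line break are genuinely unwanted in the stored value (for example, before writing it to a file), they can be replaced rather than just printed differently. The following is a minimal sketch, not part of the original answer; the helper name `clean_text` is hypothetical, and it assumes you want a plain space in place of `\xa0` and no surrounding whitespace.

```python
def clean_text(value):
    """Hypothetical helper: swap NO-BREAK SPACE for a plain space and trim whitespace."""
    # '\xa0' is U+00A0 (NO-BREAK SPACE); replace it with an ordinary space.
    # strip() then removes the trailing '\r\n ' left over from the HTML.
    return value.replace('\xa0', ' ').strip()


raw = ',\xa017(2), 69-85.\r\n '

debug_form = repr(raw)      # what a dict, a list, or the REPL shows
cleaned = clean_text(raw)   # the string with the invisible characters normalized

print(debug_form)
print(repr(cleaned))
```

After cleaning, even the debug representation of the dict no longer shows escape sequences, because the string no longer contains the characters that produce them.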