How to fix "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3656: ordinal not in range(128)" error in Python
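I'm writing code that uses BeautifulSoup to scrape a student timetable from my school's website. The problem is that I keep getting UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3656: ordinal not in range(128), and I can't work out how to fix it.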
import urllib2
from bs4 import BeautifulSoup
import os

def make_soup(url):
    thepage = urllib2.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

timetabledatasaved = ""
soup = make_soup("http://timetable.ait.ie/reporting/textspreadsheet;student+set;id;AL%5FKSWFT%5FR%5F5%0D%0A?t"
                 "=student+set+textspreadsheet&days=1-5&weeks=21-32&periods="
                 "3-20&template=student+set+textspreadsheet")

for record in soup.find_all('tr'):
    timetabledata = ""
    print record
    print '--------------------'
    for data in record('td'):
        timetabledata = timetabledata + "," + data.text
    if len(timetabledata) != 0:
        timetabledatasaved = timetabledatasaved + "\n" + timetabledata[1:]
#print timetabledatasaved
header = "Activity, Module, Type, Start, End, Duration, Weeks, Room, Staff, Student Groups"
file = open(os.path.expanduser("timetable.csv"), "wb")
file.write(bytes(header).encode("utf-8", errors="ignore"))
file.write(bytes(timetabledatasaved).encode("utf-8", errors="ignore"))
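I'm encoding to UTF-8, but I still get this error after the timetable has been scraped. I've also noticed that my code seems to pick up the JavaScript in the page as well, but I only want it to print the relevant timetable data and save it as a .csv file.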
Traceback (most recent call last):
  File "/Users/tobenna/PycharmProjects/final_project/venv/timetable_scrape.py", line 30, in <module>
    file.write(bytes(timetabledatasaved).encode("utf-8", errors="ignore"))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 3656: ordinal not in range(128)
Process finished with exit code 1
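In Python 2, bytes is a synonym for str, so calling bytes() on a unicode value implicitly encodes it with the default ASCII codec, which cannot handle characters like u'\xa0' (a non-breaking space). That implicit conversion fails before your .encode("utf-8", ...) call ever runs. Encode the values directly instead: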
file.write(header.encode("utf-8", errors="ignore"))
file.write(timetabledatasaved.encode("utf-8", errors="ignore"))
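As an alternative, here is a minimal sketch that lets the file object do the encoding, so no .encode() or bytes() call is needed at all. It should run on both Python 2.7 and Python 3. The sample row, the save_text helper, and the temporary output path are made up for illustration and are not taken from the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample data; u"\xa0" is the non-breaking space from the traceback.
header = u"Activity, Module, Type"
sample_rows = u"Lecture,Software\xa0Design,L"

# Encoding to ASCII fails in the same way as the implicit conversion
# that bytes() performs on a unicode value in Python 2.
try:
    sample_rows.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

def save_text(path, text):
    # io.open encodes to UTF-8 on write; newline="" keeps "\n" untranslated.
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)

# Write to a temporary directory so the sketch does not touch real files.
demo_path = os.path.join(tempfile.mkdtemp(), "timetable.csv")
save_text(demo_path, header + u"\n" + sample_rows)

# Read the raw bytes back to inspect what was actually written.
with io.open(demo_path, "rb") as handle:
    raw = handle.read()
```

With this approach the text stays unicode until the moment it is written, which tends to avoid accidental ASCII conversions elsewhere in the script.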