I am trying to write data to a CSV file after scraping an HTML table.
from bs4 import BeautifulSoup
import urllib2
from lxml.html import fromstring
import re
import csv

wiki = "http://en.wikipedia.org/wiki/List_of_Test_cricket_records"
header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent 403 error on Wikipedia
req = urllib2.Request(wiki, headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

csv_out = open("mycsv.csv", 'wb')
mywriter = csv.writer(csv_out)

def parse_rows(rows):
    results = []
    for row in rows:
        table_headers = row.find_all('th')
        if table_headers:
            results.append([headers.get_text() for headers in table_headers])
        table_data = row.find_all('td')
        if table_data:
            results.append([data.get_text() for data in table_data])
    return results

# Get table
try:
    table = soup.find_all('table')[1]
except AttributeError as e:
    print 'No tables found, exiting'
    # return 1

# Get rows
try:
    rows = table.find_all('tr')
except AttributeError as e:
    print 'No table rows found, exiting'
    # return 1

table_data = parse_rows(rows)

# Print data
for i in table_data:
    print '\t'.join(i)
    mywriter.writerow(i)

csv_out.close()
UnicodeEncodeError                        Traceback (most recent call last)
in ()
---> 51 mywriter.writerow(d1)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)
I do get the data in the IPython notebook, but I can't figure out what goes wrong when the CSV file is being written.
What could the error be? Please help.
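This is a known problem with writing CSV files in Python 2: the csv module does not support Unicode input, so each cell has to be encoded to bytes before it is written. You can see a solution here. In your case, it all boils down to writing: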
mywriter.writerow([s.encode("utf-8") for s in d1])
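To make the encode step above concrete, here is a minimal sketch that wraps it in a helper. The name `encode_row` and the sample row are hypothetical and not part of the question's code; the sketch assumes the cells are text strings and leaves anything else untouched.

```python
def encode_row(row, encoding="utf-8"):
    """Return a copy of row with every text cell encoded to bytes."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]

# Hypothetical row containing the non-breaking space (u'\xa0') from the traceback.
sample_row = [u"\xa0Australia", u"1877"]

# The ASCII codec cannot represent u'\xa0', which is what the traceback reports...
try:
    sample_row[0].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ...while UTF-8 can, so the encoded row is safe to hand to the Python 2 writer.
encoded_row = encode_row(sample_row)
```

On Python 2, the loop in the question would then call `mywriter.writerow(encode_row(i))` instead of `mywriter.writerow(i)`.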
Alternatively, you can use the unicodecsv library to avoid this workaround.
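If moving to Python 3 is an option, its csv module works on text directly, so neither the encode step nor an extra library should be needed. A minimal sketch, assuming Python 3; the sample rows and the temporary file location are made up for illustration:

```python
import csv
import os
import tempfile

# Made-up sample data, including the non-breaking space from the traceback.
rows = [["Team", "First Test"], ["\xa0Australia", "1877"]]

path = os.path.join(tempfile.mkdtemp(), "mycsv.csv")

# Text mode with an explicit encoding; newline="" lets csv manage line endings.
with open(path, "w", encoding="utf-8", newline="") as csv_out:
    csv.writer(csv_out).writerows(rows)

# Read the file back to check that the non-ASCII character survived.
with open(path, encoding="utf-8", newline="") as csv_in:
    read_back = list(csv.reader(csv_in))
```

Note the file is opened in `'w'` rather than `'wb'` here, since the Python 3 writer expects a text stream.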