Scrape a multi-page HTML table with beautifulsoup4 and urllib3
Please help.
The code I wrote only works for one page, and I want it to work for all pages. What should I do?
import csv
import urllib3
from bs4 import BeautifulSoup

outfile = open("data.csv","w",newline='')
writer = csv.writer(outfile)
for i in range(1,20) :
    url = f'http://ciumi.com/cspos/barcode-ritel.php?page={i}'
req = urllib3.PoolManager()
res = req.request('GET', url)
tree = BeautifulSoup(res.data, 'html.parser')
table_tag = tree.select("table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]
for data in tab_data:
    writer.writerow(data)
    print( res, url, ' '.join(data))
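Your code works fine. If you want to scrape every URL and collect the data from each one, you just need to indent it correctly so that the request, the parsing, and the writing all happen inside the loop: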
import csv
import urllib3
from bs4 import BeautifulSoup

outfile = open("data.csv", "w", newline='')
writer = csv.writer(outfile)
for i in range(1, 20):
    # Everything below is indented so that it runs once per page.
    url = f'http://ciumi.com/cspos/barcode-ritel.php?page={i}'
    req = urllib3.PoolManager()
    res = req.request('GET', url)
    tree = BeautifulSoup(res.data, 'html.parser')
    # Take the first table on the page and read every row's header/data cells.
    table_tag = tree.select("table")[0]
    tab_data = [[item.text for item in row_data.select("th,td")] for row_data in table_tag.select("tr")]
    for data in tab_data:
        writer.writerow(data)
        print(res, url, ' '.join(data))
# Close the file so that all rows are flushed to disk.
outfile.close()
However, you will have to clean the data to get a tidy CSV file.
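As a rough idea of what that cleanup could look like, here is a minimal sketch. It assumes the scraped cells contain stray whitespace and that the table header is repeated on every page. Neither was checked against the real site, and the helper name `clean_rows` and the sample column names are made up for illustration.

```python
import csv
import io


def clean_rows(rows, header=None):
    """Normalize scraped table rows before writing them to CSV.

    - collapses runs of whitespace (including newlines) inside each cell
    - drops rows whose cells are all empty
    - drops rows equal to `header`, so a header repeated on every page
      is not written again
    """
    cleaned = []
    for row in rows:
        cells = [" ".join(cell.split()) for cell in row]
        if not any(cells):
            continue
        if header is not None and cells == header:
            continue
        cleaned.append(cells)
    return cleaned


# Hypothetical rows standing in for tab_data from two scraped pages.
header = ["Barcode", "Name"]
page1 = [["Barcode", "Name"], ["  123 \n", " Soap   Bar "], ["", " "]]
page2 = [["Barcode", "Name"], ["456", "Tea"]]

out = io.StringIO()  # stands in for the data.csv file object
writer = csv.writer(out)
writer.writerow(header)  # write the header once, up front
for page in (page1, page2):
    writer.writerows(clean_rows(page, header=header))

print(out.getvalue())
```

In the scraping loop, the same call would replace the plain `writer.writerow(data)` loop, for example `writer.writerows(clean_rows(tab_data, header=header))`.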