Python bs4 module

Question

import requests
from bs4 import BeautifulSoup

'''
A web crawler for eBay that collects the data of every single item.
'''

def ebay_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.ebay.co.uk/sch/Apple-Laptops/111422/i.html?_pgn=' \
              + str(page)
        source_code = requests.get(url)

        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'vip'}):
            href = 'http://www.ebay.co.uk' + link.get('href')
            title = link.string
            get_single_item_data(href)
        page += 1


def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for item_name in soup.findAll('h1', {'id': "itemTitle"}):
        print(item_name.string)

ebay_spider(3)

The error says this: http://imgur.com/403a6N8
I tried to fix it, but nothing seems to work. Any tips or answers on how to fix it?

EDIT: Sorry, everyone, for the faulty title and tag; both have been fixed.

Answer 1

This has nothing to do with the requests module at all. As Jean-Francois said, do what the message tells you and move on.

soup = BeautifulSoup(plain_text, "html.parser")
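
To see the effect of following that advice, here is a minimal sketch that builds a soup from a small inline HTML string, with and without an explicit parser, and records any warnings raised. The sample markup and the helper name `warnings_from` are made up for illustration, and the sketch assumes bs4 reports the missing-parser notice as a `UserWarning` (or a subclass of it). Some bs4 versions may skip the notice in certain environments, such as interactive sessions, so the first count can vary; the second should be zero.

```python
import warnings

from bs4 import BeautifulSoup

# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = "<html><body><h1 id='itemTitle'>Sample laptop</h1></body></html>"


def warnings_from(build_soup):
    """Run build_soup() and return the UserWarnings it triggered (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        build_soup()
    return [w for w in caught if issubclass(w.category, UserWarning)]


# No parser named: bs4 has to guess one and may warn about the guess.
without_parser = warnings_from(lambda: BeautifulSoup(SAMPLE_HTML))

# Parser named explicitly: nothing to guess, so no parser warning.
with_parser = warnings_from(lambda: BeautifulSoup(SAMPLE_HTML, "html.parser"))

print(len(without_parser), len(with_parser))
```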

Answer 2

When you create the BeautifulSoup object, instead of this:

soup = BeautifulSoup(plain_text)

do this:

soup = BeautifulSoup(plain_text, 'html.parser')

Note: your question is about the bs4 module, not requests.
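
As a rough illustration of the corrected call, the sketch below runs the item-title lookup from the question against an inline HTML string instead of a live eBay page. The sample markup and the helper name `extract_item_titles` are assumptions for demonstration only; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the element the question searches for.
SAMPLE_ITEM_PAGE = """
<html>
  <body>
    <h1 id="itemTitle">Apple MacBook Pro 13</h1>
    <p>Other page content</p>
  </body>
</html>
"""


def extract_item_titles(html):
    """Return the text of every <h1 id="itemTitle"> element in html (hypothetical helper)."""
    # Naming the parser explicitly is the change both answers recommend.
    soup = BeautifulSoup(html, "html.parser")
    return [h1.get_text(strip=True) for h1 in soup.find_all("h1", {"id": "itemTitle"})]


titles = extract_item_titles(SAMPLE_ITEM_PAGE)
print(titles)
```

In the question's code, the same change applies to both BeautifulSoup calls: the one in ebay_spider and the one in get_single_item_data.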
