New to Python: what am I doing wrong that no <a> tags (links) are returned with BS4?
I'm new to Python and still learning. Basically, I'm trying to pull all the links for my e-commerce store's products, which are stored in the HTML below. I'm not getting any results back, and I can't figure out why.
<h3 class="two-lines-name">
<a title="APPLE IPOD IPOD A1199 2GB" target="_self" href="/Item/Details/APPLE-IPOD-IPOD-A1199-2GB/d1003297dbe7443c8953750f0c96c62a/400">
APPLE IPOD IPOD A1199 2GB
</a>
</h3>
Here is my Python code:
import requests
from bs4 import BeautifulSoup

def my_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'www.buya.com/Store/SAM-S-LOCKER/400?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'h3 class': "two-lines-name"}):
            href = link.get('href')
            print(href)
        page += 1
    my_spider(5)
The result, with no data:
Process finished with exit code 0
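Your indentation is off: you are calling my_spider inside the my_spider function. Remove the tab on the last line and it should work.
If the function call sits inside the function body, you never actually run the function. Once that is corrected you will get an error, because the URL passed to requests is not valid (it has no http:// scheme). Finally, your soup.findAll('a', {'h3 class': "two-lines-name"}) will not find anything: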
import requests
from bs4 import BeautifulSoup

def my_spider(max_pages):
    # use range from 1 to max pages
    for i in range(1, max_pages + 1):
        # the URL needs the http:// scheme
        url = 'http://www.buya.com/Store/SAM-S-LOCKER/400?page={}'.format(i)
        source_code = requests.get(url)
        plain_text = source_code.content
        # name the parser explicitly to avoid bs4's "no parser specified" warning
        soup = BeautifulSoup(plain_text, "html.parser")
        # you want the h3 tags and to extract the href from the a tags
        for link in soup.findAll("h3", {'class': "two-lines-name"}):
            href = link.a["href"]
            print(href)

my_spider(5)  # outside the function
Output:
/Item/Details/12-FT-CHAIN-W-HOOK/cbb1eb65b100459283d15102606208c2/400
/Item/Details/12-INCH-FUSION-SUBWOOFER/534c4d677b2547fb814668b7d061df5d/400
/Item/Details/18-Gold-Chain-14K-Yellow-Gold-2-03g/0aaf2e1e5532461884cb44e786329e80/400
/Item/Details/1820-HANDMADE-STRAIGHT-RAZOR/ed0ba44f98224067b595b726bf01f5ab/400
/Item/Details/2-PAIRS-OF-POCKET-PLIERS-LEATHERMAN/410bcb9e4321426487bee7639b3cb96e/400
/Item/Details/20TH-CENTURY-FOX-Motorcycle-Helmet-RACING-HELMET/e12a75dc7e004e5aa43698c1edf87773/400
/Item/Details/30-CLUBS/a65f1cbff00c4d59ac998dee96eed98b/400
/Item/Details/30-STEEL-CHAINSAW-BLADE/daaca24ede1341c58bb0d0cd32051646/400
/Item/Details/5-GALLON-GLASS-JUG-BREWING-JUG-CHANGE-JAR/dde9b1bfea2a4a23ad93da098ffc674d/400
/Item/Details/5150-SNOWBOARDS-Snowboard-5150-155CM/bcaa07c71c8c4b499a70d34459244f75/400
/Item/Details/6-FT-STEEL-CHAIN/7c24fb1a16ac46e7b9e91f99883652f6/400
/Item/Details/6-5-CUSTOM-HUNTING-KNIFE/ffda1685b2324abe96e3fb7cb6f7f265/400
/Item/Details/95150/39cb080edd474eb6b770b26b40e3dc6b/400
/Item/Details/ACER-Monitor-P201W/ff03d9c33ca747e08e4646d2c3d5143e/400
/Item/Details/ACOUSTIC-RESEARCH-Monitor-Speakers-RESEARCH-AW825/856ff1d8beb9480d893f94d9d49a8642/400
/Item/Details/ACTIVISION-Microsoft-XBOX-360-CALL-OF-DUTY-BLACK-OPS-2-XBOX-360/aef62055b4f14e379f2eea154d162551/400
/Item/Details/ACTIVISION-Video-Game-Accessory-DJ-HERO-95837809/41e3c7f0114e497caf23d8a50fe1f547/400
/Item/Details/ACTIVISION-Video-Game-Accessory-WII-FIT/7daee2a759a54dd7a4e2b6acd37b9c3e/400
/Item/Details/AIMTECH-1911-SCOPE-MOUNT/ac69ae1c40fe4d7db8c53a8ebf842d7d/400
/Item/Details/AIRCO-TIG-WELDING-TUNGSTEN-Arc-Welder-ELECTRODE/70b9b35db0c547c29eb90e02ef60d91a/400
/Item/Details/AIWA-Portable-CD-Player-XP-SP911/75761bfff9a44093be51e4d70410bd85/400
/Item/Details/ALESSI-Gent-s-Wristwatch-KARIM-RASHID/251c3f95173f49078722b301e1d920fe/400
/Item/Details/ALL-AMERICAN-RIDER-Motorcycle-Part-SADDLE-BAGS/87634c0c08d2458ba5b84fa39c9bc3fc/400
/Item/Details/ALL-AMERICAN-RIDER-Motorcycle-Part-SADDLE-BAGS/803f6dfdc9f44326a5a52b63681779ad/400
/Item/Details/ALLY-SKATEBAORD-USED/716cec1588d9408e859718f5961e1ec6/400
/Item/Details/ALPINE-ARCHERY-Bow-FRONTIER/e73dda8034cf4cdb8ebeeebc9683b55d/400
/Item/Details/AMAZON-Tablet-KINDLE-D01100/ea9ac5b291ef487ea6f75ca328e05750/400
/Item/Details/AMAZON-Tablet-KINDLE-FIRE-D01400/ebe0e7001ac744ffa030fd153942a548/400
/Item/Details/APPLE-Computer-Accessories-A1023/6a38f60d2e034dc597043cc42282246e/400
/Item/Details/APPLE-Cell-Phone-Smart-Phone-IPHONE-5C-A1532-AT-T/cc65c513e848475c8000b6e10b6855e5/400
/Item/Details/APPLE-IPOD-IPOD-A1199-2GB/d1003297dbe7443c8953750f0c96c62a/400
...................................................
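To see why the original filter matched nothing, the two selectors can be compared offline on the sample markup from the question, with no network request. This is only a minimal sketch: the names HTML, BASE_URL, wrong, hrefs and absolute are made up for illustration, "html.parser" is chosen simply because it ships with Python, and the urljoin step is an optional extra (not part of the answer above) for turning the site-relative hrefs into full URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<h3 class="two-lines-name">
<a title="APPLE IPOD IPOD A1199 2GB" target="_self" href="/Item/Details/APPLE-IPOD-IPOD-A1199-2GB/d1003297dbe7443c8953750f0c96c62a/400">
APPLE IPOD IPOD A1199 2GB
</a>
</h3>
"""

# Assumption: site root taken from the URL used in the answer.
BASE_URL = "http://www.buya.com"

soup = BeautifulSoup(HTML, "html.parser")

# The question's filter asks for <a> tags carrying an attribute literally
# named "h3 class". The <a> tag has no such attribute, so nothing matches.
wrong = soup.find_all("a", {"h3 class": "two-lines-name"})

# The answer's filter matches the <h3> by its class, then reads the href
# of the <a> nested inside it.
hrefs = [h3.a["href"] for h3 in soup.find_all("h3", {"class": "two-lines-name"})]

# The hrefs are site-relative; join them with the base URL before fetching.
absolute = [urljoin(BASE_URL, href) for href in hrefs]

print(len(wrong))
print(absolute)
```

The attribute dictionary always describes the tag named in the first argument, so the class filter has to be applied to h3 rather than to a.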