Requests doesn't work while urllib2 does
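I have a very simple example where I thought requests should replace urllib2 without any trouble. However, even in this simplest case I seem to be missing something. Can anyone tell me why? (I am trying to get makeup pricing for my girlfriend :))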
from bs4 import BeautifulSoup
myurl = 'http://www.lancome-usa.com/skincare-cleanse/skincare-cleanse,default,sc.html'
# 1. urllib2
import urllib2
r = urllib2.urlopen(myurl)
soup = BeautifulSoup(r)
print 'MOUSSE RADIANCE' in soup.text.encode('utf-8').upper()
print '.00' in soup.text.encode('utf-8')
print '------------------------------------------------------'
# 2. requests
import requests
r = requests.get(myurl)
soup = BeautifulSoup(r.text)
print 'MOUSSE RADIANCE' in soup.text.encode('utf-8').upper()
print '.00' in soup.text.encode('utf-8')
Output:
True
True
------------------------------------------------------
False
False
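In the end I upgraded my requests library, and I also realized that passing r.text to BeautifulSoup does not work, whereas passing r.text.encode('utf-8') or r.content builds a soup with the correct content.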
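The thread does not pin down why r.text failed here, so the following is only a general illustration of the difference between the two attributes: r.content holds the raw bytes of the response, while r.text is those bytes decoded with whatever encoding requests settled on. The sketch below uses only the standard library and a made-up byte string (raw_bytes is a hypothetical stand-in for r.content) to show how decoding with the wrong encoding mangles non-ASCII product names. It does not by itself explain why a plain ASCII string went missing in the question.

```python
# Hypothetical page fragment; these bytes stand in for what r.content would hold.
raw_bytes = u'<div class="product-name">Crème Radiance</div>'.encode('utf-8')

# Assumption for illustration: the client guessed ISO-8859-1 instead of UTF-8.
# This plays the role of a wrongly decoded r.text.
wrongly_decoded = raw_bytes.decode('iso-8859-1')

# Decoding with the encoding the bytes were actually written in.
correctly_decoded = raw_bytes.decode('utf-8')

# The accented name is only found in the correctly decoded text.
print(u'Crème' in wrongly_decoded)
print(u'Crème' in correctly_decoded)
```

Handing the raw bytes to the parser, as in BeautifulSoup(r.content), avoids depending on the encoding guessed from the response headers, because the parser can inspect the document itself.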
You need to upgrade your version of requests. You can also get all of the products from the divs with class "product-name":
import requests
r = requests.get(myurl)
soup = BeautifulSoup(r.content)
products = {x.text.strip().lower() for x in soup.find_all("div",{"class":"product-name"})}
print("mousse radiance" in products)
for p in products:
    print(p)
True
crème radiance
crème douceur
absolue precious pure
mousse radiance
crème mousse confort
eau fraîche douceur
tonique confort
tonique pure focus
gel radiance
I am using 2.5.3:
In [8]: import requests
In [9]: requests.__version__
Out[9]: '2.5.3'
You can build a dictionary of products and prices, so your girlfriend can get you to buy several of her products:
products = [x.text.strip().lower() for x in soup.find_all("div",{"class":"product-name"})]
prices = [x.text for x in soup.find_all("span",{"class":"product-sales-price"})]
items = dict(zip(products,prices))
print(items.get("mousse radiance","N/A"))
print(items)
.00
{u'cr\xe8me radiance': u'.00 - .00', u'cr\xe8me douceur': u'.00 - .00', u'absolue precious pure': u'.00', u'mousse radiance': u'.00', u'cr\xe8me mousse confort': u'.00', u'eau fra\xeeche douceur': u'.00 - .00', u'tonique confort': u'.00 - .00', u'tonique pure focus': u'.00', u'gel radiance': u'.00'}
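One caveat with dict(zip(products, prices)): zip pairs items purely by position and silently stops at the shorter list, so if the page ever lists a product without a price element, later names could end up with the wrong price. A minimal sketch of a guard follows; pair_items is a hypothetical helper, and the names and prices are placeholder values rather than data from the site.

```python
def pair_items(names, prices):
    """Pair names with prices by position, refusing mismatched lengths."""
    if len(names) != len(prices):
        raise ValueError(
            "got %d names but %d prices" % (len(names), len(prices))
        )
    return dict(zip(names, prices))


# Placeholder data for illustration only, not real prices.
names = ["mousse radiance", "gel radiance"]
prices = ["PRICE_A", "PRICE_B"]

items = pair_items(names, prices)
print(items.get("mousse radiance", "N/A"))
```

If the lengths do differ, a sturdier approach would be to loop over each product container and look up the name and price inside it, but that depends on the page's markup, which is not shown in the thread.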