Beautiful Soup and bottlenose: how to parse correctly
I am currently trying to extract a string from the response to a bottlenose Amazon API request.
Not wanting Russian hackers to pwn my webapp, I am trying to use Beautiful Soup, following this small webpage as a guide.
My current code:
import bottlenose as BN
import lxml
from bs4 import BeautifulSoup
amazon = BN.Amazon('MyAmznID','MyAmznSK','MyAmznAssTag',Region='UK', Parser=BeautifulSoup)
rank = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")
soup = BeautifulSoup(rank)
print rank
print soup.find('SalesRank').string
Here is the current output from bottlenose:
<?xml version="1.0" ?><html><body><itemlookupresponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01"><operationrequest><httpheaders><header name="UserAgent" value="Python-urllib/2.7"></header></httpheaders><requestid>53f15ff4-3588-4e63-af6f-279bddc7c243</requestid><arguments><argument name="AWSAccessKeyId" value="################"></argument><argument name="AssociateTag" value="#########-##"></argument><argument name="ItemId" value="0198596790"></argument><argument name="Operation" value="ItemLookup"></argument><argument name="ResponseGroup" value="SalesRank"></argument><argument name="Service" value="AWSECommerceService"></argument><argument name="Timestamp" value="2016-02-04T11:05:48Z"></argument><argument name="Version" value="2011-08-01"></argument><argument name="Signature" value="################+##################="></argument></arguments><requestprocessingtime>0.0234130000000000</requestprocessingtime></operationrequest><items><request><isvalid>True</isvalid><itemlookuprequest><idtype>ASIN</idtype><itemid>0198596790</itemid><responsegroup>SalesRank</responsegroup><variationpage>All</variationpage></itemlookuprequest></request><item><asin>0198596790</asin><salesrank>124435</salesrank></item></items></itemlookupresponse></body></html>
So the bottlenose part works, but the soup part gives an error:
Traceback (most recent call last):
File "/Users/Fuck/Documents/Amazon/Bottlenose_amzn_prog/test.py", line 12, in <module>
print soup.find(Rank).string
NameError: name 'soup' is not defined
I am trying to extract the number between the 'SalesRank' tags, but failing.
OK, so I dropped the option of specifying the parser in the bottlenose line,
and instead parse the response afterwards with BeautifulSoup, using its XML parser.
import bottlenose as BN
import lxml
from bs4 import BeautifulSoup
amazon = BN.Amazon('##############','##############','##########',Region='UK')
rank = amazon.ItemLookup(ItemId="specifiedItemId",ResponseGroup="SalesRank")
soup = BeautifulSoup(rank, "xml")
print " "
print soup.SalesRank
I am a novice Python user, so sometimes it is the simple things that trip me up.
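For anyone wondering why the parser choice matters: the output shown earlier has lowercased tags such as <salesrank> wrapped in <html><body>, which suggests the response went through an HTML parser. A minimal sketch of the difference, assuming a made-up, trimmed-down response string (SAMPLE is not real API output) and that lxml is installed for the "xml" parser:

```python
from bs4 import BeautifulSoup

# Made-up, trimmed-down response used only for illustration.
SAMPLE = (
    "<ItemLookupResponse><Items><Item>"
    "<ASIN>0198596790</ASIN><SalesRank>124435</SalesRank>"
    "</Item></Items></ItemLookupResponse>"
)

html_soup = BeautifulSoup(SAMPLE, "html.parser")  # HTML parsing lowercases tag names
xml_soup = BeautifulSoup(SAMPLE, "xml")           # XML parsing (needs lxml) keeps the case

html_exact = html_soup.find("SalesRank")  # no match: the tag is now 'salesrank'
html_lower = html_soup.find("salesrank")  # matches the lowercased tag
xml_exact = xml_soup.find("SalesRank")    # matches the original mixed-case tag

print(html_exact)
print(html_lower.string)
print(xml_exact.string)
```

So with an HTML parser the mixed-case lookup finds nothing, while the XML parser keeps the tag names as Amazon sent them.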
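Looking at the code, the Bottlenose Parser option seems simple: it takes a function as its argument.
So you can write a very simple Python function and pass it to the constructor, which makes your code look like this: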
import bottlenose as BN
from bs4 import BeautifulSoup
def parse_xml(text):
    return BeautifulSoup(text, 'xml')
amazon = BN.Amazon(
    AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
    AWS_ASSOCIATE_TAG, Region='UK', Parser=parse_xml
)
results = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")
print results.find('SalesRank').string
Or you can use an inline lambda function instead:
import bottlenose as BN
from bs4 import BeautifulSoup
amazon = BN.Amazon(
    AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ASSOCIATE_TAG,
    Region='UK', Parser=lambda text: BeautifulSoup(text, 'xml')
)
results = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")
print results.find('SalesRank').string
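One thing to watch with either version: find() returns None when the tag is missing, so calling .string on the result would raise an AttributeError for an item that has no sales rank. A possible way to guard against that is sketched below. extract_sales_rank is a hypothetical helper (not part of bottlenose or Beautiful Soup), and the two XML snippets are made-up stand-ins for API responses:

```python
from bs4 import BeautifulSoup


def parse_xml(text):
    """Same idea as the parser function above: raw XML text in, soup out."""
    return BeautifulSoup(text, "xml")


def extract_sales_rank(soup):
    """Hypothetical helper: return the sales rank as an int, or None if absent."""
    tag = soup.find("SalesRank")
    if tag is None or tag.string is None:
        return None
    return int(tag.string.strip())


# Made-up response snippets standing in for what the API would return.
WITH_RANK = "<Item><ASIN>0198596790</ASIN><SalesRank>124435</SalesRank></Item>"
WITHOUT_RANK = "<Item><ASIN>0198596790</ASIN></Item>"

rank = extract_sales_rank(parse_xml(WITH_RANK))
missing = extract_sales_rank(parse_xml(WITHOUT_RANK))
print(rank)
print(missing)
```

With Parser=parse_xml passed to the constructor, the object returned by ItemLookup could be handed straight to a helper like this.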