Facing urlencoding issue while processing request
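I've written a script in Python to scrape some information from a webpage. The site requires a GET request. The problem I'm facing is that the parameters have to be appended to the URL, so they need to be URL-encoded. This is where I'm stuck: I can't encode them properly to get a valid response. I gave it a try, but it didn't get me anywhere.
The script I tried: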
import requests
import urllib.parse
fields ={
'/API/api/v1/Search/Properties/?f':'319 lizzie','ty':'2018','pvty':'2017','pn':'1','st':'9','so':'1','pt':'RP;PP;MH;NR','take':'20','skip':'0','page':'1','pageSize':'20'
}
payload = urllib.parse.quote_plus(fields, safe='', encoding=None, errors=None)
headers={
"User-Agent":"Mozilla/5.0"
}
page = requests.get("http://search.wcad.org/Proxy/APIProxy.ashx?", params=payload, headers=headers)
print(page.json())
To get a response, the URL built above should look like this:
http://search.wcad.org/Proxy/APIProxy.ashx?/API/api/v1/Search/Properties/?f=319%20LIZZIE&ty=2018&pvty=2017&pn=1&st=9&so=1&pt=RP%3BPP%3BMH%3BNR&take=20&skip=0&page=1&pageSize=20
By the way, this is the error I get with my existing script:
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\Social.py", line 9, in <module>
    payload = urllib.parse.quote_plus(fields, safe='', encoding=None, errors=None)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\lib\urllib\parse.py", line 728, in quote_plus
    string = quote(string, safe + space, encoding, errors)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\lib\urllib\parse.py", line 712, in quote
    return quote_from_bytes(string, safe)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\lib\urllib\parse.py", line 737, in quote_from_bytes
    raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes
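This works. As the documentation shows, there is no need to do any URL encoding yourself: pass the parameters as a dict and requests encodes them for you.
The key point is that, for this URL, the parameters start after the last question mark rather than the first one. The second question mark has to be part of the URL you pass in, because requests only adds a question mark when the URL does not already contain one.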
import requests
url = "http://search.wcad.org/Proxy/APIProxy.ashx?/API/api/v1/Search/Properties/?"
params = {'f':'319 lizzie','ty':'2018','pvty':'2017','pn':'1','st':'9','so':'1','pt':'RP;PP;MH;NR','take':'20','skip':'0','page':'1','pageSize':'20'}
response = requests.get(url, params)
response.json()
Result:
{
'ResultList': [{
'PropertyQuickRefID': 'R016698',
'PartyQuickRefID': 'O0485204',
'OwnerQuickRefID': 'R016698',
'LegacyID': None,
'PropertyNumber': 'R-13-0410-0620-50000',
'OwnerName': 'GOOCH, PHILIP L',
'SitusAddress': '319 LIZZIE ST, TAYLOR, TX 76574',
'PropertyValue': 46785.0,
'LegalDescription': 'DOAK ADDITION, BLOCK 62, LOT 5',
'NeighborhoodCode': 'T541',
'Abstract': None,
'Subdivision': 'S3564 - Doak Addition',
'PropertyType': 'Real',
'ID': 0,
'Text': None,
'TaxYear': 2018,
'PropertyValueTaxYear': 2017
}],
'HasMoreData': False,
'TotalPageCount': 1,
'CurrentPage': 1,
'RecordCount': 1,
'SearchText': '319 lizzie',
'PagingHandledByCaller': False,
'TaxYear': 2018,
'PropertyValueTaxYear': 0
}
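If you want to inspect the encoded query string without sending a request, the standard library can build it. The following is only a sketch of the encoding step, not a description of what requests does internally; the variable names `base_url`, `plus_style`, `percent_style` and `full_url` are made up for illustration. It also shows why the original attempt failed: `quote_plus()` expects a single string (or bytes), while `urlencode()` is the function that accepts a whole dict.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the answer; it already ends with the second "?".
base_url = "http://search.wcad.org/Proxy/APIProxy.ashx?/API/api/v1/Search/Properties/?"

params = {
    'f': '319 lizzie', 'ty': '2018', 'pvty': '2017', 'pn': '1', 'st': '9',
    'so': '1', 'pt': 'RP;PP;MH;NR', 'take': '20', 'skip': '0',
    'page': '1', 'pageSize': '20',
}

# urlencode() takes a mapping and encodes every key/value pair.
# By default it uses quote_plus(), so spaces become "+".
plus_style = urlencode(params)

# Passing quote_via=quote encodes spaces as "%20" instead,
# which matches the style of the URL shown in the question.
percent_style = urlencode(params, quote_via=quote)

full_url = base_url + percent_style
print(plus_style)
print(full_url)
```

Both spellings of the space (`+` and `%20`) are commonly accepted in query strings, so which one you pick should not matter for most servers; that is an assumption about this site rather than something verified here.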