Python RoboBrowser - How to get content from this page
I am trying to submit the form on the page http://pretraga2.apr.gov.rs/ObjedinjenePretrage/Search/Search
but I get an error page (HTML) like this:
<!DOCTYPE html>
<html><head>
<title>Error</title>
</head>
<body>
<h2>
Sorry, an error occurred while processing your request.
</h2>
</body></html>
Current Python script:
#!/usr/bin/python
# vim: set fileencoding=utf-8 :
import win_unicode_console
win_unicode_console.enable()
import requests
from bs4 import BeautifulSoup
import urllib.parse
import re
from robobrowser import RoboBrowser
# import warnings
# warnings.filterwarnings("ignore")
# Browse to Genius
browser = RoboBrowser(history=True)
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'}
s = requests.Session()
s.headers = hdr
browser = RoboBrowser(session=s)
browser.open('http://pretraga2.apr.gov.rs/ObjedinjenePretrage/Search/Search')
#
form = browser.get_form(action='/ObjedinjenePretrage/Search/SearchResult')
form['SearchByRegistryCodeString'].value = '53254136'
browser.submit_form(form)
print(browser.parsed)
I tried adding headers, but that did not help.
What else could be wrong?
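This is solved now.
I noticed that the page has two forms with the same name. I suppose the first one (hidden with display:none) was the one being submitted, and that it acts as a honeypot.
Anyway, the solution is: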
# get_forms() returns every matching form; index 1 is the second (visible) one
form = browser.get_forms(action='/ObjedinjenePretrage/Search/SearchResult')[1]
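
Hard-coding index 1 works as long as the page keeps the hidden form first. A minimal sketch of a less brittle approach is to look for the first matching form that is not inside an element styled with display:none. This uses only the standard library. The helper names (`VisibleFormFinder`, `first_visible_form_index`) and the sample markup are hypothetical, and it assumes the hidden form is hidden by an inline style attribute on the form or one of its ancestors, which may not match how the real page does it.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleFormFinder(HTMLParser):
    """For each <form> with the given action, record whether it is visible.

    Assumption: "hidden" means an inline display:none style on the form
    or on any enclosing element. Assumes reasonably well-nested HTML.
    """

    def __init__(self, action):
        super().__init__()
        self.action = action
        self.hidden_stack = []  # one bool per open element: is it hidden?
        self.visibility = []    # one bool per matching form, document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = "display:none" in style or any(self.hidden_stack)
        if tag == "form" and attrs.get("action") == self.action:
            self.visibility.append(not hidden)
        if tag not in VOID_TAGS:
            self.hidden_stack.append(hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()


def first_visible_form_index(html, action):
    """Return the index of the first visible matching form, or None."""
    finder = VisibleFormFinder(action)
    finder.feed(html)
    for index, visible in enumerate(finder.visibility):
        if visible:
            return index
    return None


# Hypothetical markup that imitates the situation described above:
# a hidden copy of the form first, then the real one.
SAMPLE_HTML = """
<div style="display: none">
  <form action="/ObjedinjenePretrage/Search/SearchResult" method="post">
    <input name="SearchByRegistryCodeString" type="text">
  </form>
</div>
<form action="/ObjedinjenePretrage/Search/SearchResult" method="post">
  <input name="SearchByRegistryCodeString" type="text">
</form>
"""

index = first_visible_form_index(
    SAMPLE_HTML, "/ObjedinjenePretrage/Search/SearchResult")
print(index)
```

With RoboBrowser, the same index could presumably be applied to the list returned by `browser.get_forms(action=...)`, passing `str(browser.parsed)` as the HTML, since both walk the forms in document order. If the page hides the form through a CSS class or an external stylesheet instead, this check will not see it and the fixed index is the simpler option.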