Installing dryscrape on Ubuntu Server 16.04
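I am having trouble getting dryscrape working on an Ubuntu 16.04 server (a fresh install on DigitalOcean). The objective is to scrape websites whose content is populated by JavaScript.
I followed the dryscrape installation instructions: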
apt-get update
apt-get install qt5-default libqt5webkit5-dev build-essential \
python-lxml python-pip xvfb
pip install dryscrape
I then ran the following Python script, which I found online together with the test HTML page shown after it. (It returns HTML or JS.)
Python
import dryscrape
from bs4 import BeautifulSoup
session = dryscrape.Session()
my_url = 'http://www.example.com/scrape.php'
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response)
soup.find(id="intro-text")
HTML - scrape.php
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id='intro-text'>No javascript support</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body>
</html>
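When I do this, I don't get the expected return data, only an error.
Am I missing something?
Note: I have searched through a lot of installation guides and threads but can't get it working. I also tried Selenium, but that didn't help either. Many thanks.
Output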
Traceback (most recent call last):
File "js.py", line 3, in <module>
session = dryscrape.Session()
File "/usr/local/lib/python2.7/dist-packages/dryscrape/session.py", line 22, in __init__
self.driver = driver or DefaultDriver()
File "/usr/local/lib/python2.7/dist-packages/dryscrape/driver/webkit.py", line 30, in __init__
super(Driver, self).__init__(**kw)
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 230, in __init__
self.conn = connection or ServerConnection()
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 507, in __init__
self._sock = (server or get_default_server()).connect()
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 450, in get_default_server
_default_server = Server()
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 424, in __init__
raise NoX11Error("Could not connect to X server. "
webkit_server.NoX11Error: Could not connect to X server. Try calling dryscrape.start_xvfb() before creating a session.
Working script
import dryscrape
from bs4 import BeautifulSoup
dryscrape.start_xvfb()
session = dryscrape.Session()
my_url = 'https://www.example.com/scrape.php'
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response, "html.parser")
print soup.find(id="intro-text").text
You don't have an X server running. The clue is in the last line of the error message:
Try calling dryscrape.start_xvfb() before creating a session
See http://dryscrape.readthedocs.io/en/latest/usage.html
if 'linux' in sys.platform:
    # start xvfb in case no X is running. Make sure xvfb
    # is installed, otherwise this won't work!
    dryscrape.start_xvfb()
http://dryscrape.readthedocs.io/en/latest/installation.html
xvfb (necessary only if no other X server is available)
So you only need to add:
dryscrape.start_xvfb()
before:
session = dryscrape.Session()
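If the same script also has to run on machines that already have an X server, the call can be made conditional. The following is only a sketch: the helper names `needs_virtual_display` and `make_session` are made up for illustration, and treating an unset `DISPLAY` environment variable as "no X server is running" is an assumption (a heuristic), not something dryscrape does itself. Only `dryscrape.start_xvfb()` and `dryscrape.Session()` are taken from the thread above.

```python
import os
import sys


def needs_virtual_display(platform=None, environ=None):
    """Heuristic (assumption): on Linux, an unset DISPLAY means no X server."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    return platform.startswith("linux") and not environ.get("DISPLAY")


def make_session():
    """Hypothetical helper: start Xvfb only when needed, then open a session."""
    # Imported here so the check above can be used without dryscrape installed.
    import dryscrape

    if needs_virtual_display():
        # Requires the xvfb package (installed by the apt-get line above).
        dryscrape.start_xvfb()
    return dryscrape.Session()
```

With this in place, `session = make_session()` would replace the two lines `dryscrape.start_xvfb()` and `session = dryscrape.Session()` in the working script.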