Scrapy shell against a local file
Before Scrapy 1.0, I could run the Scrapy shell against a local file as simply as this:
$ scrapy shell index.html
After upgrading to 1.0.3, it started to throw an error:
$ scrapy shell index.html
2015-10-12 15:32:59 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2015-10-12 15:32:59 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-10-12 15:32:59 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/so/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/commands/shell.py", line 50, in run
    spidercls = spidercls_for_request(spider_loader, Request(url),
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 24, in __init__
    self._set_url(url)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 59, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: index.html
Is this behavior intentional, or is it a bug in the Scrapy shell?
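For context, the error message says the argument has no URL scheme. The sketch below only illustrates that idea with the standard library's `urlparse`; the helper name `has_scheme` is made up for this example, and the check Scrapy actually performs inside `Request._set_url` may be implemented differently.

```python
from urllib.parse import urlparse


def has_scheme(url):
    """Return True if the string carries a URL scheme such as http or file."""
    return bool(urlparse(url).scheme)


# A bare file name and a relative path have no scheme; a file:// URL does.
for candidate in (
    "index.html",
    "./index.html",
    "file:///absolute/path/to/index.html",
):
    print(candidate, has_scheme(candidate))
```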
As a workaround, I can use the absolute path to the file with the "file" URL scheme:
$ scrapy shell file:///absolute/path/to/index.html
which is obviously less convenient.
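One way to avoid typing the absolute path by hand is to let Python build the `file://` URL. This is a minimal sketch using only the standard library; the function name `local_file_url` is an assumption made for illustration and is not part of Scrapy.

```python
from pathlib import Path


def local_file_url(path):
    """Resolve a local path against the current directory and return a file:// URL."""
    return Path(path).resolve().as_uri()


# Prints a file:// URL whose path depends on the current working directory.
print(local_file_url("index.html"))
```

The printed URL can then be passed to `scrapy shell` as its argument.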
Update: for Scrapy >= 1.1, this is built in, and you can do:
scrapy shell file:///path/to/file.html
Old answer:
According to the issue "Running scrapy shell against a local file", the relevant change was introduced by this commit. A pull request was created for this issue to make the Scrapy shell open local files again, and it is planned to be part of Scrapy 1.1.
My configuration:
- MacOS X
- Scrapy 1.6.0
What worked for me was scrapy shell ./index.html, with index.html in the root folder of the Scrapy-generated project.
For Scrapy==2.5.1, you can run the Scrapy shell against a local file as follows.
If the file is in the same directory, prefix the file name with "./", like this:
scrapy shell ./file.html
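The "./" prefix matters because a bare name such as index.html is ambiguous: it could be a file or a host name. The sketch below shows one plausible way to resolve such an argument. It is an illustration only; the function `shell_argument_to_url` and its rules are assumptions, and the logic Scrapy itself uses may differ in detail.

```python
from pathlib import Path
from urllib.parse import urlparse


def shell_argument_to_url(arg):
    """Hypothetical helper: turn a command-line argument into a URL."""
    # Anything that already has a scheme (http, file, ...) is used as-is.
    if urlparse(arg).scheme:
        return arg
    # Arguments that look like paths are treated as local files.
    if arg.startswith(("./", "../", "/")):
        return Path(arg).resolve().as_uri()
    # Everything else is assumed to be a host name.
    return "http://" + arg


print(shell_argument_to_url("./file.html"))  # a file:// URL
print(shell_argument_to_url("index.html"))   # treated as a host name
```

Under these assumed rules, index.html without the prefix would be read as a host rather than a file, which is why the explicit "./" form is the safer one to type.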