How to avoid "module not found" error while calling scrapy project from crontab?
I am currently building a small test project to learn how to use crontab on Linux (Ubuntu 20.04.2 LTS).
My crontab file looks like this:
* * * * * sh /home/path_to .../crontab_start_spider.sh >> /home/path_to .../log_python_test.log 2>&1
What I want crontab to do is launch a Scrapy project using the shell file below. The output is stored in the file log_python_test.log.
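Worth noting for context: cron runs jobs with a minimal environment, which is the usual cause of this class of error, and crontab also lets you set environment variables at the top of the file. A hedged sketch of such a crontab (the PYTHONPATH value is an assumption based on the virtualenv path that appears later in this question):

```shell
# Variable assignments at the top of a crontab apply to every job line below.
PATH=/usr/local/bin:/usr/bin:/bin
PYTHONPATH=/home/luc/gen_env/lib/python3.7/site-packages

* * * * * sh /home/path_to .../crontab_start_spider.sh >> /home/path_to .../log_python_test.log 2>&1
```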
My shell file (the numbers are only for reference in this question):
0 #!/bin/bash
1 cd /home/luc/Documents/computing/tests/learning/morning
2 PATH=$PATH:/usr/local/bin
3 export PATH
4 PATH=$PATH:/home/luc/gen_env/lib/python3.7/site-packages
5 export PATH
6 scrapy crawl meteo
Some of you may be interested in the structure of my Scrapy project, so here it is:
You may also want to see the code I edited in Scrapy:
My spider: meteo.py
import scrapy
from morning.items import MorningItem


class MeteoSpider(scrapy.Spider):
    name = 'meteo'
    allowed_domains = ['meteo.gc.ca']
    start_urls = ['https://www.meteo.gc.ca/city/pages/qc-136_metric_f.html']

    def parse(self, response, **kwargs):
        # Extracting data from page
        condition = response.css('div.col-sm-4:nth-child(1) > dl:nth-child(1) > dd:nth-child(2)::text').get()
        pression = response.css('div.col-sm-4:nth-child(1) > dl:nth-child(1) > dd:nth-child(4)::text').get()
        temperature = response.css('div.brdr-rght-city:nth-child(2) > dl:nth-child(1) > dd:nth-child(2)::text').get()

        # Creating and filling the item
        item = MorningItem()
        item['condition'] = condition
        item['pression'] = pression
        item['temperature'] = temperature
        return item
My item: in items.py
import scrapy


class MorningItem(scrapy.Item):
    condition = scrapy.Field()
    pression = scrapy.Field()
    temperature = scrapy.Field()
My pipeline: in pipelines.py (this default pipeline is uncommented in settings.py)
import logging
from gtts import gTTS
import os
import random
from itemadapter import ItemAdapter


class MorningPipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)

        # Message creation
        messages = ["Bon matin! J'èspère que vous avez bien dormi cette nuit. Voici le topo.",
                    "Bonjour Luc. Un bon petit café et on est parti.",
                    "Saluto amigo. Voici ce que vous devez savoir."]
        message_of_the_day = messages[random.randint(0, len(messages) - 1)]

        # Add meteo to message
        message_of_the_day += f" Voici la météo. La condition: {adapter['condition']}. La pression: " \
                              f"{adapter['pression']} kilo-pascal. La température: {adapter['temperature']} celcius."
        if '-' in adapter['temperature']:
            message_of_the_day += " Vous devriez vous mettre un petit chandail."
        elif len(adapter['temperature']) == 3:
            if int(adapter['temperature'][0:2]) > 19:
                message_of_the_day += " Vous allez être bien en sandales."

        # Creating mp3
        language = 'fr-ca'
        output = gTTS(text=message_of_the_day, lang=language, slow=False)

        # Prepare output file emplacement and saving
        if os.path.exists("/home/luc/Music/output.mp3"):
            os.remove("/home/luc/Music/output.mp3")
        output.save("/home/luc/Music/output.mp3")

        # Playing mp3 and retrieving the output
        logging.info(f'First command output: {os.system("mpg123 /home/luc/Music/output.mp3")}')
        return item
The project runs from the terminal without any problem (scrapy crawl meteo):
WARNING:gtts.lang:'fr-ca' has been deprecated, falling back to 'fr'. This fallback will be removed in a future version.
2021-06-04 12:18:21 [gtts.lang] WARNING: 'fr-ca' has been deprecated, falling back to 'fr'. This fallback will be removed in a future version.
...
stats:
{'downloader/request_bytes': 471,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 14325,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'elapsed_time_seconds': 21.002126,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 6, 4, 16, 18, 41, 658684),
'item_scraped_count': 1,
'log_count/DEBUG': 82,
'log_count/INFO': 11,
'log_count/WARNING': 1,
'memusage/max': 60342272,
'memusage/startup': 60342272,
'response_received_count': 2,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/200': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2021, 6, 4, 16, 18, 20, 656558)}
INFO:scrapy.core.engine:Spider closed (finished)
2021-06-04 12:18:41 [scrapy.core.engine] INFO: Spider closed (finished)
There is only a minor deprecation warning, so I think the crawl succeeded. The problem appears when running from crontab. Here is the output of log_python_test.log:
2021-06-04 12:00:02 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: morning)
2021-06-04 12:00:02 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.7.7 (default, May 6 2020, 14:51:16) - [GCC 9.3.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform Linux-5.8.0-53-generic-x86_64-with-debian-bullseye-sid
2021-06-04 12:00:02 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-06-04 12:00:02 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'morning',
'NEWSPIDER_MODULE': 'morning.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['morning.spiders']}
2021-06-04 12:00:02 [scrapy.extensions.telnet] INFO: Telnet Password: bf691c25dae7d218
2021-06-04 12:00:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2021-06-04 12:00:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-06-04 12:00:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
Unhandled error in Deferred:
2021-06-04 12:00:02 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/crawler.py", line 192, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/crawler.py", line 196, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/home/luc/.local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "/home/luc/.local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
_inlineCallbacks(None, g, status)
--- <exception caught here> ---
File "/home/luc/.local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/crawler.py", line 87, in crawl
self.engine = self._create_engine()
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/crawler.py", line 101, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 50, in load_object
mod = import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/luc/Documents/computing/tests/learning/morning/morning/pipelines.py", line 3, in <module>
from gtts import gTTS
builtins.ModuleNotFoundError: No module named 'gtts'
2021-06-04 12:00:02 [twisted] CRITICAL:
Traceback (most recent call last):
File "/home/luc/.local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/crawler.py", line 87, in crawl
self.engine = self._create_engine()
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/crawler.py", line 101, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/home/luc/.local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 50, in load_object
mod = import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/luc/Documents/computing/tests/learning/morning/morning/pipelines.py", line 3, in <module>
from gtts import gTTS
ModuleNotFoundError: No module named 'gtts'
Suddenly, it can no longer find the gtts package. And gtts doesn't seem to be the only package it can't find: an earlier version of my pipelines.py had from mutagen.mp3 import MP3 at the top, and importing that was a problem too.
I wondered whether I had made a mistake installing the gtts package, so I ran pip install gtts to make sure everything was in order, and I got:
Requirement already satisfied: gtts in /home/luc/gen_env/lib/python3.7/site-packages (2.2.2)
Requirement already satisfied: six in /home/luc/gen_env/lib/python3.7/site-packages (from gtts) (1.15.0)
Requirement already satisfied: requests in /home/luc/gen_env/lib/python3.7/site-packages (from gtts) (2.24.0)
Requirement already satisfied: click in /home/luc/gen_env/lib/python3.7/site-packages (from gtts) (7.1.2)
Requirement already satisfied: chardet<4,>=3.0.2 in /home/luc/gen_env/lib/python3.7/site-packages (from requests->gtts) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /home/luc/gen_env/lib/python3.7/site-packages (from requests->gtts) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /home/luc/gen_env/lib/python3.7/site-packages (from requests->gtts) (2020.6.20)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/luc/gen_env/lib/python3.7/site-packages (from requests->gtts) (1.25.10)
gTTS also appears when I type pip list:
gTTS 2.2.2
I also made sure I installed it in the right environment. Here are the results of which python and which pip, respectively:
/home/luc/gen_env/bin/python
/home/luc/gen_env/bin/pip
I thought I could fix the problem by adding lines 4 and 5 to the shell file, but that didn't work (same output). I'm fairly sure I need to add some path to PYTHONPATH or something similar, but I'm not sure what I'm doing and I don't want to break anything.
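For context on why lines 4 and 5 don't help: PATH only controls where the shell looks for executables, while Python resolves imports through sys.path, which is seeded from the PYTHONPATH environment variable. A minimal sketch of that distinction (the site-packages path is the one from this question, an assumption about your layout):

```shell
#!/bin/bash
# PATH affects which executables the shell finds; it does not affect Python imports.
# PYTHONPATH is what Python prepends to sys.path at startup.
export PYTHONPATH="/home/luc/gen_env/lib/python3.7/site-packages"
python3 -c 'import sys; print("/home/luc/gen_env/lib/python3.7/site-packages" in sys.path)'  # prints True
```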
Thanks in advance.
I found the solution to my problem. As I suspected, a directory was missing from my PYTHONPATH: the one containing the gtts package.
The fix, if you have the same problem:
- Find the package. I looked for it on disk (the pip output above shows where it is installed).
- Add that directory to sys.path (the runtime equivalent of adding it to PYTHONPATH).
Add this code at the top of the script (in my case, pipelines.py):
import sys
sys.path.append("/<the_path_to_your_package>")
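To do the first step programmatically, you can ask Python itself where a module lives with importlib; a small sketch using the standard-library json module as a stand-in for gtts:

```python
import importlib.util
import os

# Locate a module's file without importing it.
# "json" is a stand-in here; in the question's case it would be "gtts".
spec = importlib.util.find_spec("json")
print(spec.origin)  # path to the package's __init__.py

# For a package, the directory to append to sys.path is two levels up
# from its __init__.py (the site-packages directory itself).
print(os.path.dirname(os.path.dirname(spec.origin)))
```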