Install Scrapy on Windows Server 2019, running in a Docker container
I want to install Scrapy on Windows Server 2019 and run it in a Docker container (see my earlier questions for my installation history).
On my local Windows 10 machine I can run my Scrapy command like so in Windows PowerShell (after simply starting Docker Desktop):
scrapy crawl myscraper -o allobjects.json
in folder C:\scrapy\my1stscraper\
For the Windows server I first installed Anaconda, following the steps recommended here: https://docs.scrapy.org/en/latest/intro/install.html.
I then opened the Anaconda prompt and typed conda install -c conda-forge scrapy in D:\Programs:
(base) PS D:\Programs> dir
Directory: D:\Programs
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 4/22/2021 10:52 AM Anaconda3
-a---- 4/22/2021 11:20 AM 0 conda
(base) PS D:\Programs> conda install -c conda-forge scrapy
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.9.2
latest version: 4.10.1
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: D:\Programs\Anaconda3
added / updated specs:
- scrapy
The following packages will be downloaded:
package | build
---------------------------|-----------------
automat-20.2.0 | py_0 30 KB conda-forge
conda-4.10.1 | py38haa244fe_0 3.1 MB conda-forge
constantly-15.1.0 | py_0 9 KB conda-forge
cssselect-1.1.0 | py_0 18 KB conda-forge
hyperlink-21.0.0 | pyhd3deb0d_0 71 KB conda-forge
incremental-17.5.0 | py_0 14 KB conda-forge
itemadapter-0.2.0 | pyhd8ed1ab_0 12 KB conda-forge
parsel-1.6.0 | py_0 15 KB conda-forge
pyasn1-0.4.8 | py_0 53 KB conda-forge
pyasn1-modules-0.2.7 | py_0 60 KB conda-forge
pydispatcher-2.0.5 | py_1 12 KB conda-forge
pyhamcrest-2.0.2 | py_0 29 KB conda-forge
python_abi-3.8 | 1_cp38 4 KB conda-forge
queuelib-1.6.1 | pyhd8ed1ab_0 14 KB conda-forge
scrapy-2.4.1 | py38haa95532_0 372 KB
service_identity-18.1.0 | py_0 12 KB conda-forge
twisted-21.2.0 | py38h294d835_0 5.1 MB conda-forge
twisted-iocpsupport-1.0.1 | py38h294d835_0 49 KB conda-forge
w3lib-1.22.0 | pyh9f0ad1d_0 21 KB conda-forge
------------------------------------------------------------
Total: 9.0 MB
The following NEW packages will be INSTALLED:
automat conda-forge/noarch::automat-20.2.0-py_0
constantly conda-forge/noarch::constantly-15.1.0-py_0
cssselect conda-forge/noarch::cssselect-1.1.0-py_0
hyperlink conda-forge/noarch::hyperlink-21.0.0-pyhd3deb0d_0
incremental conda-forge/noarch::incremental-17.5.0-py_0
itemadapter conda-forge/noarch::itemadapter-0.2.0-pyhd8ed1ab_0
parsel conda-forge/noarch::parsel-1.6.0-py_0
pyasn1 conda-forge/noarch::pyasn1-0.4.8-py_0
pyasn1-modules conda-forge/noarch::pyasn1-modules-0.2.7-py_0
pydispatcher conda-forge/noarch::pydispatcher-2.0.5-py_1
pyhamcrest conda-forge/noarch::pyhamcrest-2.0.2-py_0
python_abi conda-forge/win-64::python_abi-3.8-1_cp38
queuelib conda-forge/noarch::queuelib-1.6.1-pyhd8ed1ab_0
scrapy pkgs/main/win-64::scrapy-2.4.1-py38haa95532_0
service_identity conda-forge/noarch::service_identity-18.1.0-py_0
twisted conda-forge/win-64::twisted-21.2.0-py38h294d835_0
twisted-iocpsuppo~ conda-forge/win-64::twisted-iocpsupport-1.0.1-py38h294d835_0
w3lib conda-forge/noarch::w3lib-1.22.0-pyh9f0ad1d_0
The following packages will be UPDATED:
conda pkgs/main::conda-4.9.2-py38haa95532_0 --> conda-forge::conda-4.10.1-py38haa244fe_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
constantly-15.1.0 | 9 KB | ############################################################################ | 100%
itemadapter-0.2.0 | 12 KB | ############################################################################ | 100%
twisted-21.2.0 | 5.1 MB | ############################################################################ | 100%
pydispatcher-2.0.5 | 12 KB | ############################################################################ | 100%
queuelib-1.6.1 | 14 KB | ############################################################################ | 100%
service_identity-18. | 12 KB | ############################################################################ | 100%
pyhamcrest-2.0.2 | 29 KB | ############################################################################ | 100%
cssselect-1.1.0 | 18 KB | ############################################################################ | 100%
automat-20.2.0 | 30 KB | ############################################################################ | 100%
pyasn1-0.4.8 | 53 KB | ############################################################################ | 100%
twisted-iocpsupport- | 49 KB | ############################################################################ | 100%
python_abi-3.8 | 4 KB | ############################################################################ | 100%
hyperlink-21.0.0 | 71 KB | ############################################################################ | 100%
conda-4.10.1 | 3.1 MB | ############################################################################ | 100%
scrapy-2.4.1 | 372 KB | ############################################################################ | 100%
incremental-17.5.0 | 14 KB | ############################################################################ | 100%
w3lib-1.22.0 | 21 KB | ############################################################################ | 100%
pyasn1-modules-0.2.7 | 60 KB | ############################################################################ | 100%
parsel-1.6.0 | 15 KB | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) PS D:\Programs>
In PowerShell on my VPS I then tried to run Scrapy via D:\Programs\Anaconda3\Scripts\scrapy.exe
I want to run the spider I have stored in folder D:\scrapy\my1stscraper, see:
The Docker Engine service is running as a Windows service (I assume I don't need to explicitly start a container when running my scrapy commands; if I do, I don't know how):
I tried to start my scraper like so: D:\Programs\Anaconda3\Scripts\scrapy.exe crawl D:\scrapy\my1stscraper\spiders\my1stscraper -o allobjects.json
which resulted in the error:
Traceback (most recent call last):
File "D:\Programs\Anaconda3\Scripts\scrapy-script.py", line 6, in <module>
from scrapy.cmdline import execute
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\__init__.py", line 12, in <module>
from scrapy.spiders import Spider
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 11, in <module>
from scrapy.http import Request
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\http\__init__.py", line 11, in <module>
from scrapy.http.request.form import FormRequest
File "D:\Programs\Anaconda3\lib\site-packages\scrapy\http\request\form.py", line 10, in <module>
import lxml.html
File "D:\Programs\Anaconda3\lib\site-packages\lxml\html\__init__.py", line 53, in <module>
from .. import etree
ImportError: DLL load failed while importing etree: The specified module could not be found.
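One thing to note, separate from the DLL error in the traceback above: `scrapy crawl` takes the *name* of a spider registered in a Scrapy project, not a path to its .py file, and is normally run from the directory containing scrapy.cfg. A sketch, with the folder and spider name assumed from the question:

```shell
# Run from the project root (the folder that holds scrapy.cfg),
# passing the spider's name rather than a file path:
cd D:\scrapy\my1stscraper
D:\Programs\Anaconda3\Scripts\scrapy.exe crawl my1stscraper -o allobjects.json
```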
I looked at this question:
from lxml import etree ImportError: DLL load failed: The specified module could not be found
That discussion is about pip, which I'm not using, but to be sure I did install the C++ build tools:
I still get the same error. How can I run my Scrapy crawler in a Docker container?
Update 1
My VPS is my only environment, so I'm not sure how to test in a virtualised environment.
What I did now:
- Uninstalled Anaconda
- Installed Miniconda with Python 3.8 (https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe), did not add it to PATH, and use Miniconda as the system's Python 3.8
Looking at your suggestions:
Get steps to manually install the app on Windows Server - ideally test in a virtualised environment so you can reset it cleanly
- What do you mean by "the app"? Scrapy? Conda?
Convert all steps to a fully automatic powershell script (e.g. for conda, need to download the installer via wget, execute the installer etc.
I have now installed Conda on the host OS, since I figured that would give me the least overhead. Or should it be installed directly in the image? If so, how do I avoid installing it every time?
Finally, to be sure: I want to run multiple Scrapy scrapers, but with as little overhead as possible.
Should I repeat the RUN command in the same Docker container for each crawler I want to execute?
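On that last point: RUN instructions execute once, at image *build* time; individual crawls are started at container *run* time, so a single image can serve any number of spiders with no per-crawler RUN lines. A sketch, assuming the image is tagged scrapy and the spider names spider1/spider2 are placeholders:

```shell
# Build the image once; the Dockerfile's RUN steps execute only here.
docker build . -t scrapy

# Each crawl is just a new container from the same image at run time;
# nothing is rebuilt per spider.
docker run --rm scrapy scrapy crawl spider1 -o spider1.json
docker run --rm scrapy scrapy crawl spider2 -o spider2.json
```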
Update 2
whoami indeed returns user manager\containeradministrator
scrapy benchmark returns:
Scrapy 2.4.1 - no active project
Unknown command: benchmark
Use "scrapy" to see available commands
I have the Scrapy project I want to run in folder D:\scrapy\my1stscraper. How do I run that project, given that the D:\ drive is not available inside my container?
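The host's D:\ drive can be exposed to the container with a bind mount when the container is started; a sketch, assuming the image built from the answer's Dockerfile is tagged scrapy (C:\scraper is an arbitrary target path inside the container):

```shell
# Bind-mount the host project folder into the Windows container and
# open an interactive PowerShell in it.
docker run -it --rm -v D:\scrapy\my1stscraper:C:\scraper scrapy powershell

# Then, inside the container:
#   cd C:\scraper
#   scrapy crawl my1stscraper -o allobjects.json
```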
Update 3
Months later, while we were still discussing this issue, the Dockerfile you proposed now breaks when I run it; I get this output:
PS D:\Programs> docker build . -t scrapy
Sending build context to Docker daemon 1.644GB
Step 1/9 : FROM mcr.microsoft.com/windows/servercore:ltsc2019
---> d1724c2d9a84
Step 2/9 : SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
---> Running in 5f79f1bf9b62
Removing intermediate container 5f79f1bf9b62
---> 8bb2a477eaca
Step 3/9 : RUN setx /M PATH $('C:\Users\ContainerAdministrator\miniconda3\Library\bin;C:\Users\ContainerAdministrator\miniconda3\Scripts;C:\Users\ContainerAdministrator\miniconda3;' + $Env:PATH)
---> Running in f3869c4f64d5
SUCCESS: Specified value was saved.
Removing intermediate container f3869c4f64d5
---> 82a2fa969a88
Step 4/9 : RUN Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile miniconda3.exe -UseBasicParsing; Start-Process -FilePath 'miniconda3.exe' -Wait -ArgumentList '/S', '/D=C:\Users\ContainerAdministrator\miniconda3'; Remove-Item .\miniconda3.exe; conda install -y -c conda-forge scrapy;
---> Running in 3eb8b7bfe878
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your environment:
Specifications:
- scrapy -> python[version='2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.5,<3.6.0a0|3.4.*']
Your python: python=3.9
If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.
Not sure if I'm reading it right, but it seems Scrapy does not support Python 3.9, even though here I see "Scrapy requires Python 3.6+": https://docs.scrapy.org/en/latest/intro/install.html
Do you know what is causing this? I also checked here, but that didn't answer it either.
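The failing build step downloads Miniconda3-latest, which by then shipped Python 3.9, while the scrapy conda package at that point only declared support up to Python 3.8 (hence the solver conflict). Two hedged ways around it, the first being what the answer's final Dockerfile does:

```dockerfile
# Option 1: pin the Miniconda installer to a Python 3.8 build
RUN Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Windows-x86_64.exe" -OutFile miniconda3.exe -UseBasicParsing; \
    Start-Process -FilePath 'miniconda3.exe' -Wait -ArgumentList '/S', '/D=C:\Users\ContainerAdministrator\miniconda3'; \
    Remove-Item .\miniconda3.exe; \
    conda install -y -c conda-forge scrapy

# Option 2 (untested variant): keep the "latest" installer but pin python
# in the solve, so conda downgrades the base interpreter:
#   conda install -y -c conda-forge scrapy python=3.8
```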
To run a containerised app, it must first be installed in a container image - you don't want to install any software on the host.
For Linux there are ready-made container images for everything, which is what your Docker Desktop environment uses; I see 1051 results searching for scrapy on Docker Hub, but none of them are Windows containers.
The full process of creating a Windows container for an app from scratch is:
- Get the steps to manually install the app (scrapy and its dependencies) on Windows Server - ideally test in a virtualised environment so you can reset it cleanly
- Convert all steps to a fully automatic PowerShell script (e.g. for conda, you need to download the installer via wget, execute the installer, etc.)
- Optionally, test the PowerShell steps in an interactive container: docker run -it --isolation=process mcr.microsoft.com/windows/servercore:ltsc2019 powershell - this runs a Windows container and gives you a shell to verify that your installation script works - the container stops when you exit the shell
- Create a Dockerfile - use mcr.microsoft.com/windows/servercore:ltsc2019 as the base image via FROM - use a RUN command for each line of your PowerShell script
I tried installing scrapy on an existing Windows Dockerfile that used conda / Python 3.6, and it threw the error SettingsFrame has no attribute 'ENABLE_CONNECT_PROTOCOL' at a similar stage.
However, I tried again with miniconda and Python 3.8 and was able to get scrapy running; this is the Dockerfile:
FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
RUN setx /M PATH $('C:\Users\ContainerAdministrator\miniconda3\Library\bin;C:\Users\ContainerAdministrator\miniconda3\Scripts;C:\Users\ContainerAdministrator\miniconda3;' + $Env:PATH)
RUN Invoke-WebRequest "https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Windows-x86_64.exe" -OutFile miniconda3.exe -UseBasicParsing; \
Start-Process -FilePath 'miniconda3.exe' -Wait -ArgumentList '/S', '/D=C:\Users\ContainerAdministrator\miniconda3'; \
Remove-Item .\miniconda3.exe; \
conda install -y -c conda-forge scrapy;
Build it with docker build . -t scrapy and run it with docker run -it scrapy.
To verify that you are running a shell inside the container, run whoami - it should return user manager\containeradministrator.
Then the scrapy bench command should run and dump some stats (note: the subcommand is bench, not benchmark).
The container will stop when you close the shell.