Python script on ubuntu - OSError: [Errno 12] Cannot allocate memory


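I'm running a script on an AWS (Ubuntu) EC2 instance. It's a web scraper that uses selenium/chromedriver and headless Chrome to scrape some web pages. This script used to run without problems, but today I'm getting an error. Here's the script: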

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument('--disable-notifications')

options.binary_location = '/usr/bin/chromium-browser'
# chrome_options= is deprecated in Selenium 3.x; newer releases expect options=
driver = webdriver.Chrome(chrome_options=options)

# Set base url (SAN FRANCISCO)
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='

events = []

for i in range(1, 90):
    # cycle through pages in range
    pageURL = base_url + str(i)
    driver.get(pageURL)
    print(pageURL)

When I run this script on Ubuntu, I get this error:

  Traceback (most recent call last):
  File "BandsInTown_Scraper_SF.py", line 91, in <module>
    driver = webdriver.Chrome(chrome_options=options)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

I confirmed that I'm running matching versions of ChromeDriver and the Chromium browser:

ChromeDriver 79.0.3945.130 (e22de67c28798d98833a7137c0e22876237fc40a-refs/branch-heads/3945@{#1047})


Chromium 79.0.3945.130 Built on Ubuntu , running on Ubuntu 18.04

For what it's worth, this script runs fine on my Mac, and I do have multiple web-scraping scripts like this one running on the same EC2 instance (only 2 scripts so far, so not a lot).

Update

I'm now also getting these errors when trying to run this script on Ubuntu:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
      File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution


    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
        chunked=chunked)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
        self._validate_conn(conn)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 852, in _validate_conn
        conn.connect()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 284, in connect
        conn = self._new_conn()
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
        self, "Failed to establish a new connection: %s" % e)
    urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
        timeout=timeout
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "BandsInTown_Scraper_SF.py", line 39, in <module>
        res = requests.get(url)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 75, in get
        return request('get', url, params=params, **kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 60, in request
        return session.request(method=method, url=url, **kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
        r = adapter.send(request, **kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

Finally, here is my current monthly AWS usage, which doesn't show any memory quota being exceeded.

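What has probably happened is that the Chromium browser was updated and now uses more memory (or perhaps leaks memory worse... you don't say how many URLs it gets through before it dies).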

As a workaround, launch a larger instance size. You don't say what instance size you're using, but if you have a t3.micro, try a t3.medium instead.

There's a simple, easy-to-read comparison chart here: https://www.ec2instances.info/?region=eu-west-1

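If you've already launched an instance and want to resize it rather than rebuild from scratch, use the console to put it into the stopped state, change the instance size, and start it again.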
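To find out how many URLs the script gets through before it dies, and whether memory shrinks steadily along the way, one option is to log free memory after each page. A minimal sketch, assuming Linux (the `SC_AVPHYS_PAGES` sysconf name is a glibc/Linux name and may be missing elsewhere); `log_memory_per_page` and its `fetch` argument are hypothetical names, with `fetch` standing in for `driver.get`:

```python
import os


def free_physical_memory_bytes():
    """Free physical RAM in bytes, via os.sysconf (Linux/glibc names).

    SC_AVPHYS_PAGES counts free pages only, so reclaimable page cache is not
    included; the figure is conservative compared with `free`'s "available".
    """
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def log_memory_per_page(urls, fetch):
    """Call fetch(url) for each URL and record free RAM (MiB) after each call."""
    history = []
    for url in urls:
        fetch(url)
        free_mib = free_physical_memory_bytes() // (1024 * 1024)
        history.append((url, free_mib))
        print(f"{url}  free RAM: {free_mib} MiB")
    return history


if __name__ == "__main__":
    base_url = "https://www.bandsintown.com/en/c/san-francisco-ca?page="
    pages = [base_url + str(i) for i in range(1, 4)]
    # No-op fetch so the sketch runs without Selenium; pass driver.get in the real script.
    log_memory_per_page(pages, fetch=lambda url: None)
```

A number that falls page after page would point toward a leak; a level number that is simply low would point toward an undersized instance.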

This error message...

    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

...means that the operating system was unable to allocate memory to spawn the new process (chromedriver) needed to start a new session.
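Since this failure surfaces as a plain OSError from `subprocess`, it can be recognised by its errno and reported more clearly. A minimal sketch; `spawn` is a hypothetical helper (Selenium starts chromedriver through its own `Service` class, not through this function), and the current Python interpreter stands in for chromedriver so it runs anywhere:

```python
import errno
import subprocess
import sys


def spawn(cmd):
    """Start `cmd` and return its Popen object.

    When fork/exec fails because the kernel cannot allocate memory for the
    child, Python raises OSError with errno ENOMEM, the same
    "[Errno 12] Cannot allocate memory" seen in the traceback.
    Re-raise it with a hint so the cause is obvious in the logs.
    """
    try:
        return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            raise RuntimeError(
                f"could not spawn {cmd[0]!r}: out of memory (check RAM and swap)"
            ) from exc
        raise  # other OSErrors (e.g. a missing binary) pass through unchanged


if __name__ == "__main__":
    # Spawning the current Python interpreter stands in for chromedriver here.
    proc = spawn([sys.executable, "-c", "print('child started')"])
    out, _ = proc.communicate()
    print(out.decode().strip())
```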

Further, this error message...

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

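...indicates that your program successfully iterated through page 5 and then failed on page 6, this time because the host name could not be resolved.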
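If the name-resolution failure on page 6 is transient, retrying with a growing delay may let the loop continue instead of crashing. A minimal sketch; `retry` is a hypothetical helper, not part of requests or Selenium. It relies on `socket.gaierror` being a subclass of `OSError`, and on the assumption (worth checking against your installed version) that requests' exceptions also derive from `IOError`/`OSError`. Retrying will not help if the machine is genuinely out of memory.

```python
import socket
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0, exceptions=(OSError,)):
    """Call func() until it succeeds, trying at most `attempts` times in total.

    socket.gaierror ("Temporary failure in name resolution") subclasses OSError,
    so it is retried by default. Waits delay, delay*backoff, ... between tries
    and re-raises the last exception once attempts are exhausted.
    """
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({exc}); retrying in {wait:.1f}s")
            time.sleep(wait)
            wait *= backoff


if __name__ == "__main__":
    # Simulated flaky lookup: fails twice with a DNS error, then succeeds.
    state = {"calls": 0}

    def flaky_lookup():
        state["calls"] += 1
        if state["calls"] < 3:
            raise socket.gaierror(-3, "Temporary failure in name resolution")
        return "resolved"

    print(retry(flaky_lookup, delay=0.1))
    # With requests this might be: retry(lambda: requests.get(url, timeout=10))
```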


I don't see any issue in your code block as such. I took your code, made a few minor tweaks, and here is the execution result:

  • Code block:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # executable_path is Selenium 3 style; Selenium 4 passes a Service object instead
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
    for i in range(1, 10):
        # cycle through pages in range
        pageURL = base_url + str(i)
        driver.get(pageURL)
        print(pageURL)
    driver.quit()  # shut down chromedriver and the browser processes
    
  • Console output:

    https://www.bandsintown.com/en/c/san-francisco-ca?page=1
    https://www.bandsintown.com/en/c/san-francisco-ca?page=2
    https://www.bandsintown.com/en/c/san-francisco-ca?page=3
    https://www.bandsintown.com/en/c/san-francisco-ca?page=4
    https://www.bandsintown.com/en/c/san-francisco-ca?page=5
    https://www.bandsintown.com/en/c/san-francisco-ca?page=6
    https://www.bandsintown.com/en/c/san-francisco-ca?page=7
    https://www.bandsintown.com/en/c/san-francisco-ca?page=8
    https://www.bandsintown.com/en/c/san-francisco-ca?page=9
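Each Chrome session runs in separate chromedriver and browser processes, and the question mentions several scrapers sharing one instance. So it may be worth making sure every session is shut down even when the loop crashes. Leftover browser processes from earlier failed runs are one plausible, unverified source of memory pressure; `ps aux | grep chrom` on the instance would confirm or rule that out. A minimal sketch, where `quitting` is a hypothetical helper and `FakeDriver` is a hypothetical stand-in for `webdriver.Chrome`:

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield driver and always call driver.quit() when the block exits.

    quit() runs even if the scraping loop raises, so the chromedriver and
    browser processes started for this session are asked to shut down
    instead of being left behind to hold memory.
    """
    try:
        yield driver
    finally:
        driver.quit()


if __name__ == "__main__":
    class FakeDriver:
        """Hypothetical stand-in for webdriver.Chrome so the sketch runs without Selenium."""

        def get(self, url):
            print("GET", url)

        def quit(self):
            print("quit() called")

    base_url = "https://www.bandsintown.com/en/c/san-francisco-ca?page="
    with quitting(FakeDriver()) as driver:
        for i in range(1, 3):
            driver.get(base_url + str(i))
```

In the real script the `with` line would wrap `webdriver.Chrome(...)` instead of `FakeDriver()`.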
    

Deep dive

This error comes from subprocess.py:

self.pid = _posixsubprocess.fork_exec(
    args, executable_list,
    close_fds, tuple(sorted(map(int, fds_to_keep))),
    cwd, env_list,
    p2cread, p2cwrite, c2pread, c2pwrite,
    errread, errwrite,
    errpipe_read, errpipe_write,
    restore_signals, start_new_session, preexec_fn)

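However, as per the discussion in OSError: [Errno 12] Cannot allocate memory, this error is related to RAM/swap: the fork that launches chromedriver fails when the kernel cannot allocate memory for the new process.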


Swap Space

Swap space is an area on the hard drive that has been designated as a place where the operating system can temporarily store data that it can no longer hold in RAM. This effectively increases the amount of data your programs can keep in working memory. The swap space on disk is used mainly when there is no longer enough room in RAM to hold in-use application data. Data written to disk is significantly slower to access than data kept in RAM, so the operating system prefers to keep running application data in memory and use swap for older data. Deploying swap as a fallback for when the system's RAM is depleted is a safety net against out-of-memory errors on systems with non-SSD storage available.


System Check

To check whether the system already has swap space available, run:

$ sudo swapon --show

If you don't get any output, your system does not currently have any swap space available. You can also verify that there is no active swap with the free utility:

$ free -h

If there is no active swap on the system, you will see output like this:

Output
               total        used       free        shared      buff/cache  available
Mem:           488M         36M        104M        652K        348M        426M
Swap:            0B          0B          0B
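The same figures that `swapon --show` and `free -h` report can be read programmatically, for example to log them from the scraper itself. A minimal sketch, assuming Linux, where the kernel exposes them in `/proc/meminfo` (most values in KiB); `parse_meminfo` and `memory_summary` are hypothetical helper names:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (most values are KiB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:  # e.g. "MemTotal:  499712 kB" -> ("MemTotal", 499712)
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(path="/proc/meminfo"):
    """Return total/available RAM and total/free swap in MiB, roughly like `free -m`."""
    with open(path) as fh:
        info = parse_meminfo(fh.read())
    return {
        "mem_total_mib": info["MemTotal"] // 1024,
        # MemAvailable exists on Linux 3.14+; fall back to MemFree on older kernels.
        "mem_available_mib": info.get("MemAvailable", info["MemFree"]) // 1024,
        "swap_total_mib": info["SwapTotal"] // 1024,
        "swap_free_mib": info["SwapFree"] // 1024,
    }


if __name__ == "__main__":
    summary = memory_summary()
    print(summary)
    if summary["swap_total_mib"] == 0:
        print("No active swap (same situation as the empty `swapon --show` output).")
```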

Creating a Swap File

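In that case you need to allocate space for swap, either as a separate partition dedicated to the task or as a swap file that resides on an existing partition. To create a 1 GB swap file, run: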

$ sudo fallocate -l 1G /swapfile

You can verify that the correct amount of space was reserved by running:

$ ls -lh /swapfile

#Output
-rw-r--r-- 1 root root 1.0G Mar 08 10:30 /swapfile

This confirms that the swap file has been created with the correct amount of space reserved.
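For reference, the fallocate and ls steps can be mirrored in Python. A minimal sketch, assuming Linux (where `os.posix_fallocate` is available); `create_swapfile` is a hypothetical helper, and the demo uses a 1 MiB file in a temporary directory rather than 1 GiB at /swapfile, which would need root:

```python
import os
import stat
import tempfile


def create_swapfile(path, size_bytes):
    """Reserve size_bytes on disk for a new file at path, similar to `fallocate -l`.

    The file is opened with O_EXCL (refuse to overwrite) and mode 0600, so it
    is never readable by other users. Returns (size in bytes, ls-style mode).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.posix_fallocate(fd, 0, size_bytes)  # assumes Linux
    finally:
        os.close(fd)
    st = os.stat(path)
    return st.st_size, stat.filemode(st.st_mode)


if __name__ == "__main__":
    # Demo with 1 MiB in a temp dir; the guide's real step is 1 GiB at /swapfile as root.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_swapfile(os.path.join(tmp, "swapfile"), 1024 * 1024))
```

Creating the file as 0600 from the start avoids the brief window in which the `fallocate` version is world-readable (the `-rw-r--r--` shown above) before `chmod` tightens it.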


Enabling the Swap Space

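Once you have a file of the correct size, you need to actually turn it into swap space. First, lock down the file's permissions so that only root can read its contents. Otherwise unprivileged users could read whatever memory pages get swapped out to it, which is a significant security risk. Follow these steps: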

  • Make the file accessible only to root by running:

    $ sudo chmod 600 /swapfile
    
  • Verify the permission change by running:

    $ ls -lh /swapfile
    
    #Output
    -rw------- 1 root root 1.0G Apr 25 11:14 /swapfile
    

    This confirms that only the root user has the read and write flags enabled.

  • Now mark the file as swap space by running:

    $ sudo mkswap /swapfile
    
    #Sample Output
    Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
    no label, UUID=6e965805-2ab9-450f-aed6-577e74089dbf
    
  • Next, enable the swap file so the system can start using it:

    $ sudo swapon /swapfile
    
  • Verify that the swap is available by running:

    $ sudo swapon --show
    
    #Sample Output
    NAME      TYPE  SIZE USED PRIO
    /swapfile file 1024M   0B   -1
    
  • Finally, check the output of the free utility again to verify the setup:

    $ free -h
    
    #Sample Output
              total        used        free      shared  buff/cache   available
    Mem:           488M         37M         96M        652K        354M        425M
    Swap:          1.0G          0B        1.0G
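If the same setup has to be repeated on several instances, the chmod, mkswap and swapon steps can be scripted. A minimal sketch, assuming Linux and root privileges for the real run; `enable_swapfile` and `active_swaps` are hypothetical helpers, and the command runner is injectable so the sequence can be checked without touching the system's swap:

```python
import os
import stat
import subprocess
import tempfile


def enable_swapfile(path, run=subprocess.run):
    """Replay the steps above: chmod 600, verify, mkswap, swapon (needs root)."""
    os.chmod(path, 0o600)
    mode = stat.filemode(os.stat(path).st_mode)
    if mode != "-rw-------":
        raise PermissionError(f"unexpected permissions on {path}: {mode}")
    run(["mkswap", path], check=True)
    run(["swapon", path], check=True)


def active_swaps(path="/proc/swaps"):
    """Return the filenames listed in /proc/swaps, i.e. what `swapon --show` lists."""
    with open(path) as fh:
        lines = fh.read().splitlines()[1:]  # skip the header row
    return [line.split()[0] for line in lines if line.strip()]


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    calls = []
    with tempfile.NamedTemporaryFile() as tmp:
        enable_swapfile(tmp.name, run=lambda cmd, check: calls.append(cmd))
    print(calls)
```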
    

Conclusion

Once the swap space is set up, the operating system will start using it as needed.