Installing NLTK/WORDNET on AWS Lambda via CodeBuild

I am trying to get NLTK and WordNet working on Lambda via CodeBuild.

It appears to install fine in CloudFormation, but I get the following error in Lambda:

START RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c Version: $LATEST
Unable to import module 'index': No module named 'nltk'

END RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c
REPORT RequestId: c660c446-e1c4-11e8-8047-15f59f1e002c  Duration: 2.10 ms   Billed Duration: 100 ms     Memory Size: 128 MB Max Memory Used: 21 MB  

However, when I check the build log, it installs fine in CodeBuild:

[Container] 2018/11/06 12:45:06 Running command pip install -U nltk
Collecting nltk
 Downloading https://files.pythonhosted.org/packages/50/09/3b1755d528ad9156ee7243d52aa5cd2b809ef053a0f31b53d92853dd653a/nltk-3.3.0.zip (1.4MB)
Requirement already up-to-date: six in /usr/local/lib/python2.7/site-packages (from nltk)
Building wheels for collected packages: nltk
 Running setup.py bdist_wheel for nltk: started
 Running setup.py bdist_wheel for nltk: finished with status 'done'
 Stored in directory: /root/.cache/pip/wheels/d1/ab/40/3bceea46922767e42986aef7606a600538ca80de6062dc266c
Successfully built nltk
Installing collected packages: nltk
Successfully installed nltk-3.3

Here is the actual Python code:

import json
import datetime
import nltk
from nltk.corpus import wordnet as wn

Here is the YML file:

version: 0.2

phases:
  install:
    commands:

      # Upgrade AWS CLI to the latest version
      - pip install --upgrade awscli

      # Install nltk & WordNet
      - pip install -U nltk
      - python -m nltk.downloader wordnet

  pre_build:
    commands:

      # Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
      # - python -m unittest discover tests

  build:
    commands:

      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml

artifacts:
  type: zip
  files:
    - template-export.yml

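Any idea why it installs fine in CodeBuild but the NLTK module cannot be accessed in Lambda? For reference, if you just remove NLTK, the code runs fine in Lambda.

I have a feeling this is a YML file issue, but I'm not sure what, given that NLTK installs fine.

NLTK is only installed locally, on the machine where the CodeBuild job runs. You need to copy NLTK into the CloudFormation deployment package. Your buildspec.yml will look something like this: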

install:
  commands:

  # Upgrade AWS CLI to the latest version
  - pip install --upgrade awscli

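pre_build:
  commands:
  # Create and activate a virtualenv so that pip installs into /venv
  - virtualenv /venv
  - . /venv/bin/activate

  # Install nltk & WordNet
  - pip install -U nltk
  - python -m nltk.downloader wordnet

build:
  commands:
  # Copy the installed packages next to the Lambda code so they get packaged with it
  # (adjust python3.6 to the Python version used by the build image)
  - cp -r /venv/lib/python3.6/site-packages/. ./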

  # Use AWS SAM to package the application by using AWS CloudFormation
  - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml
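
The point of the copy step is that Lambda can only import what ends up in the packaged directory. Below is a minimal sketch of a check you could run before packaging, assuming the Lambda code sits at the root of the build directory. The function name missing_from_bundle and the module names passed to it are hypothetical and not part of the buildspec above:

```python
import os


def missing_from_bundle(bundle_dir, top_level_modules):
    """Return the names that are neither a package directory nor a .py file
    directly inside bundle_dir (the directory that gets zipped for Lambda)."""
    missing = []
    for name in top_level_modules:
        as_package = os.path.isdir(os.path.join(bundle_dir, name))
        as_module = os.path.isfile(os.path.join(bundle_dir, name + ".py"))
        if not (as_package or as_module):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical usage: check the current directory before packaging.
    print(missing_from_bundle(".", ["index", "nltk"]))
```

If nltk shows up in the returned list, the copy step did not place it next to index.py, and Lambda will raise the same "No module named 'nltk'" error.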

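OK, thanks to laika for pointing me in the right direction.

Here is a working deployment of NLTK and WordNet to Lambda via CodeStar / CodeBuild. A few things to keep in mind:

1) You cannot use source venv/bin/activate because it is not POSIX compliant. Use . venv/bin/activate instead, as shown below.

2) You must set the path for NLTK, as shown in the Define Directories section.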

buildspec.yml

version: 0.2

phases:
  install:
    commands:

      # Upgrade AWS CLI & PIP to the latest version
      - pip install --upgrade awscli
      - pip install --upgrade pip

      # Define Directories
      - export HOME_DIR=`pwd`
      - export NLTK_DATA=$HOME_DIR/nltk_data

  pre_build:
    commands:
      - cd $HOME_DIR

      # Create VirtualEnv to package for lambda
      - virtualenv venv
      - . venv/bin/activate

      # Install Supporting Libraries
      - pip install -U requests

      # Install WordNet
      - pip install -U nltk
      - python -m nltk.downloader -d $NLTK_DATA wordnet

      # Output Requirements
      - pip freeze > requirements.txt

      # Unit Tests
      # - python -m unittest discover tests

  build:
    commands:
      - cd $HOME_DIR
      - mv $VIRTUAL_ENV/lib/python3.6/site-packages/* .

      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml

artifacts:
  type: zip
  files:
    - template-export.yml
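
One thing to keep in mind: the NLTK_DATA variable exported in the buildspec only exists during the build, so at runtime the function still has to find the bundled nltk_data directory. Below is a minimal sketch of one way to do that, assuming nltk_data is packaged next to index.py. The helper names bundled_nltk_data_dir and register_nltk_data are hypothetical; LAMBDA_TASK_ROOT is the environment variable the Lambda runtime sets to the directory containing the function code:

```python
import os


def bundled_nltk_data_dir(environ=None):
    """Return the path where nltk_data is expected inside the deployment package.

    Falls back to the current directory when not running on Lambda.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    return os.path.join(root, "nltk_data")


def register_nltk_data(search_path, data_dir):
    """Put data_dir at the front of an NLTK-style search path list (in place)."""
    if data_dir not in search_path:
        search_path.insert(0, data_dir)
    return search_path


# Hypothetical usage at the top of index.py, before the first WordNet lookup:
#   import nltk
#   register_nltk_data(nltk.data.path, bundled_nltk_data_dir())
#   from nltk.corpus import wordnet as wn
```

Setting NLTK_DATA as an environment variable in the function's configuration should achieve the same result without code changes, since NLTK reads that variable when it is imported.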

If anyone has any improvements, LMK. It works for me.