Dataflow with python flex template - launcher timeout

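I'm trying to run my Python Dataflow job with a flex template. The job works fine locally when I run it with the direct runner (without a flex template), but when I try to run it with a flex template, the job stays in the "Queued" state for a while and then fails with a timeout.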

Here are some logs I found in the GCE console:

INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/dataflow/template/requirements.txt', '--exists-action', 'i', '--no-binary', ':all:'

Shutting down the GCE instance, launcher-202011121540156428385273524285797, used for launching.

Timeout in polling result file: gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service my_service_account@developer.gserviceaccount.com may not have enough permissions to pull container image gcr.io/indigo-computer-272415/samples/dataflow/streaming-beam-py:latest or create new objects in gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
3. Transient errors occurred, please try again.

For 1, I don't see any useful logs. For 2, the service account is the default service account, so it should have all the permissions.

How can I debug this further?

Here is my Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

ADD localdeps localdeps
COPY requirements.txt .
COPY main.py .
COPY setup.py .
COPY bq_field_pb2.py .
COPY bq_table_pb2.py .
COPY core_pb2.py .

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

RUN pip install -U  --no-cache-dir -r ./requirements.txt

I'm following this guide: https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates

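A likely cause of this issue is in the requirements.txt file. If you try to install apache-beam from the requirements file, the flex template runs into exactly the problem you describe: the job stays in the Queued state for a while and eventually fails with "Timeout in polling result".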

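The reason is that flex templates are affected by this issue. It only affects flex templates; the job runs correctly locally or with standard templates.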

The workaround is to install apache-beam separately in the Dockerfile, and to leave it out of requirements.txt:

RUN pip install -U apache-beam==<your desired version>
RUN pip install -U -r ./requirements.txt
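
To confirm that requirements.txt is the culprit before rebuilding the image, a small check along these lines may help. This is only a sketch: the helper name `find_beam_lines` and the regular expression are my own and are not part of Beam or Dataflow tooling, and the comment stripping is deliberately naive.

```python
import re
from pathlib import Path
from typing import List

# Matches a requirement whose package name is apache-beam, optionally with
# extras or a version specifier, e.g. "apache-beam[gcp]==<version>".
# Hypothetical pattern written for this sketch, not taken from pip or Beam.
_BEAM_RE = re.compile(
    r"^\s*apache[-_.]beam(\[[^\]]*\])?\s*($|[=<>!~;@\s])",
    re.IGNORECASE,
)


def find_beam_lines(requirements_text: str) -> List[str]:
    """Return the requirement lines that would install apache-beam."""
    hits = []
    for raw in requirements_text.splitlines():
        # Naive comment handling: drop everything after the first '#'.
        line = raw.split("#", 1)[0].strip()
        if line and _BEAM_RE.match(line):
            hits.append(line)
    return hits


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for line in find_beam_lines(path.read_text()):
            print(f"Move this into its own 'RUN pip install' step: {line}")
```

If it prints anything, remove those lines from requirements.txt and install apache-beam in its own RUN step as shown above.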

Download the requirements while building the image to speed up launching the Dataflow job:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY . .

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"

RUN apt-get update \
    # Upgrade pip and install the requirements.
    && pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
    # Download the requirements to speed up launching the Dataflow job.
    && pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE


# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True