DataflowRunner exits with "No files to stage has been found."
I want to run the WordCount Java example from https://beam.apache.org/get-started/quickstart-java/, but I get the error "No files to stage has been found." I ran the example exactly as described on the website:
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
--gcpTempLocation=gs://<your-gcs-bucket>/tmp \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
-Pdataflow-runner
which produces:
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
... 5 more
Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:281)
... 10 more
I am using the latest Beam version:
<beam.version>2.19.0</beam.version>
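Do you know how to solve this?
Edit: This appears to be a bug in 2.19.0. The same command works with 2.18.0.
Edit: I am using Red Hat OpenJDK 8 on Windows.
Edit: In addition, some unit tests in the standard WordCount example fail. DebuggingWordCountTest fails with: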
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: /Users/<redacted>/AppData/Local/Temp/junit7907687962995108435/junit2682353785908929665.tmp
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:321)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
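- When you run a pipeline on Dataflow, the runner tries to find the pipeline's dependencies and upload them.
- I assume you are getting the error "No files to stage has been found" because of a classpath problem.
- Try the --filesToStage option to provide the jars or classes to stage manually.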
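As a rough illustration of the last suggestion, the sketch below builds a --filesToStage value from a classpath string. The class name FilesToStageArg and its helper methods are hypothetical and not part of Beam. It assumes you already have the project's classpath as a single string, for example from mvn dependency:build-classpath, plus target/classes for your own code. It also assumes the option accepts a comma-separated list, as list-valued pipeline options generally do. Treat it as a starting point for debugging, not a confirmed fix for the 2.19.0 behavior.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FilesToStageArg {

    /** Splits a classpath string on the platform separator and keeps only entries that exist on disk. */
    static List<String> existingEntries(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && new File(entry).exists()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Joins the entries with commas (assumed list syntax for the option). */
    static String toOptionValue(List<String> entries) {
        return "--filesToStage=" + String.join(",", entries);
    }

    public static void main(String[] args) {
        // Prefer an explicit classpath argument. Under exec:java the JVM's own
        // java.class.path may describe Maven itself rather than the project.
        String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        List<String> entries = existingEntries(classpath);
        System.out.println("Existing classpath entries: " + entries.size());
        System.out.println(toOptionValue(entries));
    }
}
```

If the printed count is zero, the classpath string itself is the first thing to check. Otherwise the printed option can be appended to the arguments in -Dexec.args.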
Below is a sample log from a run that successfully staged 114 files, so you can compare it with your full log to narrow down the problem.
Mar 08, 2020 7:37:41 PM org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory create
INFO: No stagingLocation provided, falling back to gcpTempLocation
Mar 08, 2020 7:37:42 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 114 files. Enable logging at DEBUG level to see which files will be staged.
Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Uploading 114 files from PipelineOptions.filesToStage to staging location to prepare for execution.
Mar 08, 2020 7:37:48 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Staging files complete: 114 files cached, 0 files newly uploaded
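You can also try the following command to generate the example source code from the archetype, then run the new pipeline and check whether its dependencies are staged.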
mvn archetype:generate \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=2.8.0 \
-DgroupId=org.example \
-DartifactId=first-dataflow \
-Dversion="0.1" \
-Dpackage=org.apache.beam.examples \
-DinteractiveMode=false
You can also try it for free on Qwiklabs:
https://google.qwiklabs.com/focuses/7974?parent=catalog