Google Dataflow shows AttributeError: 'module' object has no attribute 'Read'

I am testing on Google Cloud, following the guide's instructions for running the test against BigQuery: https://cloud.google.com/solutions/using-cloud-dataflow-for-batch-predictions-with-tensorflow

When I run the script:

python prediction/run.py \
--runner DataflowRunner \
--project $PROJECT \
--staging_location $BUCKET/staging \
--temp_location $BUCKET/temp \
--job_name $PROJECT-prediction-bq \
--setup_file prediction/setup.py \
--model $BUCKET/model \
--source bq \
--input $PROJECT:mnist.images \
--output $PROJECT:mnist.predict

it shows:

Traceback (most recent call last):
  File "prediction/run.py", line 23, in <module>
    predict.run()
  File "/home/ahuoo_com/dataflow-prediction-example/prediction/modules/predict.py", line 98, in run
    images = p | 'ReadFromBQ' >> beam.Read(beam.io.BigQuerySource(known_args.input))
AttributeError: 'module' object has no attribute 'Read'

It seems the apache_beam package has no attribute 'Read'. I think the example Google provides on GitHub may be wrong. You can look at line 98 of the code:

https://github.com/GoogleCloudPlatform/dataflow-prediction-example/blob/master/prediction/modules/predict.py

Has anyone run the test from this guide?

You are right, there is a small bug in the code. Line 98 says:

images = p | 'ReadFromBQ' >> beam.Read(beam.io.BigQuerySource(known_args.input))

It should be:

images = p | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(known_args.input))

Also, line 100 says:

predictions | 'WriteToBQ' >> beam.Write(beam.io.BigQuerySink(...))

That should likewise be:

predictions | 'WriteToBQ' >> beam.io.Write(beam.io.BigQuerySink(...))

The Read and Write transforms for a PCollection come from the io module (apache_beam.io), not from the top level of apache_beam itself.
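If you are not sure which spelling your installed Beam version exposes, a quick lookup before submitting the job may save a round trip. The following is only a minimal sketch: find_transform is a hypothetical helper (it is not part of apache_beam), and fake_beam is a stand-in object used so the snippet runs without Beam installed.

```python
import types


def find_transform(beam_module, name):
    """Return the attribute `name`, looking in beam_module.io first.

    Hypothetical helper for illustration; not an apache_beam API.
    """
    io_module = getattr(beam_module, "io", None)
    # Prefer the io submodule, then fall back to the top-level module.
    for namespace in (io_module, beam_module):
        if namespace is not None and hasattr(namespace, name):
            return getattr(namespace, name)
    raise AttributeError("neither the io submodule nor the module exposes %r" % name)


# Stand-in (assumption) for a Beam release where Read and Write
# are only reachable through the io submodule.
fake_beam = types.SimpleNamespace(
    io=types.SimpleNamespace(Read="io.Read", Write="io.Write")
)

print(find_transform(fake_beam, "Read"))
print(find_transform(fake_beam, "Write"))

# With the real package you would pass the module itself, for example:
#   import apache_beam as beam
#   read = find_transform(beam, "Read")
```

Looking in io first matches the fix above, and the fallback keeps the lookup working if a given release also exposes the name at the top level.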