Triggering a Dataflow job when new files are added to Cloud Storage

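I want to trigger a Dataflow job when new files are added to a Cloud Storage bucket, so that the new data is processed and added to a BigQuery table. I see that Cloud Functions can be triggered by changes in the bucket, but I haven't found a way to start a Dataflow job using the gcloud node.js library.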

Is there a way to do this using Cloud Functions, or is there another way to achieve the desired result (inserting new data into BigQuery when files are added to a Storage bucket)?

Apache Beam supports this starting with 2.2. See "Watching for new files matching a filepattern in Apache Beam".
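For illustration, here is a rough sketch of that idea in the Python SDK. It assumes a reasonably recent Beam release in which `fileio.MatchContinuously` is available (the 2.2 feature mentioned above most likely refers to the Java SDK). The file pattern, table name, two-column CSV layout, and the helper names `parse_line` and `nonempty_lines` are all made up for the example, so treat it as a starting point rather than a tested pipeline.

```python
import csv
import io

# Hypothetical column layout of the incoming CSV files.
FIELDS = ("name", "value")


def nonempty_lines(text):
    """Split file contents into lines, dropping blank ones."""
    return [line for line in text.splitlines() if line.strip()]


def parse_line(line):
    """Turn one CSV line into a dict usable as a BigQuery row."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(FIELDS, values))


def run(pattern, table, interval_s=60.0):
    """Poll `pattern` for new files and append their rows to `table`."""
    # Imported here so the helpers above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.options.pipeline_options import PipelineOptions

    # Watching for files never finishes, so this is a streaming pipeline.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Watch" >> fileio.MatchContinuously(pattern, interval=interval_s)
            | "Open" >> fileio.ReadMatches()
            | "Lines" >> beam.FlatMap(lambda f: nonempty_lines(f.read_utf8()))
            | "Parse" >> beam.Map(parse_line)
            | "Write" >> beam.io.WriteToBigQuery(
                table,
                schema="name:STRING,value:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

You would call it with something like `run("gs://my-bucket/incoming/*.csv", "my-project:my_dataset.my_table")`, where both arguments are placeholders. Note that this keeps one long-running job polling the bucket instead of starting a new job per file.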

Maybe this post will help with how to trigger Dataflow pipelines from App Engine or Cloud Functions?

https://cloud.google.com/blog/big-data/2016/04/scheduling-dataflow-pipelines-using-app-engine-cron-service-or-cloud-functions
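
As a hedged sketch of the Cloud Functions route (this is not the code from the linked post), a storage-triggered background function can launch a pre-staged Dataflow template through the Dataflow REST API. It assumes the `google-api-python-client` package is available in the function's environment. `PROJECT`, `REGION`, `TEMPLATE_PATH`, and the `inputFile` template parameter are placeholders I made up; your template defines its own parameter names.

```python
import re
import time

# Placeholder values: replace with your own project, region and template.
PROJECT = "my-project"
REGION = "us-central1"
TEMPLATE_PATH = "gs://my-bucket/templates/load_to_bigquery"


def build_launch_body(event, now=None):
    """Build the templates.launch request body for one uploaded file."""
    bucket = event["bucket"]
    name = event["name"]
    stamp = int(now if now is not None else time.time())
    # Reduce the object name to lowercase letters, digits and hyphens
    # so it is safe to embed in a Dataflow job name.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")[:30].strip("-")
    return {
        "jobName": f"load-{slug or 'file'}-{stamp}",
        # "inputFile" is a hypothetical parameter of the template.
        "parameters": {"inputFile": f"gs://{bucket}/{name}"},
    }


def on_new_file(event, context):
    """Background function entry point for a Cloud Storage finalize event."""
    # Imported lazily; requires google-api-python-client.
    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().locations().templates().launch(
        projectId=PROJECT,
        location=REGION,
        gcsPath=TEMPLATE_PATH,
        body=build_launch_body(event),
    )
    return request.execute()
```

Each uploaded file starts its own batch job this way, which is simple but can get expensive if many small files arrive; in that case the continuous-watch approach from the other answer may be a better fit.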