
OSS supported by Google Cloud Dataproc

When I go to https://cloud.google.com/dataproc, I see this...

"Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks."

But `gcloud dataproc jobs submit` doesn't list all of these. It lists only 8 (hadoop, hive, pig, presto, pyspark, spark, spark-r, spark-sql). Any idea why?

~ gcloud dataproc jobs submit
ERROR: (gcloud.dataproc.jobs.submit) Command name argument expected.

Available commands for gcloud dataproc jobs submit:

      hadoop                  Submit a Hadoop job to a cluster.
      hive                    Submit a Hive job to a cluster.
      pig                     Submit a Pig job to a cluster.
      presto                  Submit a Presto job to a cluster.
      pyspark                 Submit a PySpark job to a cluster.
      spark                   Submit a Spark job to a cluster.
      spark-r                 Submit a SparkR job to a cluster.
      spark-sql               Submit a Spark SQL job to a cluster.

For detailed information on this command and its flags, run:
  gcloud dataproc jobs submit --help

Some OSS components are offered as Dataproc Optional Components. Not all of them have a job submit API: some (e.g., Anaconda, Jupyter) don't need one, and others (e.g., Flink, Druid) may get one in the future.
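Optional Components are chosen at cluster creation time rather than at job submission time, which is why they don't show up under `jobs submit`. A minimal sketch (the cluster name, region, and image version are hypothetical placeholders):

```shell
# Create a cluster with the Flink and Docker Optional Components enabled.
# --optional-components selects the extra OSS to install on the cluster;
# --enable-component-gateway exposes their web UIs (e.g., the Flink UI).
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=2.1 \
    --optional-components=FLINK,DOCKER \
    --enable-component-gateway
```

The set of components accepted by `--optional-components` depends on the Dataproc image version, so check the docs for the version you pick.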

Some other OSS components are provided as libraries, e.g., the GCS connector, the BigQuery connector, and Apache Parquet.
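Library-style components don't need their own job type either; they ride along with one of the 8 job types above. A sketch (the bucket, script, and cluster names are hypothetical; the connector jar path is the one Google publishes, but verify it for your Scala/connector version):

```shell
# Submit a PySpark job. The GCS connector is preinstalled on Dataproc,
# so the gs:// paths work out of the box; the BigQuery connector jar
# can be attached per job with --jars.
gcloud dataproc jobs submit pyspark gs://my-bucket/wordcount.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar
```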