Bulk Hive table creation in Google Dataproc

Question

I am new to Google Cloud Platform, and I am working on a POC that moves a Hive application (tables and jobs) to Google Dataproc. The data has already been moved to Google Cloud Storage.

Is there a built-in way to create all of the Hive tables in Dataproc in bulk, instead of creating them one by one at the Hive prompt?

Answer 1

Dataproc supports the Hive job type, so you can use the gcloud command:

gcloud dataproc jobs submit hive --cluster=CLUSTER \
   -e 'create table t1 (id int, name string); create table t2 ...;'

or

gcloud dataproc jobs submit hive --cluster=CLUSTER -f create_tables.hql
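Because the data is already in Cloud Storage, create_tables.hql would typically contain external table definitions whose LOCATION points at gs:// paths (Dataproc clusters ship with the Cloud Storage connector, so Hive can read those paths directly). The sketch below generates such a file in a loop. The bucket path, table names, column lists, and the Parquet storage format are all hypothetical placeholders to be replaced with your own.

```shell
#!/usr/bin/env bash
# Sketch: generate create_tables.hql for data that already lives in GCS.
# BUCKET, the table list, and STORED AS PARQUET are hypothetical placeholders.
set -euo pipefail

BUCKET="gs://my-bucket/warehouse"   # hypothetical bucket path
OUT="create_tables.hql"

# Each entry is "table_name|column list" (hypothetical tables).
TABLES=(
  "t1|id INT, name STRING"
  "t2|id INT, amount DOUBLE"
)

: > "$OUT"   # create or truncate the output file

for entry in "${TABLES[@]}"; do
  name="${entry%%|*}"   # text before the first '|'
  cols="${entry#*|}"    # text after the first '|'
  # Append one external-table statement per table; the files stay in GCS.
  cat >> "$OUT" <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS ${name} (${cols})
STORED AS PARQUET
LOCATION '${BUCKET}/${name}/';
EOF
done

echo "Wrote $(grep -c 'CREATE EXTERNAL TABLE' "$OUT") statements to $OUT"
```

The generated file can then be submitted in one job with the -f form of the gcloud command. Using external tables means that dropping a table on the POC cluster leaves the underlying files in the bucket untouched.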

You can also SSH into the master node and run the script with Beeline:

beeline -u jdbc:hive2://localhost:10000 -f create_tables.hql
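If the tables already exist in the original Hive installation, the script does not have to be written by hand: SHOW CREATE TABLE prints each table's definition, but its LOCATION clauses will still point at HDFS. A minimal sketch of rewriting them with sed, assuming the files were copied to Cloud Storage under the same path layout; the bucket name, namenode address, and sample table are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: point HDFS LOCATION clauses in exported Hive DDL at Cloud Storage.
# Assumes the data was copied to GCS keeping the same path layout;
# gs://my-bucket and the sample DDL below are hypothetical.
set -euo pipefail

# Replace the quoted "hdfs://<namenode[:port]>" prefix with the given GCS
# prefix. Reads DDL on stdin and writes the rewritten DDL to stdout.
rewrite_locations() {
  local gcs_prefix="$1"
  sed -E "s#'hdfs://[^/']+#'${gcs_prefix}#g"
}

# Demo input resembling SHOW CREATE TABLE output (hypothetical table/namenode).
sample_ddl="CREATE EXTERNAL TABLE t1 (id int, name string)
LOCATION
  'hdfs://namenode:8020/user/hive/warehouse/t1';"

printf '%s\n' "$sample_ddl" | rewrite_locations "gs://my-bucket"
```

Applied to an exported file instead of the demo string (for example rewrite_locations gs://my-bucket < exported.hql > create_tables.hql), this yields a script for the gcloud or Beeline commands. SHOW CREATE TABLE output may not end each statement with a semicolon, so the exported file might need one appended after every statement before it is run.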
