Presto on Preemptible GCE instances

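I am running an instance group of 20 preemptible GCE instances to read ORC files stored on Google Cloud Storage. The data is partitioned by hour, at roughly 2 GB per hour.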

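  1. What type of instance should I use?
  2. How much RAM should the JVM use?
  3. I am using an autoscaling configuration with an 80% CPU target and a 10-minute cool-down period. Is there a more suitable configuration for Presto?
  4. Is there a way to handle servers being shut down when resources run short?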

Any other advice would also be appreciated.

As of PrestoDB version 0.199, Presto has no Google Cloud Storage connector, so it cannot query data in GCS.

Regarding hardware requirements, I will quote the Teradata docs here:

Memory

You should allocate a minimum of 16GB of RAM per node for Presto. But recommend 64GB for most production workloads.

Network Bandwidth

It is recommended to have 10 Gigabit Ethernet between all the nodes in the cluster.

Other Recommendations

Presto can be installed on any normally configured Hadoop cluster. YARN should be configured to account for resources dedicated to Presto. For example, if a node has 64GB of RAM, perhaps you would normally allocate 60GB to YARN. If you install Presto on that node and give Presto 32GB of RAM, then you should subtract 32GB from the 60GB and let YARN only allocate 28GB per node. An optimized configuration might choose to have separate Presto and Hadoop nodes. The optimized configuration allows you to give more memory to Presto, and thus perform larger join queries, for example.
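The memory arithmetic in the quoted example can be sketched in a few lines of Python. This is only an illustration of the subtraction the docs describe; the function name `yarn_memory_after_presto` and its parameters are hypothetical and not part of Presto or YARN.

```python
def yarn_memory_after_presto(yarn_gb: int, presto_gb: int) -> int:
    """Return the memory (in GB) YARN may still allocate on a node
    once Presto's share has been set aside.

    Hypothetical helper for illustration only.
    """
    if yarn_gb < 0 or presto_gb < 0:
        raise ValueError("memory sizes must be non-negative")
    if presto_gb > yarn_gb:
        raise ValueError("Presto's share exceeds the memory given to YARN")
    return yarn_gb - presto_gb


# Figures from the quoted example: a 64 GB node where YARN would normally
# get 60 GB, and Presto is given 32 GB.
remaining_gb = yarn_memory_after_presto(yarn_gb=60, presto_gb=32)
print(remaining_gb)  # 28
```

On a real cluster the resulting figure would typically be applied through YARN's `yarn.nodemanager.resource.memory-mb` setting, which is expressed in megabytes.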