H2O with R: Memory Requirement

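I have been looking into the H2O machine learning platform and am trying to work out whether using it with R lets R handle very large data (>> the RAM available on a laptop), or whether it is still limited by the amount of RAM. My guess is that since it is "in-memory", it still needs a very large amount of RAM or a cluster of servers? Does anyone have experience with this?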

Yes, H2O is an in-memory architecture, so it is bound by physical memory. They do support about 15 different compression schemes, including ones aimed at compressing sparse data.
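To get a feel for that physical-memory limit before loading anything, a back-of-the-envelope estimate can help. The sketch below is plain Python, not H2O code. It assumes every value is stored as an uncompressed 8-byte number, so it is an upper bound that ignores the compression mentioned above. The function names and the `headroom` multiplier are my own assumptions for illustration (working copies made during parsing and training need extra room); tune the multiplier for your workload.

```python
def estimate_uncompressed_gb(rows, numeric_cols, bytes_per_value=8):
    """Upper-bound size in GB, assuming uncompressed 8-byte values (an assumption)."""
    return rows * numeric_cols * bytes_per_value / 1e9


def fits_in_memory(rows, numeric_cols, ram_gb, headroom=4.0):
    """True if the estimate times a safety multiplier fits in ram_gb.

    headroom is a hypothetical safety factor chosen for illustration,
    not a figure taken from the answer above.
    """
    return estimate_uncompressed_gb(rows, numeric_cols) * headroom <= ram_gb


# Hypothetical dataset: 100 million rows, 50 numeric columns, 16 GB laptop
size_gb = estimate_uncompressed_gb(100_000_000, 50)
print(f"about {size_gb:.0f} GB uncompressed")
print("fits on a 16 GB laptop:", fits_in_memory(100_000_000, 50, ram_gb=16))
```

If the estimate is far above your RAM even before the multiplier, compression alone is unlikely to close the gap.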

They say some streaming support is "on the roadmap but not implemented yet".

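If your dataset doesn't fit, and you can't compress or encode your data types more efficiently (factors, logicals, binning into ranges, text preprocessing), then you will need a large cluster or a large cloud instance.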
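To illustrate why encoding a column as a factor saves memory, here is a minimal sketch of dictionary encoding in plain Python. It is not how H2O stores data internally; `encode_as_factor`, `decode` and the sample column are hypothetical names and data I made up for the example. Each distinct string is stored once as a level, and every row keeps only a small integer code.

```python
from array import array


def encode_as_factor(values):
    """Dictionary-encode strings: return (levels, codes)."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    # Pick the smallest unsigned integer type that can hold every code.
    if len(levels) <= 256:
        typecode = "B"  # 1 byte per row
    elif len(levels) <= 65536:
        typecode = "H"  # at least 2 bytes per row
    else:
        typecode = "I"  # wider unsigned type for more levels
    codes = array(typecode, (index[v] for v in values))
    return levels, codes


def decode(levels, codes):
    """Recover the original strings from levels and codes."""
    return [levels[c] for c in codes]


# Hypothetical column: 300,000 rows but only 3 distinct values
column = ["red", "green", "blue"] * 100_000
levels, codes = encode_as_factor(column)

raw_bytes = sum(len(v.encode("utf-8")) for v in column)
encoded_bytes = codes.itemsize * len(codes) + sum(
    len(level.encode("utf-8")) for level in levels
)
print("raw text bytes:", raw_bytes)
print("factor-encoded bytes:", encoded_bytes)
```

The saving grows with longer strings and fewer distinct levels. A column of mostly unique strings (IDs, free text) gains little from this and needs other preprocessing.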

Also, FYI, the support for R is only a subset:

A note on R: H2O supports an R-like language - not full R semantics - but the obviously data-parallel data-munging aspects of R, and of course all the operators run fully parallel and distributed. There is a REPL. You can use it to add or drop columns or rows, manufacture features, impute missing values, or drop-in many R-expressions and have them run at-scale.

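So, for example, use their prebaked algorithms (high-performance native Java implementations) wherever possible, rather than general-purpose R algorithm code.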

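Do you need this for prototyping or for production? You might ask them whether they have any reference customers running R-H2O in production.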
