Difference between BOINC and Hadoop/Spark/etc

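What is the difference between BOINC (https://en.wikipedia.org/wiki/Berkeley_Open_Infrastructure_for_Network_Computing) and the usual Hadoop/Spark/etc. big data frameworks? They all seem to be distributed computing frameworks. Is there somewhere I can read about the differences, or about BOINC in particular?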

It seems the Large Hadron Collider in the EU is using BOINC; why not Hadoop?

Thanks.

BOINC is software that can use the unused CPU and GPU cycles on a computer to do scientific computing.

Strictly speaking, BOINC is a single application that uses otherwise idle compute cycles for grid computing.

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.

The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part called MapReduce.

(Emphasis added on "framework" and on its dual function.)

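Here you can see that Hadoop is a framework (also called an ecosystem) that provides both storage and compute. Hadoop vendors such as Cloudera and Hortonworks bundle additional components into it (Hive, HBase, Pig, Spark, etc.) along with some security/auditing tools.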

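In addition, the two kinds of cluster handle hardware failure differently. When a BOINC node dies, there is no fault tolerance for whatever it was running; that work and those resources are simply lost. In Hadoop, data is replicated and a failed task is re-run a certain number of times before it is finally marked as failed, and as long as the logging services built into the framework are running, each of these steps is traceable.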
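The retry behavior described for Hadoop can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not Hadoop's actual code: `run_with_retries`, `make_flaky_task`, and the default of four attempts are assumptions chosen for the example (Hadoop exposes a comparable per-task limit through settings such as `mapreduce.map.maxattempts`), and the returned log list stands in for the framework's logging service.

```python
from typing import Callable, List, Tuple


class TaskFailed(Exception):
    """Raised once a task has used up all of its attempts."""


def run_with_retries(task: Callable[[], int], max_attempts: int = 4) -> Tuple[int, List[str]]:
    """Re-run a failing task up to max_attempts times (assumed default: 4).

    Every attempt is appended to `log`, standing in for the framework's
    logging service that keeps the retries traceable.
    """
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = task()
        except Exception as exc:  # a lost node or crashed task surfaces as an exception
            log.append(f"attempt {attempt}: failed ({exc})")
            continue
        log.append(f"attempt {attempt}: succeeded")
        return result, log
    # All attempts failed: the task is finally marked as failed.
    raise TaskFailed("; ".join(log))


def make_flaky_task(failures: int) -> Callable[[], int]:
    """Hypothetical task that fails `failures` times before succeeding."""
    calls = 0

    def task() -> int:
        nonlocal calls
        calls += 1
        if calls <= failures:
            raise RuntimeError("node lost")
        return 42

    return task


if __name__ == "__main__":
    result, log = run_with_retries(make_flaky_task(2))
    print(result)  # 42
    print(log)  # two "failed (node lost)" entries, then "attempt 3: succeeded"
```

A task that fails more often than `max_attempts` raises `TaskFailed`, with the full attempt history in the message.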

Seems the Large Hadron Collider in EU is using BOINC, why not Hadoop?

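Because BOINC provides software that anyone in the world can install to join the cluster, the project gets a large amount of computing power from almost anywhere, for free.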

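They probably use Hadoop internally for some storage, and maybe Spark for additional computation, but buying commodity hardware in bulk and building/maintaining such a cluster for this workload seems prohibitively expensive.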

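BOINC and Hadoop are similar in that both exploit the fact that a big problem can be solved in parts. Both are mostly about distributing data across many computers, not applications.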
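Splitting a large problem into independent parts, processing each part separately, and combining the partial results is the common idea behind both systems. The sketch below shows it with a word count in plain Python; the function names are hypothetical, and in a real deployment each `process_part` call would run on a different machine rather than in one process.

```python
from collections import Counter
from functools import reduce
from typing import List


def split_into_parts(lines: List[str], n_parts: int) -> List[List[str]]:
    """Divide the input into n_parts independent chunks (round-robin by line)."""
    return [lines[i::n_parts] for i in range(n_parts)]


def process_part(part: List[str]) -> Counter:
    """Work one machine would do on its own chunk: count the words in it."""
    return Counter(word for line in part for word in line.split())


def combine(partials: List[Counter]) -> Counter:
    """Merge the partial counts into the answer for the whole input."""
    return reduce(lambda total, c: total + c, partials, Counter())


if __name__ == "__main__":
    lines = ["a b a", "b c", "a"]
    parts = split_into_parts(lines, 2)
    print(combine([process_part(p) for p in parts]))  # Counter({'a': 3, 'b': 2, 'c': 1})
```

Because each part is processed without looking at the others, the parts can be handed to any machine; only the combining step needs all the partial results.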

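The difference lies in how tightly the participating machines are synchronized. With Hadoop the synchronization is very tight: at some point you want to collect all the data from all the machines and then run the final analysis. You are effectively waiting for the last machine, and nothing is returned until the last part of the job has finished.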
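The tight synchronization described for Hadoop amounts to a barrier: the final analysis cannot start until every part has finished, so the slowest machine sets the pace. A minimal sketch, assuming threads stand in for cluster nodes and `time.sleep` stands in for uneven amounts of work; `worker` and `run_synchronized_job` are hypothetical names.

```python
import time
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait
from typing import List


def worker(part_id: int, delay: float) -> int:
    """One node's share of the job; `delay` models a slow or busy machine."""
    time.sleep(delay)
    return part_id * part_id


def run_synchronized_job(delays: List[float]) -> int:
    """Start every part, block until ALL of them finish, then run the final step."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(worker, i, d) for i, d in enumerate(delays)]
        # Barrier: wait() returns only when the last (slowest) part is done.
        wait(futures, return_when=ALL_COMPLETED)
    # Final analysis over the complete set of partial results.
    return sum(f.result() for f in futures)


if __name__ == "__main__":
    start = time.monotonic()
    total = run_synchronized_job([0.01, 0.01, 0.2])
    print(total, f"after {time.monotonic() - start:.2f}s")  # 5, after roughly 0.2s
```

With delays of 0.01, 0.01 and 0.2 seconds the job returns 5 only after about 0.2 seconds: the two fast parts finish early, but nothing comes back until the slow one does.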

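With BOINC there is no synchronization at all. You have many thousands of jobs to run. The BOINC server, run by the project maintainers, coordinates the delivery of those jobs to the volunteers running the BOINC client.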

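With BOINC, the project maintainers have no control over the clients at all: if a client does not return a result, the work unit is simply sent out again elsewhere. With Hadoop, the project maintainers have access to the whole cluster. With BOINC, the application has to be provided for many different platforms, because there is no telling which platforms the volunteers will offer; in Hadoop everything is well defined and usually very homogeneous. The largest BOINC projects have tens of thousands of regular volunteers, while a Hadoop cluster has as many machines as you can afford to buy or rent.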
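BOINC's model, by contrast, needs no barrier: the server hands out work units, records results whenever they arrive, and reissues a unit if its client never reports back. The sketch below is a deliberately simplified, hypothetical simulation of that idea, not BOINC's actual scheduler: treating each round as one deadline period, assigning units round-robin, and representing a vanished client as one that returns `None` are all assumptions.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

# Assumed client model: it receives a work-unit payload and returns a result,
# or None if it never reports back (machine switched off, volunteer left, ...).
Client = Callable[[int], Optional[int]]


def run_project(payloads: List[int], clients: List[Client], max_rounds: int = 10) -> Dict[int, int]:
    """Dispatch work units to clients and reissue any that get no result.

    Each round models one deadline period. Units are handed to clients in
    round-robin order; a unit whose client stays silent is queued again and
    sent to whichever client is next in a later round. There is no barrier:
    every result is recorded as soon as it comes back.
    """
    pending = deque(range(len(payloads)))  # ids of units still without a result
    results: Dict[int, int] = {}
    next_client = 0
    for _ in range(max_rounds):
        if not pending:
            break
        for _ in range(len(pending)):
            wu_id = pending.popleft()
            client = clients[next_client % len(clients)]
            next_client += 1
            reply = client(payloads[wu_id])
            if reply is None:
                pending.append(wu_id)  # deadline missed: reissue in a later round
            else:
                results[wu_id] = reply
    return results


def reliable(x: int) -> Optional[int]:
    """Hypothetical volunteer that computes the unit and reports back."""
    return x * x


def vanished(x: int) -> Optional[int]:
    """Hypothetical volunteer whose machine never reports back."""
    return None


if __name__ == "__main__":
    print(run_project([1, 2, 3], [reliable, vanished, reliable]))  # {0: 1, 2: 9, 1: 4}
```

Unit 1 is lost in the first round and completed by a different client in the second, while units 0 and 2 were recorded immediately; the server never waits on the silent client.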
