Is there any Apache Spark counterpart similar to Hadoop Streaming?
I want to implement some highly customized processing logic in C++. Hadoop Streaming lets me integrate C++ code into a MapReduce processing pipeline, and I'm wondering whether I can do the same thing with Apache Spark.
The closest (though not exactly equivalent) solution is the RDD.pipe method:
Return an RDD created by piping elements to a forked external process. The resulting RDD is computed by executing the given process once per partition. All elements of each input partition are written to a process's stdin as lines of input separated by a newline. The resulting partition consists of the process's stdout output, with each line of stdout resulting in one element of the output partition. A process is invoked even for empty partitions.
The print behavior can be customized by providing two functions.
The Spark test suite provides many usage examples.