Trigger a Spark job whenever an event occurs

Question

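I have a Spark application that should run whenever a Kafka message arrives on a particular topic.

I won't receive more than 5-6 messages a day, so I don't want to take the Spark Streaming approach. Instead, I tried submitting the application with SparkLauncher, but I don't like that approach because I have to set the Spark and Java classpath programmatically in my code, along with all the necessary Spark properties such as executor cores, executor memory, etc.

How can I trigger the Spark application through spark-submit, but have it wait until a message is received?

Any pointers would be very helpful.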

Answer 1

You can use a shell script together with the nohup command to submit the job, like this:

"nohup spark-submit shell script <parameters> 2>&1 < /dev/null &"

Whenever a message arrives, you can poll for that event and call this shell script.
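The poll-then-launch idea can be sketched roughly as below. This is only an illustration: `MessageTriggeredLauncher`, `runLoop`, and the `BlockingQueue` are hypothetical names, and the queue is a stand-in for whatever delivers the Kafka messages (for example, a thread that polls a Kafka consumer and puts each record's value on the queue). The launcher is whatever calls your script.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Sketch of a "wait for a message, then launch a job" loop.
 * The queue is an assumed stand-in for the real message source.
 */
class MessageTriggeredLauncher {

    /**
     * Waits for messages and hands each one to the launcher.
     *
     * @param messages      source of incoming messages (assumption: filled elsewhere)
     * @param launcher      action to run per message, e.g. calling the spark-submit script
     * @param maxMessages   stop after this many launches (keeps the sketch terminating)
     * @param idleTimeoutMs stop if no message arrives within this many milliseconds
     * @return number of messages that triggered a launch
     */
    static int runLoop(BlockingQueue<String> messages,
                       Consumer<String> launcher,
                       int maxMessages,
                       long idleTimeoutMs) {
        int launched = 0;
        try {
            while (launched < maxMessages) {
                // Blocks without spinning until a message arrives or the timeout expires
                String message = messages.poll(idleTimeoutMs, TimeUnit.MILLISECONDS);
                if (message == null) {
                    break; // nothing arrived in time
                }
                launcher.accept(message);
                launched++;
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting
            Thread.currentThread().interrupt();
        }
        return launched;
    }
}
```

With 5-6 messages a day, a small long-running process like this stays idle almost all the time, and Spark resources are only requested when a job is actually submitted.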

Below are code snippets for launching the script from Java. For more on nohup, see https://en.wikipedia.org/wiki/Nohup

- Using `Runtime`

    /**
     * Runs a shell command (for example a spark-submit wrapper script) and waits for it to exit.
     * <pre> You can call spark-submit or a MapReduce job on the fly like this, by calling a shell script. </pre>
     * @param commandToExecute the command line to run through /bin/sh
     * @return TRUE if the process was started and waited for, FALSE if launching it failed
     */
    public static Boolean executeCommand(final String commandToExecute) {
        try {
            final Runtime rt = Runtime.getRuntime();
            // LOG.info("process command -- " + commandToExecute);
            final String[] arr = { "/bin/sh", "-c", commandToExecute };
            final Process proc = rt.exec(arr);
            // LOG.info("process started ");
            final int exitVal = proc.waitFor();
            LOG.trace(" commandToExecute exited with code: " + exitVal);
            proc.destroy();
        } catch (final Exception e) {
            LOG.error("Exception occurred while launching process: " + e.getMessage());
            return Boolean.FALSE;
        }
        return Boolean.TRUE;
    }
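One thing to watch with this approach: if the child process writes a lot to stdout/stderr and nothing reads it, the pipe buffer can fill up and the child may block. Unless the command itself redirects its output (as the nohup line above does), it is safer to drain the output. A minimal sketch, where `CommandRunner` and `Result` are hypothetical helper names and not part of any library:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Sketch: run a command, drain its output, and report the exit code. */
class CommandRunner {

    /** Exit code plus everything the command printed (stdout and stderr merged). */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // merge stderr into stdout so one reader is enough
            Process process = builder.start();

            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                // Reading until end of stream keeps the child from blocking on a full pipe
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            return new Result(process.waitFor(), output.toString());
        } catch (IOException e) {
            return new Result(-1, "failed to run command: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(-1, "interrupted while waiting for command");
        }
    }
}
```

The `-1` exit code for launch failures is just a convention chosen for this sketch. A call would look like `CommandRunner.run(Arrays.asList("/bin/sh", "-c", commandToExecute))`.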

- Using `ProcessBuilder` - another way

    // Class-level constant: the wrapper script that calls spark-submit
    private final static String shellScript = "./sparksubmit.sh";

    private static void executeProcess(Operation command, String database) throws IOException,
            InterruptedException {

        // Directory the script is run from
        final File executorDirectory = new File("src/main/resources/");

        ProcessBuilder processBuilder = new ProcessBuilder(shellScript, command.getOperation(), "argument-one");

        processBuilder.directory(executorDirectory);
        Process process = processBuilder.start();
        try {
            int shellExitStatus = process.waitFor();
            // An exit status of 0 means the script succeeded
            if (shellExitStatus == 0) {
                logger.info("Successfully executed the shell script");
            } else {
                logger.error("Shell script exited with status " + shellExitStatus);
            }
        } catch (InterruptedException ex) {
            logger.error("Shell script process was interrupted");
            Thread.currentThread().interrupt();
        }
    }
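Since `ProcessBuilder` takes its command as a list of arguments, the Spark properties mentioned in the question (executor cores, executor memory, and so on) do not have to be hard-coded in the launching code. They could come from a config file or from the message itself and be passed through as `--conf` options. A rough sketch, where `SparkSubmitCommand` and its parameters are hypothetical and only standard spark-submit options (`--class`, `--conf key=value`) are used:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: assemble a spark-submit argument list for use with ProcessBuilder. */
class SparkSubmitCommand {

    /**
     * @param mainClass fully qualified main class of the Spark application
     * @param appJar    path to the application JAR
     * @param sparkConf Spark properties, e.g. spark.executor.memory -> 2g
     * @param appArgs   arguments passed to the application itself
     */
    static List<String> build(String mainClass,
                              String appJar,
                              Map<String, String> sparkConf,
                              List<String> appArgs) {
        List<String> command = new ArrayList<>();
        command.add("spark-submit");
        command.add("--class");
        command.add(mainClass);
        // Each property becomes its own "--conf key=value" pair
        for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
            command.add("--conf");
            command.add(entry.getKey() + "=" + entry.getValue());
        }
        // The application JAR comes after the options, followed by the application arguments
        command.add(appJar);
        command.addAll(appArgs);
        return command;
    }
}
```

The resulting list can be handed straight to `new ProcessBuilder(command)`, which avoids shell quoting issues because each argument is passed separately. This assumes `spark-submit` is on the `PATH` of the launching process.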

- Third way: JSch

Run a command over SSH with JSch

- `YarnClient` class - fourth way

One of my favorite books, Data Algorithms, uses this approach:

// import required classes and interfaces
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;

public class SubmitSparkJobToYARNFromJavaCode {

   public static void main(String[] arguments) throws Exception {

       // prepare arguments to be passed to 
       // org.apache.spark.deploy.yarn.Client object
       String[] args = new String[] {
           // the name of your application
           "--name",
           "myname",

           // memory for driver (optional)
           "--driver-memory",
           "1000M",

           // path to your application's JAR file 
           // required in yarn-cluster mode      
           "--jar",
           "/Users/mparsian/zmp/github/data-algorithms-book/dist/data_algorithms_book.jar",

           // name of your application's main class (required)
           "--class",
           "org.dataalgorithms.bonus.friendrecommendation.spark.SparkFriendRecommendation",

           // comma separated list of local jars that want 
           // SparkContext.addJar to work with      
           "--addJars",
           "/Users/mparsian/zmp/github/data-algorithms-book/lib/spark-assembly-1.5.2-hadoop2.6.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/log4j-1.2.17.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/junit-4.12-beta-2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jsch-0.1.42.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/JeraAntTasks.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jedis-2.5.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/jblas-1.2.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/hamcrest-all-1.3.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/guava-18.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math3-3.0.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-math-2.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-logging-1.1.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang3-3.4.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-lang-2.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-io-2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-httpclient-3.0.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-daemon-1.0.5.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-configuration-1.6.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-collections-3.2.1.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/commons-cli-1.2.jar,/Users/mparsian/zmp/github/data-algorithms-book/lib/cloud9-1.3.2.jar",

           // argument 1 to your Spark program (SparkFriendRecommendation)
           "--arg",
           "3",

           // argument 2 to your Spark program (SparkFriendRecommendation)
           "--arg",
           "/friends/input",

           // argument 3 to your Spark program (SparkFriendRecommendation)
           "--arg",
           "/friends/output",

           // argument 4 to your Spark program (SparkFriendRecommendation)
           // this is a helper argument to create a proper JavaSparkContext object
           // make sure that you create the following in SparkFriendRecommendation program
           // ctx = new JavaSparkContext("yarn-cluster", "SparkFriendRecommendation");
           "--arg",
           "yarn-cluster"
       };

       // create a Hadoop Configuration object
       Configuration config = new Configuration();

       // identify that you will be using Spark as YARN mode
       System.setProperty("SPARK_YARN_MODE", "true");

       // create an instance of SparkConf object
       SparkConf sparkConf = new SparkConf();

       // create ClientArguments, which will be passed to Client
       ClientArguments cArgs = new ClientArguments(args, sparkConf); 

       // create an instance of the YARN Client
       Client client = new Client(cArgs, config, sparkConf); 

       // submit Spark job to YARN
       client.run(); 
   }
}
