Run cron task on AWS EMR master node

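How can I run a periodic job in the background on an EMR cluster? I have script.sh (which contains the cron job) and application.py in S3, and I want to launch the cluster with this command: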

aws emr create-cluster \
--name "Test cluster" \
--release-label emr-5.12.0 \
--applications Name=Hive Name=Pig Name=Ganglia Name=Spark \
--use-default-roles \
--ec2-attributes KeyName=myKey \
--instance-type m3.xlarge \
--instance-count 3 \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://mybucket/script-path/script.sh"]

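In the end, I want the cron job from script.sh to execute application.py. What I don't understand is how to install cron on the master node. The Python file also needs some libraries, and they have to be installed too.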

Linux systems typically come with crontab installed by default, so you do not need to install it manually.
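As for the libraries application.py needs, one option is to have script.sh fetch the file and install them before the cron entry is registered. The following is only a rough sketch under assumptions: the S3 location of application.py, the local path /home/hadoop/application.py, and the library name "requests" are placeholders not given in the question, and the run helper is a hypothetical dry-run wrapper so the commands can be inspected before they are executed.

```shell
#!/bin/bash
# Hypothetical sketch of script.sh; paths and the library name are assumptions.

# DRY_RUN=1 (the default here) only prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

APP_S3="s3://mybucket/script-path/application.py"   # assumed S3 location
APP_LOCAL="/home/hadoop/application.py"             # assumed local path

setup_app() {
  # Copy the application from S3 to the master node.
  run aws s3 cp "$APP_S3" "$APP_LOCAL"
  # Install a Python dependency; "requests" is only a placeholder.
  run sudo pip install requests
}

setup_app
```

Setting DRY_RUN=0 would actually run the commands. Note that a step like this runs on the master node only; if the Spark executors also need the libraries, a bootstrap action (which runs on every node) is probably the better place for the pip install.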

To schedule a Spark job with cron, follow these steps:

  • Log in to the master node (SSH into the master node).
  • Run the command

crontab -e

  • Add the following line to the crontab, then save and exit (:wq in vi)

    */15 * * * * /script-path/script.sh

    cron will now run the job every 15 minutes.
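If the entry should be registered from a script (for example from script.sh running as an EMR step) rather than through the interactive editor, the steps above could be sketched roughly as follows. The helper name add_cron_line is hypothetical; it reads an existing crontab on stdin and writes the updated one to stdout, adding the line only if it is not already present, so running it repeatedly does not create duplicates.

```shell
#!/bin/bash
# Hypothetical helper: stdin = current crontab, $1 = entry to add,
# stdout = crontab with the entry present exactly once.
add_cron_line() {
  local existing
  existing=$(cat)
  if printf '%s\n' "$existing" | grep -Fxq -- "$1"; then
    # Entry already there: pass the crontab through unchanged.
    printf '%s\n' "$existing"
  else
    # Keep existing entries (if any) and append the new one.
    if [ -n "$existing" ]; then
      printf '%s\n' "$existing"
    fi
    printf '%s\n' "$1"
  fi
}

CRON_LINE='*/15 * * * * /script-path/script.sh'

# Harmless demo: show what an empty crontab would become.
printf '' | add_cron_line "$CRON_LINE"

# To actually install it on the master node, something like:
#   crontab -l 2>/dev/null | add_cron_line "$CRON_LINE" | crontab -
```

The commented-out pipeline at the end is the part that modifies the real crontab: crontab -l prints the current entries and crontab - installs whatever arrives on stdin.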

See this link to learn more about cron.

Hope this helps.

Thanks, Ravi

You need to SSH into the master node and set up the crontab from there, not on your local machine:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-ssh.html

Connect to the Master Node Using SSH

Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. After you make a connection, the terminal on your local computer behaves as if it is running on the remote computer. Commands you issue locally run on the remote computer, and the command output from the remote computer appears in your terminal window.

When you use SSH with AWS, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the master node of the cluster.

Using SSH to connect to the master node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the master node, run applications such as Hive and Pig interactively, browse directories, read log files, and so on. You can also create a tunnel in your SSH connection to view the web interfaces hosted on the master node. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters.
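For reference, connecting as described above might look roughly like this. The key file path and the master public DNS name below are placeholders (only the key pair name myKey appears in the question), and emr_ssh_command is a hypothetical helper that just prints the command to run; hadoop is the default login user on EMR nodes.

```shell
#!/bin/bash
# Placeholder values: replace with your own key file and master public DNS.
KEY_FILE="$HOME/myKey.pem"
MASTER_DNS="ec2-203-0-113-25.compute-1.amazonaws.com"

# Print the SSH command for the master node ($1 = key file, $2 = host).
emr_ssh_command() {
  printf 'ssh -i %s hadoop@%s\n' "$1" "$2"
}

emr_ssh_command "$KEY_FILE" "$MASTER_DNS"
```

Running the printed command opens a shell on the master node, where crontab -e can then be used. The AWS CLI also has an aws emr ssh subcommand (taking --cluster-id and --key-pair-file) that should achieve the same thing without looking up the DNS name.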