Running dockered Spark with Mesos breaks Mesos
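I'm trying to run a Jupyter Notebook Docker image (Ubuntu 16.04-based) with Spark on Mesos on DC/OS. Python prints a lot of unhelpful error messages, but after attaching to the container and running a spark-submit job from inside it, I get many errors about connection problems.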
The Spark driver cannot connect to Mesos properly. In most cases, setting LIBPROCESS_IP seems to be enough; in my case, however, setting it hangs Mesos completely.
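Per the Mesos libprocess configuration docs, LIBPROCESS_IP (and LIBPROCESS_PORT) set the address the process binds to, while LIBPROCESS_ADVERTISE_IP (and LIBPROCESS_ADVERTISE_PORT) set the address it advertises to peers such as the master for calls back into it. When no port is configured, libprocess binds an ephemeral one. The sketch below is a simplified Python model of that precedence, not Mesos code; the function name `advertised_endpoint` is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def advertised_endpoint(env: Mapping[str, str], bound_port: int) -> Tuple[Optional[str], int]:
    """Return the (ip, port) a libprocess-based scheduler would advertise.

    Simplified model of the documented precedence (not Mesos source code):
    the ADVERTISE variables, when set, override the bind address.
    """
    # None means "not configured": libprocess would fall back to resolving
    # the host name, which this model does not attempt.
    ip = env.get("LIBPROCESS_ADVERTISE_IP") or env.get("LIBPROCESS_IP")
    # Without any port variable, libprocess binds an ephemeral port;
    # the caller passes in whichever port was actually bound.
    port_str = env.get("LIBPROCESS_ADVERTISE_PORT") or env.get("LIBPROCESS_PORT")
    port = int(port_str) if port_str else bound_port
    return ip, port
```

With the values used in this question (bind to 172.19.0.4, advertise 172.16.6.105), the master is told to call the driver at 172.16.6.105 on a random port, the same shape as the `@172.16.6.105:38534` address in the master log. That address and port must therefore be reachable from the master.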
This is what I run inside the Docker container:
export LIBPROCESS_ADVERTISE_IP=172.16.6.105
export SPARK_HOME=spark-2.3.2-bin-hadoop2.6
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export LIBPROCESS_IP=172.19.0.4
./spark-2.3.2-bin-hadoop2.6/bin/spark-submit \
  --master mesos://leader.mesos:5050 \
  --class org.apache.spark.examples.SparkPi \
  https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 30
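Because the client side here is a Jupyter notebook, the same submission can also be driven from Python. A minimal sketch, assuming the paths and addresses from the shell command above; `build_submit` is a hypothetical helper that only assembles the argument list and environment, so both can be inspected before anything is launched.

```python
import os
from typing import Dict, List, Tuple

# Values copied from the shell command above; adjust them to your environment.
SPARK_HOME = "spark-2.3.2-bin-hadoop2.6"
EXAMPLES_JAR = "https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar"


def build_submit(base_env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: return (argv, env) equivalent to the exports above."""
    env = dict(base_env)  # copy so the caller's environment is not mutated
    env.update({
        "LIBPROCESS_ADVERTISE_IP": "172.16.6.105",
        "SPARK_HOME": SPARK_HOME,
        "JAVA_HOME": "/usr/lib/jvm/java-8-openjdk-amd64",
        "MESOS_NATIVE_JAVA_LIBRARY": "/usr/local/lib/libmesos.so",
        "LIBPROCESS_IP": "172.19.0.4",
    })
    argv = [
        os.path.join(SPARK_HOME, "bin", "spark-submit"),
        "--master", "mesos://leader.mesos:5050",
        "--class", "org.apache.spark.examples.SparkPi",
        EXAMPLES_JAR,
        "30",
    ]
    return argv, env
```

Passing the result to `subprocess.run(argv, env=env)` would then start the driver with the same settings as the exported shell variables.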
The Spark driver hangs at this point:
I0312 07:18:13.722151 3764 sched.cpp:232] Version: 1.2.3
I0312 07:18:13.732707 3758 sched.cpp:336] New master detected at master@172.16.6.103:5050
I0312 07:18:13.733749 3758 sched.cpp:352] No credentials provided. Attempting to register without authentication
At this step Mesos hangs: the UI is not reachable at all, and the DC/OS post-start checks report errors.
I checked the Mesos logs, and this is what I see:
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911664 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911737 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911801 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911841 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912062 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912149 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912243 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912281 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912369 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912441 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912499 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912534 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912771 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912860 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912921 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912957 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
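To work with these logs programmatically, it helps to split each line into the glog fields it carries: severity letter, date, time, thread id, source location and message. A minimal sketch, assuming the standard glog prefix `Lmmdd hh:mm:ss.uuuuuu thread file:line]` seen in the lines above (the journald prefix in front is skipped); `parse_glog` and `GlogRecord` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# glog header: severity, MMDD, time, thread id, file:line, then "] message".
GLOG_RE = re.compile(
    r"(?P<severity>[IWEF])(?P<mmdd>\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+)\s+"
    r"(?P<source>[\w.-]+:\d+)\]\s"
    r"(?P<message>.*)$"
)


class GlogRecord(NamedTuple):
    severity: str
    mmdd: str
    time: str
    thread: int
    source: str
    message: str


def parse_glog(line: str) -> Optional[GlogRecord]:
    """Return the glog fields of a line, or None if it has no glog header."""
    m = GLOG_RE.search(line)  # search() skips any journald prefix
    if m is None:
        return None
    return GlogRecord(m["severity"], m["mmdd"], m["time"], int(m["thread"]),
                      m["source"], m["message"])
```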
Sometimes I also see this:
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638309 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638342 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638381 855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638442 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638475 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638514 855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638572 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638605 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638644 855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638715 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638751 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638790 855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638847 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638881 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638921 855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638978 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.639011 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.639060 855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.639118 855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [ ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.639153 855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
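The `W` lines show the master failing to send to the scheduler at its advertised address. Pulling that framework id and endpoint out of the log tells you exactly which host and port the master is trying to reach. A minimal sketch, assuming the warning format shown above; `unreachable_schedulers` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# "... disconnected framework <id> (<name>) at scheduler-<uuid>@<ip>:<port>"
DISCONNECTED_RE = re.compile(
    r"disconnected framework (?P<framework_id>\S+) \((?P<name>[^)]*)\) "
    r"at scheduler-[0-9a-f-]+@(?P<ip>[\d.]+):(?P<port>\d+)"
)


def unreachable_schedulers(lines: Iterable[str]) -> Counter:
    """Count warnings per (framework_id, ip, port) the master could not reach."""
    counts: Counter = Counter()
    for line in lines:
        m = DISCONNECTED_RE.search(line)
        if m:
            counts[(m["framework_id"], m["ip"], int(m["port"]))] += 1
    return counts
```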
This keeps repeating. Even after I stop the driver, Mesos stays broken and keeps logging these messages:
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871507 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871595 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871671 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871744 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871811 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871911 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871979 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.872048 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.872140 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
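So it looks like the Spark driver is flooding Mesos with SUBSCRIBE calls faster than Mesos can process them. I tried both Spark 2.3.2 and 2.4.0, with the same result.
I also tried connecting Spark to the Spark Mesos Dispatcher, but even with these LIBPROCESS variables set I get the usual connection errors: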
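One way to put a number on the flooding is to count the SUBSCRIBE calls in a slice of the master log and measure the time between the first and the last. For the nine lines quoted above, that is nine calls within 633 microseconds. A minimal sketch, assuming the glog timestamp format shown in those lines; `subscribe_burst` is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import Iterable, Tuple

# glog timestamp, then (later on the line) the SUBSCRIBE message.
SUBSCRIBE_RE = re.compile(
    r"[IWEF]\d{4}\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s.*Received SUBSCRIBE call"
)


def subscribe_burst(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (number of SUBSCRIBE calls, seconds between first and last).

    Only the time of day is parsed, so a slice crossing midnight is not handled.
    """
    times = []
    for line in lines:
        m = SUBSCRIBE_RE.search(line)
        if m:
            times.append(datetime.strptime(m["time"], "%H:%M:%S.%f"))
    if not times:
        return 0, 0.0
    span = (max(times) - min(times)).total_seconds()
    return len(times), span
```

Dividing the count by the span gives an approximate call rate for that slice.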
E0312 08:01:55.658208 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.658838 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.659353 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.660073 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.660650 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.661358 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.662775 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.663313 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.663964 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.664711 4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
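These errors are logged by libprocess (`process.cpp`), so a first sanity check is whether plain TCP connections succeed in both directions: from the container to the master on port 5050, and from the master host back to the driver's advertised endpoint while the driver is running. A minimal standard-library sketch; `can_connect` is a hypothetical helper name, and it only tests TCP reachability, not the libprocess protocol.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses
        # OSError) and name-resolution failures (socket.gaierror).
        return False
```

For example, `can_connect("leader.mesos", 5050)` from inside the container, and `can_connect("172.16.6.105", 38534)` from the master node using the port currently shown in the master log.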
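Has anyone run into an issue like this? How can I fix it?
I run Spark on Mesos using Docker Compose. I already have a Docker image with Mesos installed, and I set up the Mesos cluster, i.e. decided which nodes are the master and the workers. Then I wrote these Docker Compose files for the master and the slaves. They work without errors.
Compose file for the master: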
version: '3.7'
services:
  master:
    image: ubuntu_mesos_spark
    command: bash -c "sleep 40; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.136 --work_dir=/var/run/mesos --hostname=x.x.x.x" # hostname: IP of the master node
    restart: always
    network_mode: host
    environment:
      - MESOS_HOSTNAME="150.20.11.136"
      - MESOS_QUORUM=1
      - MESOS_LOG_DIR=/var/log/mesos
    expose:
      - 5050
      - 4040
      - 7077
      - 8080
    ports:
      - 5050:5050
      - 4040:4040
      - 7077:7077
      - 8080:8080
Compose file for the slaves:
version: '3.7'
services:
  slave:
    image: ubuntu_mesos_spark
    command: bash -c "sleep 40; /home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.136:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
    restart: always
    privileged: true
    network_mode: host
    environment:
      - MESOS_HOSTNAME="150.20.11.157"
      - MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins # also in Dockerfile
      - MESOS_LOG_DIR=/var/log/mesos
      - MESOS_LOGGING_LEVEL=INFO
    expose:
      - 5051
    ports:
      - 5051:5051
I hope this helps.