'Broken pipe' 异常会取消我的工作吗?
Does a 'Broken pipe' exception cancel my job?
目前我是 运行 一个使用 144 个 TaskSlots 的 4 台机器远程集群上的 Flink 程序。 运行 大约 30 分钟后,我收到以下错误:
INFO
org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet - Info
server for jobmanager: Failed to write json updates for job
b2eaff8539c8c9b696826e69fb40ca14, because
org.eclipse.jetty.io.RuntimeIOException:
org.eclipse.jetty.io.EofException at
org.eclipse.jetty.io.UncheckedPrintWriter.setError(UncheckedPrintWriter.java:107)
at
org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:280)
at
org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:295)
at
org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet.writeJsonUpdatesForJob(JobManagerInfoServlet.java:588)
at
org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet.doGet(JobManagerInfoServlet.java:209)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:734) at
javax.servlet.http.HttpServlet.service(HttpServlet.java:847) at
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
at org.eclipse.jetty.server.Server.handle(Server.java:352) at
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
at
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
at
org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.run(QueuedThreadPool.java:436)
at java.lang.Thread.run(Thread.java:745) Caused by:
org.eclipse.jetty.io.EofException at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:905)
at
org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:427)
at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:78) at
org.eclipse.jetty.server.HttpConnection$Output.flush(HttpConnection.java:1139)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:159) at
org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:86) at
java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:154)
at org.eclipse.jetty.server.HttpWriter.write(HttpWriter.java:258) at
org.eclipse.jetty.server.HttpWriter.write(HttpWriter.java:107) at
org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:271)
... 24 more Caused by: java.io.IOException: Broken pipe at
sun.nio.ch.FileDispatcherImpl.write0(Native Method) at
sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at
sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at
sun.nio.ch.IOUtil.write(IOUtil.java:51) at
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) at
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:185)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:256)
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:849)
... 33 more
我知道 java.io.IOException: Broken pipe
意味着 JobManager 丢失了某种连接,所以我猜整个作业都失败了,我必须重新启动它。尽管我认为该过程不再是 运行,但 WebInterface 仍将其列为 运行。此外,当我使用 jps
来识别集群上的 运行 进程时,JobManager 仍然存在。所以我的问题是,如果我的工作丢失了,这个错误是有时随机发生还是我的程序引起的。
编辑:我的 TaskManager 仍然每隔几秒发送一次心跳并且似乎是 运行。
实际上是 Flink 的 Web 服务器 JobManagerInfoServlet
的问题,由于 java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
,它无法将所请求作业的最新 JSON 更新发送到您的浏览器。因此,只有对服务器的 GET 请求失败。
这样的故障应该不会影响当前运行 Flink作业的执行。只需刷新您的浏览器(使用 Flink 的网络 UI)应该会发送另一个 GET 请求,然后有望成功完成。
目前我是 运行 一个使用 144 个 TaskSlots 的 4 台机器远程集群上的 Flink 程序。 运行 大约 30 分钟后,我收到以下错误:
INFO org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet - Info server for jobmanager: Failed to write json updates for job b2eaff8539c8c9b696826e69fb40ca14, because org.eclipse.jetty.io.RuntimeIOException: org.eclipse.jetty.io.EofException at org.eclipse.jetty.io.UncheckedPrintWriter.setError(UncheckedPrintWriter.java:107) at org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:280) at org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:295) at org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet.writeJsonUpdatesForJob(JobManagerInfoServlet.java:588) at org.apache.flink.runtime.jobmanager.web.JobManagerInfoServlet.doGet(JobManagerInfoServlet.java:209) at javax.servlet.http.HttpServlet.service(HttpServlet.java:734) at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113) at org.eclipse.jetty.server.Server.handle(Server.java:352) at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596) at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211) at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489) at org.eclipse.jetty.util.thread.QueuedThreadPool.run(QueuedThreadPool.java:436) at java.lang.Thread.run(Thread.java:745) Caused by: org.eclipse.jetty.io.EofException at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:905) at org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:427) at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:78) at org.eclipse.jetty.server.HttpConnection$Output.flush(HttpConnection.java:1139) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:159) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:86) at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:154) at org.eclipse.jetty.server.HttpWriter.write(HttpWriter.java:258) at org.eclipse.jetty.server.HttpWriter.write(HttpWriter.java:107) at org.eclipse.jetty.io.UncheckedPrintWriter.write(UncheckedPrintWriter.java:271) ... 24 more Caused by: java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470) at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:185) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:256) at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:849) ... 33 more
我知道 java.io.IOException: Broken pipe
意味着 JobManager 丢失了某种连接,所以我猜整个作业都失败了,我必须重新启动它。尽管我认为该过程不再是 运行,但 WebInterface 仍将其列为 运行。此外,当我使用 jps
来识别集群上的 运行 进程时,JobManager 仍然存在。所以我的问题是,如果我的工作丢失了,这个错误是有时随机发生还是我的程序引起的。
编辑:我的 TaskManager 仍然每隔几秒发送一次心跳并且似乎是 运行。
实际上是 Flink 的 Web 服务器 JobManagerInfoServlet
的问题,由于 java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
,它无法将所请求作业的最新 JSON 更新发送到您的浏览器。因此,只有对服务器的 GET 请求失败。
这样的故障应该不会影响当前运行 Flink作业的执行。只需刷新您的浏览器(使用 Flink 的网络 UI)应该会发送另一个 GET 请求,然后有望成功完成。