How to reduce the verbosity of Spark's runtime output?
How can I reduce the amount of trace information the Spark runtime produces? The default is too verbose; how do I turn it off, and turn it back on when I need it?
Thanks
Verbose mode
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Silent mode (expected)
scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Quoted from the book "Learning Spark":
You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:
log4j.rootCategory=INFO, console
Then lower the log level so that we only show WARN messages and above by changing it to the following:
log4j.rootCategory=WARN, console
When you re-open the shell, you should see less output.
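For reference, a minimal conf/log4j.properties might look like the sketch below; the appender lines mirror Spark's bundled log4j.properties.template, so check the template shipped with your version for the exact contents:
# Console appender at WARN and above (appender settings as in the template)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n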
Spark 1.4.1
sc.setLogLevel("WARN")
From a comment in the source code:
Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
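Because sc.setLogLevel takes effect immediately and only for the current session, you can toggle the detail off and back on from inside the spark-shell whenever you need it:
scala> sc.setLogLevel("WARN")                              // only WARN and above from now on
scala> sc.parallelize(List(12,4,5,3,4,4,6,781)).collect    // no INFO chatter this time
scala> sc.setLogLevel("INFO")                              // turn the detailed output back on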
Spark 2.x - 2.3.1
sparkSession.sparkContext().setLogLevel("WARN")
Spark 2.3.2
sparkSession.sparkContext.setLogLevel("WARN")
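As a sketch for a standalone Scala application (the app name and master below are illustrative placeholders, not anything from the original answer), the call goes right after the session is created:
import org.apache.spark.sql.SparkSession

// Build a session; appName/master are placeholders
val spark = SparkSession.builder()
  .appName("quiet-logging-demo")
  .master("local[*]")
  .getOrCreate()

// Only WARN and above from here on
spark.sparkContext.setLogLevel("WARN")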
Logging configuration at the Spark application level
With this approach, no code changes are needed for the Spark application running on the cluster.
- Create a new file log4j.properties from log4j.properties.template.
- Then change the verbosity with the log4j.rootCategory property.
- Say we only need to see ERRORs for a given jar:
log4j.rootCategory=ERROR, console
The spark-submit command would be:
spark-submit \
... # other Spark props go here
--files prop/file/location \
--conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
--conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
jar/location \
[application arguments]
Now you will only see ERROR-level logs.
The plain old Log4j way for Spark (but it needs a code change)
Set the log level for the org and akka packages:
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)
If you invoke the commands from a shell, there is a lot you can do without changing any configuration. That is by design.
Below are a couple of Unix examples using pipes, but you can use similar filters in other environments.
Silencing the logs completely (at your own risk)
Redirect stderr to /dev/null, i.e.:
run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null
Ignoring INFO messages
run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 | awk '{if ($3 != "INFO") print $0}'
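If you would rather keep stderr but hide the INFO lines, a similar filter with grep (assuming the log format shown above, where the level appears as a separate word) also works:
run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2>&1 | grep -v ' INFO '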