How to print stacktrace in pyspark while using py4j logger?
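I want to print the stack trace of an exception raised while my pyspark code is running. I am using pyspark's native (py4j) logger, but the call fails when I execute logger.exception(). I have also tried logger.error(), which does not print the stack trace.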
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
logger.info('first statement.')
try:
    raise Exception('Dummy exception')
except Exception as e:
    logger.error('Something awful happened') # Does not print stacktrace
    logger.exception('Something awful happened') # print stacktrace but crashes
logger.info('Importing module.')
The output is as follows:
21/12/08 11:50:56 INFO __main__: First statement
21/12/08 11:50:56 ERROR __main__: Something awful happened
Traceback (most recent call last):
  File "/home/gaurav.gupta/projects/PoCs/brandMention/pyspark-scripts/dist/main.py", line 94, in <module>
    raise Exception('Dummy exception')
Exception: Dummy exception
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/gaurav.gupta/projects/PoCs/brandMention/pyspark-scripts/dist/main.py", line 97, in <module>
    logger.exception('Something awful happened')
  File "/home/gaurav.gupta/miniconda3/envs/venv_pyspark/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/gaurav.gupta/miniconda3/envs/venv_pyspark/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/home/gaurav.gupta/miniconda3/envs/venv_pyspark/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o37.exception. Trace:
py4j.Py4JException: Method exception([class java.lang.String]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
21/12/08 11:50:56 INFO SparkUI: Stopped Spark web UI at http://192.168.1.13:4040
21/12/08 11:50:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
I have found a workaround for this problem: I get the stack trace as a formatted string and then log it as usual.
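import traceback
# Run this inside the except block, where ex is the caught exception.
# Positional arguments are used because the etype keyword is rejected from Python 3.10 on.
strace = ''.join(traceback.format_exception(type(ex), ex, ex.__traceback__))
logger.error(strace)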
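
For context, the error above says that the Java logger has no exception(String) method: exception() belongs to Python's logging.Logger, while the py4j proxy forwards the call to log4j. The workaround can be wrapped in a small helper so the call site stays close to logger.exception(). This is only a sketch: log_exception and ListLogger are names I made up for illustration, and ListLogger is a stand-in so the snippet runs without a Spark session. In a real job you would pass the log4j logger obtained from sc._jvm instead.

```python
import traceback


def log_exception(logger, message):
    """Log `message` followed by the traceback of the exception being handled.

    `logger` is any object with an error(str) method, such as the log4j
    logger obtained through sc._jvm. Call this from inside an except block.
    """
    # format_exc() renders the exception currently being handled as one string.
    logger.error(message + "\n" + traceback.format_exc())


class ListLogger:
    """Hypothetical stand-in for the log4j logger; it only collects messages."""

    def __init__(self):
        self.messages = []

    def error(self, text):
        self.messages.append(text)


demo_logger = ListLogger()
try:
    raise Exception('Dummy exception')
except Exception:
    log_exception(demo_logger, 'Something awful happened')

# Show what would have been sent to the logger.
print(demo_logger.messages[0])
```

traceback.format_exc() takes no arguments, so it sidesteps the signature differences of format_exception between Python versions. It only works while an exception is being handled, which is why the helper has to be called from the except block.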