How to pass a javaagent to EMR Spark applications?
I am trying to use the Uber JVM Profiler to profile my Spark application (Spark 2.4, running on EMR 5.21).
Following is my cluster configuration:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.memory": "38300M",
      "spark.driver.memory": "38300M",
      "spark.yarn.scheduler.reporterThread.maxFailures": "5",
      "spark.driver.cores": "5",
      "spark.yarn.driver.memoryOverhead": "4255M",
      "spark.executor.heartbeatInterval": "60s",
      "spark.rdd.compress": "true",
      "spark.network.timeout": "800s",
      "spark.executor.cores": "5",
      "spark.memory.storageFraction": "0.27",
      "spark.speculation": "true",
      "spark.sql.shuffle.partitions": "200",
      "spark.shuffle.spill.compress": "true",
      "spark.shuffle.compress": "true",
      "spark.storage.level": "MEMORY_AND_DISK_SER",
      "spark.default.parallelism": "200",
      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
      "spark.memory.fraction": "0.80",
      "spark.executor.extraJavaOptions": "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:OnOutOfMemoryError='kill -9 %p'",
      "spark.executor.instances": "107",
      "spark.yarn.executor.memoryOverhead": "4255M",
      "spark.dynamicAllocation.enabled": "false",
      "spark.driver.extraJavaOptions": "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:OnOutOfMemoryError='kill -9 %p'"
    },
    "configurations": []
  },
  {
    "classification": "yarn-site",
    "properties": {
      "yarn.log-aggregation-enable": "true",
      "yarn.nodemanager.pmem-check-enabled": "false",
      "yarn.nodemanager.vmem-check-enabled": "false"
    },
    "configurations": []
  },
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true",
      "spark.sql.broadcastTimeout": "-1"
    },
    "configurations": []
  },
  {
    "classification": "emrfs-site",
    "properties": {
      "fs.s3.threadpool.size": "50",
      "fs.s3.maxConnections": "5000"
    },
    "configurations": []
  },
  {
    "classification": "core-site",
    "properties": {
      "fs.s3.threadpool.size": "50",
      "fs.s3.maxConnections": "5000"
    },
    "configurations": []
  }
]
The profiler jar is stored in S3 (mybucket/profilers/jvm-profiler-1.0.0.jar). While bootstrapping my core and master nodes, I run the following bootstrap script:
sudo mkdir -p /tmp
aws s3 cp s3://mybucket/profilers/jvm-profiler-1.0.0.jar /tmp/
I submit my EMR step as follows:
spark-submit --deploy-mode cluster --master=yarn ......(other parameters)......... \
--conf spark.jars=/tmp/jvm-profiler-1.0.0.jar \
--conf spark.driver.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000 \
--conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000
However, I cannot see any profiling-related output in the logs (I checked the stdout and stderr logs of all containers). Is the argument being ignored? Am I missing something? Is there anything else I can check to understand why this parameter is being ignored?
I have not used the Uber JVM Profiler, but I think that to add an extra jar to spark-submit you should use the --jars option. When working on EMR you can also add jars directly from an S3 bucket.
Also, in your bootstrap script you copy the jar jvm-profiler-1.0.0.jar to the /tmp folder, but when you set the Java options you do not include that path. Try this:
spark-submit --deploy-mode cluster \
--master=yarn \
--conf "spark.driver.extraJavaOptions=-javaagent:/tmp/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000" \
--conf "spark.executor.extraJavaOptions=-javaagent:/tmp/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000" \
--jars "/tmp/jvm-profiler-1.0.0.jar" \
--<other params>
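If the agent loads, the ConsoleOutputReporter writes its metrics to the containers' stdout, and since log aggregation is enabled in your yarn-site classification, one quick check after the step finishes is to grep the aggregated YARN logs (the application ID below is a placeholder; use the one printed by spark-submit):

```shell
# Pull the aggregated container logs for the finished application and look for
# profiler output. "ConsoleOutputReporter" appears in each emitted metric line;
# "CpuAndMemory" is the profiler's default metrics group.
yarn logs -applicationId application_1234567890123_0001 \
  | grep -i "ConsoleOutputReporter\|CpuAndMemory"
```

If the grep finds nothing, also check the container launch commands in the logs to confirm the `-javaagent:` flag actually reached the JVM command line.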