Error parsing conf core-default.xml while running a shadow jar of GeoTools with Spark
I created a Spark application that processes lat/long points and identifies the region they fall into, as defined in custom shape files provided by the customer.
Given this requirement, I created a shadow jar file using Maven.
But when I run the application through spark-submit, it throws the following error:
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/spark) overrides detected (/app/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/spark).
WARNING: Running spark-class from user-defined location.
18/10/19 17:41:58 INFO SparkContext: Running Spark version 1.6.0
18/10/19 17:41:59 ERROR Configuration: error parsing conf core-default.xml
javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2694)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2653)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2559)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1078)
at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1132)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1540)
at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:85)
at org.apache.hadoop.security.SecurityUtil.&lt;clinit&gt;(SecurityUtil.java:74)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:891)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:857)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:724)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName.apply(Utils.scala:2214)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName.apply(Utils.scala:2214)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2214)
at org.apache.spark.SparkContext.&lt;init&gt;(SparkContext.scala:324)
at org.apache.spark.api.java.JavaSparkContext.&lt;init&gt;(JavaSparkContext.scala:59)
at com.abc.xyz.ShapeFileDataProcessor.main(ShapeFileDataProcessor.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:891)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:857)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:724)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName.apply(Utils.scala:2214)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName.apply(Utils.scala:2214)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2214)
at org.apache.spark.SparkContext.&lt;init&gt;(SparkContext.scala:324)
at org.apache.spark.api.java.JavaSparkContext.&lt;init&gt;(JavaSparkContext.scala:59)
at com.abc.xyz.ShapeFileDataProcessor.main(ShapeFileDataProcessor.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2820)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2653)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2559)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1078)
at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1132)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1540)
at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:85)
at org.apache.hadoop.security.SecurityUtil.&lt;clinit&gt;(SecurityUtil.java:74)
... 21 more
Caused by: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2694)
... 28 more
Here is the spark-submit command:
spark-submit --name ShapeFileProcessor --master yarn-client --files application.properties --conf "spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/abc-spark-stream/ " --conf "spark.eventLog.enabled=true" --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/abc-spark-stream/ " --class com.abc.xyz.ShapeFileDataProcessor CustomShapeFileAggregator-0.0.1.jar
Here are the repository and dependency snippets from the Gradle build:
repositories {
    mavenLocal()
    maven { url 'http://maven.geo-solutions.it' }
    maven { url 'http://download.java.net/maven/2' }
    maven { url 'http://download.osgeo.org/webdav/geotools/' }
}

task shadowJar(type: Jar) {
    manifest {
        attributes 'Implementation-Title': 'My Application',
                   'Implementation-Version': version
    }
    baseName = project.name
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    with jar
}

dependencies {
    compile group: 'org.geotools', name: 'gt-shapefile', version: '14.5'
    compile group: 'org.geotools', name: 'gt-swing', version: '14.5'
    provided group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.6.0'
    provided group: 'org.apache.spark', name: 'spark-sql_2.10', version: '1.6.0'
    provided group: 'org.apache.spark', name: 'spark-hive_2.10', version: '1.6.0'
}
For me this was a dependency problem: a "xerces" jar was being pulled in transitively by some other dependencies, and once it was shaded into the jar it shadowed the JDK's XML parser. Excluding "xerces" from those dependencies in my pom.xml solved the problem. The exclusions block below goes inside each &lt;dependency&gt; element that pulls xerces in:
&lt;exclusions&gt;
    &lt;exclusion&gt;
        &lt;artifactId&gt;xercesImpl&lt;/artifactId&gt;
        &lt;groupId&gt;xerces&lt;/groupId&gt;
    &lt;/exclusion&gt;
    &lt;exclusion&gt;
        &lt;artifactId&gt;xmlParserAPIs&lt;/artifactId&gt;
        &lt;groupId&gt;xerces&lt;/groupId&gt;
    &lt;/exclusion&gt;
&lt;/exclusions&gt;
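Since the build shown in the question is Gradle rather than Maven, here is a minimal sketch of the equivalent fix there. That the xerces jars arrive through the GeoTools artifacts is an assumption; confirm the actual source with `gradle dependencies` before excluding. A configuration-wide exclude keeps them out of the shadow jar regardless of which dependency brings them in:

// Hedged sketch for the Gradle build above: keep the old standalone Xerces
// artifacts out of every configuration so they never land in the shadow jar.
// Which dependency pulls them in is an assumption; verify with `gradle dependencies`.
configurations.all {
    exclude group: 'xerces', module: 'xercesImpl'
    exclude group: 'xerces', module: 'xmlParserAPIs'
}

After rebuilding, `jar tf CustomShapeFileAggregator-0.0.1.jar | grep -i xerces` should return nothing; Hadoop's Configuration loader then falls back to the JDK's built-in JAXP parser, which does recognize the 'http://apache.org/xml/features/xinclude' feature.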