Passing an external yml file to my Spark job is not working and throws "Can't construct a java object for tag:yaml.org,2002"

I am using Spark 2.4.1 and Java 8. I am trying to load an external properties file when submitting my Spark job with spark-submit.

I am using the Typesafe Config library below to load my properties file.

<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.1</version>

In my Spark driver class, MyDriver.java, I load the YML file as follows:

String ymlFilename = args[1].toString();
Optional<QueryEntities>  entities =  InputYamlProcessor.process(ymlFilename);

All of the code, including InputYamlProcessor.java, is here:

https://gist.github.com/BdLearnerr/e4c47c5f1dded951b18844b278ea3441

This works fine locally, but when I run it on the cluster I get an error.

Error:

Can't construct a java object for tag:yaml.org,2002:com.snp.yml.QueryEntities; exception=Class not found: com.snp.yml.QueryEntities
 in 'reader', line 1, column 1:
    entities:
    ^

        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
        at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
        at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
        at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:444)
        at com.snp.yml.InputYamlProcessor.process(InputYamlProcessor.java:62)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: com.snp.yml.QueryEntities
        at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
        ... 12 more

My Spark job script is:

 $SPARK_HOME/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name MyDriver  \
    --jars "/local/jars/*.jar" \
    --files hdfs://files/application-cloud-dev.properties,hdfs://files/column_family_condition.yml \
    --class com.sp.MyDriver \
    --executor-cores 3 \
    --executor-memory 9g \
    --num-executors 5 \
    --driver-cores 2 \
    --driver-memory 4g \
    --driver-java-options -Dconfig.file=./application-cloud-dev.properties \
    --conf spark.executor.extraJavaOptions=-Dconfig.file=./application-cloud-dev.properties \
    --conf spark.driver.extraClassPath=. \
    --driver-class-path . \
     ca-datamigration-0.0.1.jar application-cloud-dev.properties column_family_condition.yml

What am I doing wrong here? How can I fix this? Any fix is much appreciated.

Tested:

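I printed something like the following in the class, just before the failing line, to check whether the problem is really that the class cannot be found.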

public static void printTest() {
    QueryEntity e1 = new QueryEntity();
    e1.setTableName("tab1");
    List<QueryEntity> li = new ArrayList<QueryEntity>();
    li.add(e1);


    QueryEntities ll = new QueryEntities();
    ll.setEntitiesList(li);

    ll.getEntitiesList().stream().forEach(e -> logger.error("e1 Name :" + e.getTableName()));


    return;
}

Output:

19/09/18 04:40:33 ERROR yml.InputYamlProcessor: e1 Name :tab1
    Can't construct a java object for tag:yaml.org,2002:com.snp.helpers.QueryEntities; exception=Class not found: com.snp.helpers.QueryEntities
             in 'reader', line 1, column 1:
                entitiesList:
         at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)

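What is the problem here?

This has nothing to do with QueryEntities itself, despite the message "YAMLException: Class not found: com.snp.yml.QueryEntities". The printTest() output above shows that the class can be instantiated on the cluster, so it is on the classpath.

It is a problem with the YAML Constructor: SnakeYAML looks the class up by name, and on the cluster that lookup apparently goes through a class loader that cannot see the application jar.

Change it to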

Yaml yaml = new Yaml(new  CustomClassLoaderConstructor(com.snp.helpers.QueryEntities.class.getClassLoader()));

instead of

/*Constructor constructor = new Constructor(com.snp.helpers.QueryEntities.class);
        Yaml yaml = new Yaml( constructor );*/
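
To illustrate why a class can be created with new and still be reported as "Class not found" when it is looked up by name, here is a minimal sketch that uses only the JDK. The names ClassLoaderLookupDemo and canLoad and the empty QueryEntities stand-in are hypothetical, and the isolated URLClassLoader only imitates a loader that does not know about the user jar; it is not how Spark or SnakeYAML is actually implemented.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderLookupDemo {

    /** Hypothetical stand-in for the bean class that has to be resolved by name. */
    public static class QueryEntities { }

    /** Returns true if the given loader can resolve the class name, false otherwise. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // Look the class up by name through the given loader, without initializing it.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = QueryEntities.class.getName();

        // Assumption for illustration: a loader with no URLs and no application
        // parent, so it only sees JDK classes and not the application classes.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("isolated loader can load it: " + canLoad(name, isolated));
        }

        // The loader that defined the class can resolve it by name.
        ClassLoader defining = QueryEntities.class.getClassLoader();
        System.out.println("defining loader can load it: " + canLoad(name, defining));
    }
}
```

Passing QueryEntities.class.getClassLoader() to CustomClassLoaderConstructor follows the same idea: the lookup by name is done through the loader that is known to contain the class.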