How to define a schema for JSON to be used in from_json to parse out values

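I'm trying to come up with a schema definition to parse information out of a string column of a DataFrame using from_json. I need help defining the schema; somehow I'm not getting it right.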

Here is my JSON:

[
 {
  "sectionid":"838096e332d4419191877a3fd40ed1f4",
  "sequence":0,
  "questions":[
     {
        "xid":"urn:com.mheducation.openlearning:lms.assessment.author:qastg.global:assessment_item:2a0f52fb93954f4590ac88d90888be7b",
        "questionid":"d36e1d7eeeae459c8db75c7d2dfd6ac6",
        "quizquestionid":"d36e1d7eeeae459c8db75c7d2dfd6ac6",
        "qtype":"3",
        "sequence":0,
        "subsectionsequence":-1,
        "type":"80",
        "question":"<p>This is a simple, 1 question assessment for automation testing</p>",
        "totalpoints":"5.0",
        "scoring":"1",
        "scoringrules":"{\"type\":\"perfect\",\"points\":5.0,\"pointsEach\":null,\"rules\":[]}",
        "inputoption":"0",
        "casesensitive":"0",
        "suggestedscoring":"1",
        "suggestedscoringrules":"{\"type\":\"perfect\",\"points\":5.0,\"pointsEach\":null,\"rules\":[]}",
        "answers":[
           "1"
        ],
        "options":[
           
        ]
     }
  ]
 }
]

I want to parse this information into the columns sectionid, sequence, xid, question.sequence, question.question (the question text), and answers.

Here is what I have so far. I have defined a test schema:

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, ArrayType, StructType, StructField}

    val schema = new StructType()
      .add("sectionid", StringType, true)
      .add("sequence", StringType, true)
      .add("questions", StringType, true)
      .add("answers", StringType, true)

    finalDF = finalDF
      .withColumn("parsed", from_json(col("enriched_payload.transformed"), schema))

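But I get NULL in the resulting column, I think because my schema is incorrect. I'm struggling to come up with the right definition. How do I come up with the correct JSON schema definition?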

I am using Spark 3.0.

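Try the code below. The top-level JSON value is an array, so the schema has to be an ArrayType wrapping a StructType rather than a bare StructType; that mismatch is most likely why you get NULL. Also, questions is itself an array of structs, answers lives inside each question rather than at the top level, and sequence is a number, not a string.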

import org.apache.spark.sql.types._

val schema = ArrayType(
    new StructType()
    .add("sectionid",StringType,true)
    .add("sequence",LongType,true)
    .add("questions", ArrayType(      
                        new StructType()
                            .add("answers",ArrayType(StringType,true),true)
                            .add("casesensitive",StringType,true)
                            .add("inputoption",StringType,true)
                            .add("options",ArrayType(StringType,true),true)
                            .add("qtype",StringType,true)
                            .add("question",StringType,true)
                            .add("questionid",StringType,true)
                            .add("quizquestionid",StringType,true)
                            .add("scoring",StringType,true)
                            .add("scoringrules",StringType,true)
                            .add("sequence",LongType,true)
                            .add("subsectionsequence",LongType,true)
                            .add("suggestedscoring",StringType,true)
                            .add("suggestedscoringrules",StringType,true)
                            .add("totalpoints",StringType,true)
                            .add("type",StringType,true)
                            .add("xid",StringType,true)
                        )
    )
)
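
To get from the parsed column to the flat columns asked for in the question, one option is to explode the outer array and then the nested questions array. The following is a minimal sketch, not a definitive implementation: it assumes a local SparkSession, uses a hypothetical one-row DataFrame with a column named payload in place of enriched_payload.transformed, and declares only the fields that are needed (from_json skips JSON fields that are not in the schema). The names questionSchema, sectionSchema, sample, and flat are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Assumption: a local session for experimenting; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("from-json-sketch").getOrCreate()
import spark.implicits._

// Only the question fields that are actually wanted; undeclared JSON fields are ignored.
val questionSchema = new StructType()
  .add("xid", StringType, true)
  .add("sequence", LongType, true)
  .add("question", StringType, true)
  .add("answers", ArrayType(StringType, true), true)

// The root of the JSON is an array, so the root of the schema is an ArrayType.
val sectionSchema = ArrayType(
  new StructType()
    .add("sectionid", StringType, true)
    .add("sequence", LongType, true)
    .add("questions", ArrayType(questionSchema, true), true)
)

// Hypothetical, shortened sample row standing in for the real string column.
val sample =
  """[{"sectionid":"838096e332d4419191877a3fd40ed1f4","sequence":0,
    |"questions":[{"xid":"x1","sequence":0,"question":"<p>q</p>","answers":["1"]}]}]""".stripMargin
val df = Seq(sample).toDF("payload")

val flat = df
  // One row per section in the top-level array.
  .withColumn("section", explode(from_json(col("payload"), sectionSchema)))
  // One row per question inside each section.
  .withColumn("q", explode(col("section.questions")))
  .select(
    col("section.sectionid").as("sectionid"),
    col("section.sequence").as("sequence"),
    col("q.xid").as("xid"),
    col("q.sequence").as("question_sequence"),
    col("q.question").as("question"),
    col("q.answers").as("answers")
  )

flat.show(false)
```

Note that explode drops rows whose array is null or empty; if sections without questions should be kept, explode_outer can be used instead.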