在 Scala 中解析内容为 Json 格式的文件
Parsing a file having content as Json format in Scala
我想解析一个内容为 json 格式的文件。
我想从文件中提取一些属性(名称、数据类型、可空)以动态创建一些列名称。
我已经看过一些示例,但大多数示例都使用大小写 class 但我的问题是每次我收到的文件都可能有不同的内容。
我尝试使用 ujson 库来解析文件,但我无法理解如何正确使用它。
object JsonTest {
def main(args: Array[String]): Unit = {
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
println(source)
val input = try source.mkString finally source.close()
println(input)
val data = ujson.read(input)
data("name") = data("name").str.reverse
val updated = data.render()
}
}
文件内容示例:
{
"Organization": {
"project": {
"name": "POC 4PL",
"description": "Implementation of orderbook"
},
"Entities": [
{
"name": "Shipments",
"Type": "Fact",
"Attributes": [
{
"name": "Shipment_Details",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "Shipment_ID",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "View_Cost",
"DataType": "StringType",
"Nullable": "true"
}
],
"ADLS_Location": "/mnt/mns/adls/raw/poc/orderbook/"
}
]
}
}
预期输出:
StructType(
Array(StructField("Shipment_Details",StringType,true),
StructField("Shipment_ID",DateType,true),
StructField("View_Cost",DateType,true)))
需要以编程方式将 StructType 添加到预期输出中。
这取决于您是否希望它完全动态,这里有一些选项:
如果您只想阅读一个字段,您可以这样做:
import upickle.default._
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
println(json("Organization")("project")("name"))
输出将是:"POC 4PL"
如果您只想让属性与类型一起使用,您可以这样做:
import upickle.default.{macroRW, ReadWriter => RW}
import upickle.default._
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
val entitiesArray = json("Organization")("Entities")(0)("Attributes")
println(read[Seq[StructField]](entitiesArray))
case class StructField(name: String, DataType: String, Nullable: String)
object StructField{
implicit val rw: RW[StructField] = macroRW
}
输出将是:List(StructField(Shipment_Details,StringType,true), StructField(Shipment_ID,StringType,true), StructField(View_Cost,StringType,true))
另一种选择是使用不同的库来进行 class 映射。如果你使用 Google Protobuf Struct and JsonFormat 它可以是 2-liner:
import com.google.protobuf.Struct
import com.google.protobuf.util.JsonFormat
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
val input = try source.mkString finally source.close()
JsonFormat.parser().merge(input, builder)
println(builder.build())
输出将是:fields { key: "Organization" value { struct_value { fields { key: "project" value { struct_value { fields { key: "name" value { string_value: "POC 4PL" } } fields { key: "description" value { string_value: "Implementation of orderbook" } } } } } fields { key: "Entities" value { list_value { values { struct_value { fields { key: "name" value { string_value: "Shipments" } }...
尝试使用 Playframework 的 Json 工具 - https://www.playframework.com/documentation/2.7.x/ScalaJson
这是您问题的解决方案-
\ 将您的 json 放入文本文件
val fil_path = "C:\TestData\Config\Conf.txt"
val conf_source = scala.io.Source.fromFile(fil_path)
lazy val json_str = try conf_source.mkString finally conf_source.close()
val conf_json: JsValue = Json.parse(json_str)
val all_entities: JsArray = (conf_json \ "Organization" \ "Entities").get.asInstanceOf[JsArray]
val shipments: JsValue = all_entities.value.filter(e => e.\("name").as[String] == "Shipments").head
val shipments_attributes: IndexedSeq[JsValue] = shipments.\("Attributes").get.asInstanceOf[JsArray].value
val shipments_schema: StructType = StructType(shipments_attributes.map(a => Tuple3(a.\("name").as[String], a.\("DataType").as[String], a.\("Nullable").as[String]))
.map(x => StructField(x._1, StrtoDatatype(x._2), x._3.toBoolean)))
shipments_schema.fields.foreach(println)
输出为-
StructField(Shipment_Details,StringType,true)
StructField(Shipment_ID,StringType,true)
StructField(View_Cost,StringType,true)
我想解析一个内容为 json 格式的文件。 我想从文件中提取一些属性(名称、数据类型、可空)以动态创建一些列名称。 我已经看过一些示例,但大多数示例都使用大小写 class 但我的问题是每次我收到的文件都可能有不同的内容。
我尝试使用 ujson 库来解析文件,但我无法理解如何正确使用它。
object JsonTest {
def main(args: Array[String]): Unit = {
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
println(source)
val input = try source.mkString finally source.close()
println(input)
val data = ujson.read(input)
data("name") = data("name").str.reverse
val updated = data.render()
}
}
文件内容示例:
{
"Organization": {
"project": {
"name": "POC 4PL",
"description": "Implementation of orderbook"
},
"Entities": [
{
"name": "Shipments",
"Type": "Fact",
"Attributes": [
{
"name": "Shipment_Details",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "Shipment_ID",
"DataType": "StringType",
"Nullable": "true"
},
{
"name": "View_Cost",
"DataType": "StringType",
"Nullable": "true"
}
],
"ADLS_Location": "/mnt/mns/adls/raw/poc/orderbook/"
}
]
}
}
预期输出:
StructType(
Array(StructField("Shipment_Details",StringType,true),
StructField("Shipment_ID",DateType,true),
StructField("View_Cost",DateType,true)))
需要以编程方式将 StructType 添加到预期输出中。
这取决于您是否希望它完全动态,这里有一些选项:
如果您只想阅读一个字段,您可以这样做:
import upickle.default._
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
println(json("Organization")("project")("name"))
输出将是:"POC 4PL"
如果您只想让属性与类型一起使用,您可以这样做:
import upickle.default.{macroRW, ReadWriter => RW}
import upickle.default._
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
val input = try source.mkString finally source.close()
val json = ujson.read(input)
val entitiesArray = json("Organization")("Entities")(0)("Attributes")
println(read[Seq[StructField]](entitiesArray))
case class StructField(name: String, DataType: String, Nullable: String)
object StructField{
implicit val rw: RW[StructField] = macroRW
}
输出将是:List(StructField(Shipment_Details,StringType,true), StructField(Shipment_ID,StringType,true), StructField(View_Cost,StringType,true))
另一种选择是使用不同的库来进行 class 映射。如果你使用 Google Protobuf Struct and JsonFormat 它可以是 2-liner:
import com.google.protobuf.Struct
import com.google.protobuf.util.JsonFormat
val source = scala.io.Source.fromFile("C:\Users\ktngme\Desktop\ass\file.txt")
val input = try source.mkString finally source.close()
JsonFormat.parser().merge(input, builder)
println(builder.build())
输出将是:fields { key: "Organization" value { struct_value { fields { key: "project" value { struct_value { fields { key: "name" value { string_value: "POC 4PL" } } fields { key: "description" value { string_value: "Implementation of orderbook" } } } } } fields { key: "Entities" value { list_value { values { struct_value { fields { key: "name" value { string_value: "Shipments" } }...
尝试使用 Playframework 的 Json 工具 - https://www.playframework.com/documentation/2.7.x/ScalaJson
这是您问题的解决方案- \ 将您的 json 放入文本文件
val fil_path = "C:\TestData\Config\Conf.txt"
val conf_source = scala.io.Source.fromFile(fil_path)
lazy val json_str = try conf_source.mkString finally conf_source.close()
val conf_json: JsValue = Json.parse(json_str)
val all_entities: JsArray = (conf_json \ "Organization" \ "Entities").get.asInstanceOf[JsArray]
val shipments: JsValue = all_entities.value.filter(e => e.\("name").as[String] == "Shipments").head
val shipments_attributes: IndexedSeq[JsValue] = shipments.\("Attributes").get.asInstanceOf[JsArray].value
val shipments_schema: StructType = StructType(shipments_attributes.map(a => Tuple3(a.\("name").as[String], a.\("DataType").as[String], a.\("Nullable").as[String]))
.map(x => StructField(x._1, StrtoDatatype(x._2), x._3.toBoolean)))
shipments_schema.fields.foreach(println)
输出为-
StructField(Shipment_Details,StringType,true)
StructField(Shipment_ID,StringType,true)
StructField(View_Cost,StringType,true)