How to get nested columns from a JSON file using Apache Spark in Java

I have multiple JSON files that I need to parse with Apache Spark. The files contain nested keys, and I want to get all column names, including the nested ones. How can I do that?

I tried this:

String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json";

String[] jsonFiles = jsonFilePath.split(",");

Dataset<Row> people = sparkSession.read().json(jsonFiles);

The structure of the JSON file is:

{ 
   "Name":"Vipin Suman",
   "Email":"vpn2330@gmail.com",
   "Designation":"Programmer",
   "Age":22 ,
   "location":
             {
             "City":"Ahmedabad",
             "State":"Gujarat"
             }
}

The result I get is:

people.show(50, false);

Age | Designation | Email            | Name       | Location
------------------------------------------------------------
22  |Programmer   |vpn2330@gmail.com | Vipin Suman|[Ahmedabad,Gujarat]

I want the data like this:

Age | Designation | Email            | Name       | City      | State
------------------------------------------------------------
22  |Programmer   |vpn2330@gmail.com | Vipin Suman| Ahmedabad |Gujarat

Or like this:

Age | Designation | Email            | Name       | Location
---------------------------------------------------------------
22  |Programmer   |vpn2330@gmail.com | Vipin Suman| Ahmedabad,Gujarat

What if the schema looks like this:

root
 |-- Age: long (nullable = true)
 |-- Company: struct (nullable = true)
 |    |-- Company Name: string (nullable = true)
 |    |-- Domain: string (nullable = true)
 |-- Designation: string (nullable = true)
 |-- Email: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Test: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- location: struct (nullable = true)
 |    |-- City: struct (nullable = true)
 |    |    |-- City Name: string (nullable = true)
 |    |    |-- Pin: long (nullable = true)
 |    |-- State: string (nullable = true)  

and the JSON structure is:

{ 
  "Name":"Vipin Suman",
  "Email":"vpn2330@gmail.com",
 "Designation":"Trainee Programmer",
 "Age":22 ,
 "location":
    {"City":
           {
            "Pin":324009,
            "City Name":"Ahmedabad"
           },
    "State":"Gujarat"
   },
 "Company":
          {
           "Company Name":"Elegant",
           "Domain":"Java"
          }, 
 "Test":["Test1","Test2"]

}

How do I then find the nested keys and display the table in the correct format?

To display the data in the first expected format above, you can use the following code:

people.select("*", "location.*").drop("location").show();

It will give the following output:

+---+-----------+-----------------+----------+---------+-------+
|Age|Designation|            Email|      Name|     City|  State|
+---+-----------+-----------------+----------+---------+-------+
| 22| Programmer|vpn2330@gmail.com|VipinSuman|Ahmedabad|Gujarat|
+---+-----------+-----------------+----------+---------+-------+
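For the deeper schema in the second part of the question, `location.*` only expands one level, so the nested keys have to be found recursively. The following is a minimal sketch of that recursion, not a definitive implementation. To stay runnable without Spark on the classpath, it models the schema as nested maps, where a map value stands for a struct and anything else for a leaf column. The class `NestedColumns`, the method `leafExpressions`, and the underscore-joined aliases are all hypothetical names and choices made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedColumns {

    /**
     * Walks a schema modelled as nested maps (assumption: a Map value is a
     * struct, any other value is a leaf) and returns one select expression
     * per leaf column, in schema order.
     */
    public static List<String> leafExpressions(Map<String, Object> schema) {
        List<String> out = new ArrayList<>();
        collect(schema, "", "", out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> node, String path,
                                String alias, List<String> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String name = entry.getKey();
            // Backticks keep names containing spaces (e.g. "City Name") intact.
            String childPath = path.isEmpty()
                    ? "`" + name + "`"
                    : path + ".`" + name + "`";
            // Alias joins the parent names with underscores; spaces are replaced too.
            String safeName = name.replace(' ', '_');
            String childAlias = alias.isEmpty() ? safeName : alias + "_" + safeName;
            if (entry.getValue() instanceof Map) {
                // Struct: descend one level deeper.
                collect((Map<String, Object>) entry.getValue(), childPath, childAlias, out);
            } else {
                // Leaf (including arrays such as "Test"): emit "path AS alias".
                out.add(childPath + " AS `" + childAlias + "`");
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for part of the schema printed in the question.
        Map<String, Object> city = new LinkedHashMap<>();
        city.put("City Name", "string");
        city.put("Pin", "long");

        Map<String, Object> location = new LinkedHashMap<>();
        location.put("City", city);
        location.put("State", "string");

        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("Age", "long");
        schema.put("Test", "array");
        schema.put("location", location);

        leafExpressions(schema).forEach(System.out::println);
    }
}
```

With Spark, the same walk would presumably iterate over `people.schema().fields()` and recurse whenever `field.dataType()` is a `StructType`, instead of using the map stand-in. The resulting strings could then be passed to `people.selectExpr(...)` to get one flat column per nested key. Array columns such as `Test` are kept as a single column in this sketch.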