How to get nested columns from a JSON file using Apache Spark in Java
I have multiple JSON files that I need to parse with Apache Spark. The files contain nested keys, and I want to get all the column names, including the nested ones. How can I do that?
I tried this:
String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json";
String[] jsonFiles = jsonFilePath.split(",");
Dataset<Row> people = sparkSession.read().json(jsonFiles);
The structure of the JSON file is:
{
  "Name": "Vipin Suman",
  "Email": "vpn2330@gmail.com",
  "Designation": "Programmer",
  "Age": 22,
  "location": {
    "City": "Ahmedabad",
    "State": "Gujarat"
  }
}
The result I get is:
people.show(50, false);
Age | Designation | Email | Name | Location
------------------------------------------------------------
22 |Programmer |vpn2330@gmail.com | Vipin Suman|[Ahmedabad,Gujarat]
I want the data like this:
Age | Designation | Email | Name | City | State
------------------------------------------------------------
22 |Programmer |vpn2330@gmail.com | Vipin Suman| Ahmedabad |Gujarat
Or like this:
Age | Designation | Email | Name | Location
---------------------------------------------------------------
22 |Programmer |vpn2330@gmail.com | Vipin Suman| Ahmedabad,Gujarat
And what if the schema looks like this:
root
 |-- Age: long (nullable = true)
 |-- Company: struct (nullable = true)
 |    |-- Company Name: string (nullable = true)
 |    |-- Domain: string (nullable = true)
 |-- Designation: string (nullable = true)
 |-- Email: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Test: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- location: struct (nullable = true)
 |    |-- City: struct (nullable = true)
 |    |    |-- City Name: string (nullable = true)
 |    |    |-- Pin: long (nullable = true)
 |    |-- State: string (nullable = true)
with this JSON structure:
{
  "Name": "Vipin Suman",
  "Email": "vpn2330@gmail.com",
  "Designation": "Trainee Programmer",
  "Age": 22,
  "location": {
    "City": {
      "Pin": 324009,
      "City Name": "Ahmedabad"
    },
    "State": "Gujarat"
  },
  "Company": {
    "Company Name": "Elegant",
    "Domain": "Java"
  },
  "Test": ["Test1", "Test2"]
}
How do I then find the nested keys and display the table in a proper format?
To display the data in the first format you expect above, you can use the following code:
people.select("*", "location.*").drop("location").show();
It will give the following output:
+---+-----------+-----------------+----------+---------+-------+
|Age|Designation|            Email|      Name|     City|  State|
+---+-----------+-----------------+----------+---------+-------+
| 22| Programmer|vpn2330@gmail.com|VipinSuman|Ahmedabad|Gujarat|
+---+-----------+-----------------+----------+---------+-------+
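For the deeper schema in the question (structs inside structs), `location.*` only expands one level. A possible approach, sketched here rather than taken from the original answer, is to walk `people.schema()` recursively, collect the dotted name of every leaf field, and select each leaf as its own column. The class name `NestedColumns`, its method names, and the underscore alias convention are my own assumptions for illustration. Arrays such as `Test` are kept as they are, not exploded. `joinLocation` covers the "Ahmedabad,Gujarat" variant and assumes the first, simpler file layout where `City` and `State` are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class; none of these names come from Spark itself.
public class NestedColumns {

    // Walks the schema. For every non-struct field it records the dotted
    // name (e.g. "location.City.Pin") and a Column that reads that field.
    static void collectLeaves(StructType schema, Column parent, String prefix,
                              List<String> names, List<Column> columns) {
        for (StructField field : schema.fields()) {
            String name = prefix.isEmpty() ? field.name() : prefix + "." + field.name();
            // Backticks keep a top-level name with dots or spaces intact
            // (assumes the name itself contains no backtick).
            Column column = (parent == null)
                    ? functions.col("`" + field.name() + "`")
                    : parent.getField(field.name());
            if (field.dataType() instanceof StructType) {
                collectLeaves((StructType) field.dataType(), column, name, names, columns);
            } else {
                names.add(name);
                // Assumed naming convention: "location.City.Pin" -> "location_City_Pin".
                columns.add(column.alias(name.replace('.', '_')));
            }
        }
    }

    // All column names, including nested ones, as dotted paths.
    static List<String> leafNames(StructType schema) {
        List<String> names = new ArrayList<>();
        collectLeaves(schema, null, "", names, new ArrayList<>());
        return names;
    }

    // One top-level column per leaf field; arrays stay as array columns.
    static Dataset<Row> flatten(Dataset<Row> df) {
        List<Column> columns = new ArrayList<>();
        collectLeaves(df.schema(), null, "", new ArrayList<>(), columns);
        return df.select(columns.toArray(new Column[0]));
    }

    // Replaces the "location" struct with a single "City,State" string.
    // Assumes location.City and location.State are both strings.
    static Dataset<Row> joinLocation(Dataset<Row> df) {
        return df.withColumn("location",
                functions.concat_ws(",",
                        functions.col("location.City"),
                        functions.col("location.State")));
    }
}
```

With the `people` Dataset from the question, `NestedColumns.leafNames(people.schema())` would list names such as `location.City.Pin`, and `NestedColumns.flatten(people).show(50, false)` would print one column per leaf field. Running this needs the Spark SQL library on the classpath.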