Spark SQL JSON Boolean Evaluation
I have a sample JSON schema (truncated because of its size):
|-- LinearScheduleResult: struct (nullable = true)
| |-- Build: string (nullable = true)
| |-- EndTimestamp: string (nullable = true)
| |-- Errors: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- RequestId: string (nullable = true)
| |-- Schedule: struct (nullable = true)
| | |-- Airings: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- AiringTime: string (nullable = true)
| | | | |-- AiringType: string (nullable = true)
| | | | |-- CC: boolean (nullable = true)
| | | | |-- CallLetters: string (nullable = true)
| | | | |-- Category: string (nullable = true)
| | | | |-- Channel: string (nullable = true)
| | | | |-- Color: string (nullable = true)
| | | | |-- Copy: string (nullable = true)
| | | | |-- DSS: boolean (nullable = true)
| | | | |-- DVS: boolean (nullable = true)
| | | | |-- Dolby: boolean (nullable = true)
| | | | |-- Duration: long (nullable = true)
| | | | |-- DvbTriplet: string (nullable = true)
| | | | |-- EpisodeTitle: string (nullable = true)
| | | | |-- HD: boolean (nullable = true)
| | | | |-- HDLevel: string (nullable = true)
| | | | |-- IconAvailable: boolean (nullable = true)
| | | | |-- InstanceId: string (nullable = true)
| | | | |-- LetterBox: boolean (nullable = true)
| | | | |-- MovieRating: string (nullable = true)
| | | | |-- ParentNetworkId: long (nullable = true)
| | | | |-- ProgramId: string (nullable = true)
| | | | |-- SAP: boolean (nullable = true)
| | | | |-- SL: string (nullable = true)
| | | | |-- SeriesId: string (nullable = true)
| | | | |-- ServiceId: long (nullable = true)
| | | | |-- ShowingType: string (nullable = true)
| | | | |-- SourceDisplayName: string (nullable = true)
| | | | |-- SourceId: long (nullable = true)
| | | | |-- SourceLongName: string (nullable = true)
| | | | |-- Sports: boolean (nullable = true)
When I run the following:
results = sqlContext.sql("SELECT LinearScheduleResult.Schedule.Airings.Sports from tv")
it returns:
[Row(Sports=[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False])]
When I try something more complex:
results = sqlContext.sql("SELECT LinearScheduleResult.Schedule.Airings from tv where LinearScheduleResult.Schedule.Airings.Sports = 'False'")
it never returns anything. I have tried 'false', false, 0, FALSE, and several other combinations.
Any help would be appreciated.
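Airings is an array, so Airings.Sports is an array of booleans rather than a single boolean, and comparing it to one value never matches. You need to explode the array into rows first. Something like: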
select a from tv
lateral view explode(LinearScheduleResult.Schedule.Airings) a as a
where a.Sports = false
You must use a HiveContext for this, since LATERAL VIEW is HiveQL syntax.
See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
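To make the difference between the two queries concrete, here is a minimal plain-Python sketch of the idea. It is only an analogy and does not use Spark. The sample rows, the ProgramId values, and the helper names (sports_column, naive_filter, explode_filter) are made up for illustration. It shows that comparing a whole list to a single boolean matches nothing, while flattening the list first lets each airing be tested on its own.

```python
# Plain-Python model of the two queries; no Spark required.
# The sample rows below are hypothetical and exist only for illustration.
rows = [
    {"Airings": [
        {"ProgramId": "EP001", "Sports": False},
        {"ProgramId": "EP002", "Sports": True},
    ]},
    {"Airings": [
        {"ProgramId": "EP003", "Sports": False},
    ]},
]


def sports_column(row):
    """Model of `Airings.Sports`: one list of booleans per input row."""
    return [airing["Sports"] for airing in row["Airings"]]


def naive_filter(rows):
    """Model of `WHERE Airings.Sports = false`: a list compared to a scalar."""
    return [row for row in rows if sports_column(row) == False]  # noqa: E712


def explode_filter(rows):
    """Model of `LATERAL VIEW explode(Airings) a AS a WHERE a.Sports = false`."""
    # "Explode": emit one element per airing instead of one list per row.
    exploded = (airing for row in rows for airing in row["Airings"])
    return [airing for airing in exploded if airing["Sports"] is False]


naive = naive_filter(rows)
matched = explode_filter(rows)
print(len(naive))
print([airing["ProgramId"] for airing in matched])
```

With the DataFrame API, the same flattening step can presumably be done with pyspark.sql.functions.explode on the Airings column before filtering on Sports. That is not shown here because it needs a running Spark installation.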