HIVE_CANNOT_OPEN_SPLIT: Column <column_name> type null not supported
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://path/to/file/<>.snappy.parquet: Column ai.ja type null not supported
This only happens when I define the "JA" column, which is a struct of strings. If I leave that column out, I can query without any problem. The schema information was obtained from our Parquet files using Apache Spark.
The CREATE TABLE statement I'm using to reproduce the error follows:
CREATE EXTERNAL TABLE <<tablename>> (
  `ai` struct<
    acs: varchar(100),
    ltc: varchar(100),
    primaryapplicant: struct<
      bwh: varchar(10),
      citizenship: varchar(20),
      currentaddresscity: varchar(50),
      currentaddressstate: varchar(50),
      currentaddressstreet2: varchar(50),
      ss: varchar(50)
    >,
    JA: array<struct<
      dateofbirth: varchar(50),
      emailaddress: varchar(50),
      firstname: varchar(50),
      lastname: varchar(50),
      ss: varchar(50)
    >>,
    status: varchar(50),
    uri: varchar(50)
  >,
  `pr` struct<pc: struct<cn: varchar(50)>>,
  `product` array<struct<
    at: varchar(20),
    pi: varchar(50),
    pmn: varchar(256)
  >>,
  `ipt` varchar(40)
)
PARTITIONED BY (`owner` varchar(40))
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://<location>'
TBLPROPERTIES (
  'compression_type' = 'snappy',
  'numRows' = '2',
  'transient_lastDdlTime' = <>
)
The table reads from the Parquet file.
Parquet schema:
root
|-- ai: struct (nullable = true)
| |-- acs: string (nullable = true)
| |-- JA: struct (nullable = true)
| | |-- DateOfBirth: string (nullable = true)
| | |-- EmailAddress: string (nullable = true)
| | |-- FirstName: string (nullable = true)
| | |-- LastName: string (nullable = true)
| | |-- ss: string (nullable = true)
| |-- ltc: string (nullable = true)
| |-- PrimaryApplicant: struct (nullable = true)
| | |-- bwh: string (nullable = true)
| | |-- Citizenship: string (nullable = true)
| | |-- CurrentAddressCity: string (nullable = true)
| | |-- CurrentAddressState: string (nullable = true)
| | |-- CurrentAddressStreet2: string (nullable = true)
| | |-- ss: string (nullable = true)
| |-- Status: string (nullable = true)
| |-- uri: string (nullable = true)
|-- pr: struct (nullable = true)
| |-- pc: struct (nullable = true)
| | |-- cn: string (nullable = true)
|-- Product: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- at: string (nullable = true)
| | |-- pi: string (nullable = true)
| | |-- pmn: string (nullable = true)
|-- ipt: string (nullable = true)
The same question was raised at this link: https://forums.aws.amazon.com/thread.jspa?threadID=246551, but I still can't figure it out.
Can anyone help?
This issue has been resolved.
When creating the Athena table, each field should map exactly to the schema; that is, the fields must be declared in the same order as they appear in the schema.
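To illustrate that fix, here is a sketch (not the exact DDL that was finally used) of the `ai` struct reordered to follow the Parquet schema printed above: acs, JA, ltc, PrimaryApplicant, Status, uri. Note also that the schema shows `JA` as a struct rather than an array of structs, so it is declared as a plain struct here; the placeholders `<<tablename>>` and `<location>` are kept from the original post.

```sql
-- Sketch only: struct fields are listed in the same order as the Parquet schema.
-- JA is declared as a struct (not array<struct>) to match the printed schema.
CREATE EXTERNAL TABLE <<tablename>> (
  `ai` struct<
    acs: varchar(100),
    JA: struct<
      dateofbirth: varchar(50),
      emailaddress: varchar(50),
      firstname: varchar(50),
      lastname: varchar(50),
      ss: varchar(50)
    >,
    ltc: varchar(100),
    primaryapplicant: struct<
      bwh: varchar(10),
      citizenship: varchar(20),
      currentaddresscity: varchar(50),
      currentaddressstate: varchar(50),
      currentaddressstreet2: varchar(50),
      ss: varchar(50)
    >,
    status: varchar(50),
    uri: varchar(50)
  >,
  `pr` struct<pc: struct<cn: varchar(50)>>,
  `product` array<struct<
    at: varchar(20),
    pi: varchar(50),
    pmn: varchar(256)
  >>,
  `ipt` varchar(40)
)
PARTITIONED BY (`owner` varchar(40))
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://<location>';
```

With the fields in schema order, the by-position mapping the error complained about (`ai.ja` resolving to the wrong physical column) no longer occurs.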