XML Hive SerDe: extract timestamp (Hadoop)
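I am trying to extract a timestamp from XML using the XML SerDe in Hive. The external table is created and linked to an HDFS directory. Currently, the timestamp value shows up as null in my table.
I am wondering whether the timestamp needs to be cast, but I am not sure. The rest of the XML data works fine and shows up in Hive.
The input file is: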
<example>
<date>2017-02-09 22:03:58</date>
</example>
Hive create script:
create external table example (
date timestamp
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.date"="/example/date/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'mypath'
TBLPROPERTIES (
"xmlinput.start"="<example>",
"xmlinput.end"="</example>"
);
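It seems that only Java primitive types are supported. For any other category, such as a timestamp, the switch falls through to the default branch, which throws an IllegalStateException; the surrounding catch block swallows it and the method returns null.
See the getPrimitiveValue method in XmlUtils.java: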
/**
* (c) Copyright IBM Corp. 2013. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.ibm.spss.hive.serde2.xml.processor;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
/**
* The XML utilities
*/
public class XmlUtils {
    /**
     * Private constructor
     */
    private XmlUtils() {
    }

    /**
     * Converts the string value to the java object for the given primitive category
     *
     * @param value
     *            the value
     * @param primitiveCategory
     *            the primitive category
     * @return the java object
     */
    public static Object getPrimitiveValue(String value, PrimitiveCategory primitiveCategory) {
        if (value != null) {
            try {
                switch (primitiveCategory) {
                    case BOOLEAN:
                        return Boolean.valueOf(value);
                    case BYTE:
                        return Byte.valueOf(value);
                    case DOUBLE:
                        return Double.valueOf(value);
                    case FLOAT:
                        return Float.valueOf(value);
                    case INT:
                        return Integer.valueOf(value);
                    case LONG:
                        return Long.valueOf(value);
                    case SHORT:
                        return Short.valueOf(value);
                    case STRING:
                        return value;
                    default:
                        throw new IllegalStateException(primitiveCategory.toString());
                }
            } catch (Exception ignored) {
            }
        }
        return null;
    }
}
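To make the null result concrete, here is a minimal, self-contained sketch of the same control flow. It is not the SerDe's code: the class TimestampSketch, the enum Category (a stand-in for Hive's PrimitiveCategory) and the methods original and extended are hypothetical names used only for illustration. The extended method shows one way a TIMESTAMP case might be handled, assuming the text is in the yyyy-mm-dd hh:mm:ss form accepted by java.sql.Timestamp.valueOf. Whether the SerDe's object inspectors would accept such a value in a given Hive version is not verified here.

```java
import java.sql.Timestamp;

public class TimestampSketch {
    // Hypothetical stand-in for Hive's PrimitiveCategory, so the sketch
    // compiles without Hive on the classpath.
    public enum Category { INT, STRING, TIMESTAMP }

    // Mirrors the control flow of getPrimitiveValue: an unsupported category
    // throws in the default branch, the exception is swallowed, and the
    // method returns null.
    public static Object original(String value, Category category) {
        if (value != null) {
            try {
                switch (category) {
                    case INT:
                        return Integer.valueOf(value);
                    case STRING:
                        return value;
                    default:
                        throw new IllegalStateException(category.toString());
                }
            } catch (Exception ignored) {
            }
        }
        return null;
    }

    // Hypothetical extension (an assumption, not part of the SerDe): parse
    // TIMESTAMP values with java.sql.Timestamp.valueOf and return null when
    // the text is not in the expected format.
    public static Object extended(String value, Category category) {
        if (value != null && category == Category.TIMESTAMP) {
            try {
                return Timestamp.valueOf(value.trim());
            } catch (IllegalArgumentException ignored) {
                return null;
            }
        }
        return original(value, category);
    }

    public static void main(String[] args) {
        String text = "2017-02-09 22:03:58";
        System.out.println(original(text, Category.TIMESTAMP)); // prints "null"
        System.out.println(extended(text, Category.TIMESTAMP)); // prints "2017-02-09 22:03:58.0"
    }
}
```

With the original logic the timestamp text is silently dropped, which matches the null column in the question. The extended variant only illustrates where a conversion would have to be added.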