How to get the row and field delimiters using the HCatalog Java API
POM dependency:
<dependency>
<groupId>org.apache.hive.hcatalog</groupId>
<artifactId>hive-webhcat-java-client</artifactId>
<version>1.2.1</version>
</dependency>
I am able to get the columns, partition columns, input file format, and so on.
Relevant code:
HiveConf hcatConf = new HiveConf();
hcatConf.setVar(HiveConf.ConfVars.METASTOREURIS, connectionUri);
hcatConf.set("hive.metastore.local", "false");
hcatConf.setIntVar(HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES, THRIFT_CONNECTION_RETRY);
hcatConf.set(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "true");
hcatConf.set(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK.varname, HCatSemanticAnalyzer.class.getName());
hcatConf.set(HiveConf.ConfVars.PREEXECHOOKS.varname, "");
hcatConf.set(HiveConf.ConfVars.POSTEXECHOOKS.varname, "");
hcatConf.setTimeVar(HiveConf.ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT, TIME_OUT, TimeUnit.MILLISECONDS);
HCatClient client = null;
HCatTable hTable = null;
try {
    client = HCatClient.create(hcatConf);
    hTable = client.getTable(databaseName, tableName);
    System.out.println(hTable.getInputFileFormat());
    System.out.println(hTable.getOutputFileFormat());
    System.out.println(hTable.getSerdeLib());
} catch (HCatException hCatEx) {
    LOG.error("Not able to connect to hive. Caused By;", hCatEx);
}
How do I get the row and field delimiters of a text table?
According to the Javadoc for getSerdeParams():
public Map<String,String> getSerdeParams()
- Returns parameters such as field delimiter,etc.
But in my case, the map contains only one entry:
{serialization.format=1}
If I create a table:
create table tbl1 (c1 int) stored as textfile
When I run show create table tbl1, I get:
CREATE TABLE `tbl1`(
`c1` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://localhost:8020/apps/hive/warehouse/dev.db/tbl1'
TBLPROPERTIES (
'transient_lastDdlTime'='1477067078')
The default delimiters are not shown.
When I create a table with explicit delimiters:
create table tbl2 (c1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\," LINES TERMINATED BY "\n" stored as textfile;
When I run show create table tbl2, I get:
CREATE TABLE `tbl2`(
`c1` int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://localhost:8020/apps/hive/warehouse/dev.db/tbl2'
TBLPROPERTIES (
'transient_lastDdlTime'='1477067160')
In the second case I specified the delimiters explicitly, so getSerdeParams() returned the required values.
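For tables created without ROW FORMAT DELIMITED, the delimiters therefore have to be inferred. The sketch below shows one way to do that from the map returned by getSerdeParams(), falling back to Hive's text defaults (Ctrl-A, i.e. \001, for fields and \n for lines). The class SerdeDelimiters and its methods are hypothetical helpers, not part of HCatalog. The keys field.delim, line.delim and serialization.format are the standard serde property names; the fallback order (field.delim, then serialization.format, then the default) is my understanding of what LazySimpleSerDe does, so treat it as an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves delimiters from a serde parameter map.
final class SerdeDelimiters {
    // Hive's defaults for delimited text: Ctrl-A between fields, newline between rows.
    static final char DEFAULT_FIELD_DELIM = '\001';
    static final char DEFAULT_LINE_DELIM = '\n';

    // A value such as "1" is read as a byte code; anything else uses its first character.
    static char toDelimiter(String raw, char fallback) {
        if (raw == null || raw.isEmpty()) {
            return fallback;
        }
        try {
            return (char) (Byte.parseByte(raw) & 0xFF);
        } catch (NumberFormatException notANumber) {
            return raw.charAt(0);
        }
    }

    // Assumed lookup order: field.delim, then serialization.format, then the default.
    static char fieldDelimiter(Map<String, String> serdeParams) {
        String raw = serdeParams.get("field.delim");
        if (raw == null) {
            raw = serdeParams.get("serialization.format");
        }
        return toDelimiter(raw, DEFAULT_FIELD_DELIM);
    }

    static char lineDelimiter(Map<String, String> serdeParams) {
        return toDelimiter(serdeParams.get("line.delim"), DEFAULT_LINE_DELIM);
    }

    // Renders a delimiter as a Unicode escape so control characters are visible in logs.
    static String describe(char delimiter) {
        return String.format("\\u%04x", (int) delimiter);
    }

    public static void main(String[] args) {
        // Mimics tbl1: only serialization.format is present.
        Map<String, String> implicitParams = new HashMap<>();
        implicitParams.put("serialization.format", "1");

        // Mimics tbl2: delimiters were given explicitly in the DDL.
        Map<String, String> explicitParams = new HashMap<>();
        explicitParams.put("field.delim", ",");
        explicitParams.put("line.delim", "\n");

        System.out.println("tbl1 field: " + describe(fieldDelimiter(implicitParams)));
        System.out.println("tbl1 line:  " + describe(lineDelimiter(implicitParams)));
        System.out.println("tbl2 field: " + describe(fieldDelimiter(explicitParams)));
        System.out.println("tbl2 line:  " + describe(lineDelimiter(explicitParams)));
    }
}
```

With a real table you would pass hTable.getSerdeParams() instead of the hand-built maps. This only makes sense when getSerdeLib() reports LazySimpleSerDe; other serdes may not use these keys at all.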