Loading a Hive table in Parquet format
I have the following input file, which I need to load into a Hive table in both ORC and Parquet format.
productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
My code is posted at the bottom. I can create and load the ORC Hive table successfully, but not the Parquet one.
After creating and loading the Parquet table, every field comes back as NULL when I query it. Am I missing something?
val productsupplies = sc.textFile("/user/cloudera/product.csv")
val productfirst = productsupplies.first
val product = productsupplies.filter(f => f != productfirst)  // drop the header row
  .map { x =>
    val a = x.split(",")
    (a(0).toInt, a(1), a(2), a(3), a(4).toFloat, a(5))
  }
  .toDF("productID", "productCode", "name", "quantity", "price", "supplierid")
product.write.orc("/user/cloudera/productsupp.orc")
product.write.parquet("/user/cloudera/productsupp.parquet")
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.sql("create table product_supp_orc ( " +
"product_id int, " +
"product_code string, " +
"product_name string, " +
"product_quantity string, " +
"product_price float, " +
"product_supplier_id string) stored as orc " +
"location \"/user/cloudera/productsupp.orc\" ")
hc.sql("create table product_supp_parquet ( " +
"product_id int, " +
"product_code string, " +
"product_name string, " +
"product_quantity string, " +
"product_price float, " +
"product_supplier_id string) stored as parquet " +
"location \"/user/cloudera/productsupp.parquet\" ")
hc.sql("select * from product_supp_parquet").show()
Try:
hc.sql("create table product_supp_parquet ( " +
"productid int, " +
"productcode string, " +
"name string, " +
"quantity string, " +
"price float, " +
"supplierid string) stored as parquet " +
"location \"/user/cloudera/productsupp.parquet\" ")
Basically, the column names in the table definition have to match the names used in the file you uploaded: Hive resolves Parquet columns by name rather than by position, so names that don't match the file's schema read back as NULL. (ORC in older Hive versions matches columns by position, which is why that table worked despite the different names.)
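One way to confirm the mismatch is to read the written files back and compare the Parquet schema with the table definition (a minimal sketch, assuming Spark 1.4+ so that hc.read and DataFrameWriter are available):

// Print the column names stored in the Parquet footer; these are the
// names Hive matches against, so they must appear in the DDL.
val written = hc.read.parquet("/user/cloudera/productsupp.parquet")
written.printSchema()  // productID, productCode, name, quantity, price, supplierid

Alternatively, if you would rather keep the original Hive column names, rename the DataFrame columns before writing so the Parquet files carry those names instead (same assumption; overwrite mode lets the write re-run cleanly):

product.toDF("product_id", "product_code", "product_name",
  "product_quantity", "product_price", "product_supplier_id")
  .write.mode("overwrite").parquet("/user/cloudera/productsupp.parquet")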