In Hive, when I load data from a CSV file, I only get some of the columns, not all of them
These are the columns in my data source:
BibNum
Title
Author
ISBN
PublicationYear
Publisher
Subjects
ItemType
ItemCollection
FloatingItem
ItemLocation
ReportDate
ItemCount
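I only get values up to the publisher column.
I uploaded a screenshot. If you know the cause and how to fix it, please let me know; it would be much appreciated: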
Below are the actual values of the first row (I use // as a marker to separate the columns):
3011076//
A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam. //
O'Ryan, Ellie //
1481425730, 1481425749, 9781481425735, 9781481425742 //
2014 //
Simon Spotlight, Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction //
jcbk //
ncrdr //
Floating //
qna //
09/01/2017 //
1
These are the actual values of the second row:
2248846 //
Naruto. Vol. 1, Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]. //
Kishimoto, Masashi, 1974- //
1569319006 //
2003, c1999. //
Viz, Ninja Japan Comic books strips etc, Comic books strips etc Japan Translations into English, Graphic novels //
acbk//
nycomic//
NA//
lcy//
09/01/2017//
1
hive> select * from timesheet limit 3;
OK
NULL Title Author ISBN PublicationYear Publisher Subjects ItemType ItemCollection FloatingItem ItemLocation ReportDate ItemCount
3011076 "A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield Frederick Gardner Megan Petasky and Allen Tam." "O'Ryan Ellie" "1481425730 1481425749 9781481425735 9781481425742" 2014. "Simon Spotlight
2248846 "Naruto. Vol. 1 Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]." "Kishimoto Masashi 1974-" 1569319006 "2003 c1999." "Viz " "Ninja Japan Comic books strips etc Comic books strips etc Japan Translations into English
Time taken: 0.149 seconds
hive> desc timesheet
> ;
OK
bibnum bigint
title string
author string
isbn string
publication string
publisher string
subjects string
itemtype string
itemcollection string
floatingitem string
itemlocation string
reportdate string
itemcount string
Time taken: 0.21 seconds
The raw CSV (header line and first data row):
BibNum,Title,Author,ISBN,PublicationYear,Publisher,Subjects,ItemType,ItemCollection,FloatingItem,ItemLocation,ReportDate,ItemCount
3011076,"A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam.","O'Ryan, Ellie","1481425730, 1481425749, 9781481425735, 9781481425742",2014.,"Simon Spotlight,","Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction",jcbk,ncrdr,Floating,qna,09/01/2017,1
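Apache Hive's plain delimited row format cannot handle CSV data like this, because the quoted fields contain commas. A SerDe (Serializer/Deserializer) solves the problem.
Hive 0.14+ ships with a built-in CSV SerDe whose default separator is a comma and whose default quote character is a double quote.
So for your CSV this should work: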
create table table_name (column names data types..)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile;

load data inpath '/path/'
into table table_name;
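To make that concrete for the table in the question, here is a minimal sketch. The table name `timesheet_csv` is hypothetical, the column names are taken from the `desc timesheet` output above, and `/path/` is still a placeholder. It assumes the file has a header line, which is why the first row in the question's output shows `NULL Title Author ...`. OpenCSVSerde reads every column as a string, so the columns are declared as `STRING` and cast at query time.

```sql
-- Sketch only: timesheet_csv is a hypothetical table name.
-- OpenCSVSerde treats all columns as STRING regardless of the declared type.
CREATE TABLE timesheet_csv (
  bibnum         STRING,
  title          STRING,
  author         STRING,
  isbn           STRING,
  publication    STRING,
  publisher      STRING,
  subjects       STRING,
  itemtype       STRING,
  itemcollection STRING,
  floatingitem   STRING,
  itemlocation   STRING,
  reportdate     STRING,
  itemcount      STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- these three values are the SerDe defaults,
  "quoteChar"     = "\"",  -- spelled out here only for clarity
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE
-- Skip the header line so it is not loaded as a data row.
TBLPROPERTIES ("skip.header.line.count" = "1");

LOAD DATA INPATH '/path/' INTO TABLE timesheet_csv;

-- Cast where a typed value is needed.
SELECT CAST(bibnum AS BIGINT) AS bibnum, title, publisher, subjects
FROM timesheet_csv
LIMIT 3;
```

If typed columns are needed permanently, one option is to keep this as a staging table and copy from it into a second table with the desired types.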
If any column contains unescaped quotes, you will have to go in manually and work out which columns are which...
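Since the CSV file is comma-separated, if you declare a single string column, the entire row is loaded into that column. So when creating the table, you can specify that the row values are separated by commas.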
create table table_name (
....
) row format delimited fields terminated by ',' lines terminated by '\n';
Then load the CSV file with:
load data local inpath 'path_to_file' into table table_name;
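One caveat with this approach, sketched below: `fields terminated by ','` splits on every comma, including commas inside quoted fields, which matches the shifted columns shown in the question. The sample string is a shortened version of the second row from the question, and Hive's built-in `split` is used here only to imitate that behaviour.

```sql
-- Three logical fields (id, quoted author, ISBN), but the quoted author
-- contains two commas, so a plain comma split yields five pieces.
SELECT size(split('2248846,"Kishimoto, Masashi, 1974-",1569319006', ','));
```

For files without commas inside quoted fields, the delimited row format above is sufficient.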
Hope this helps :)