查询支持 avro 的配置单元时出错 table:java.lang.IllegalArgumentException

Error when querying avro-backed hive table: java.lang.IllegalArgumentException

我正在尝试根据从 BigQuery 中的原始 google 分析数据导出的 avro 文件在 azure HDInsight 上创建配置单元 table。

似乎有效。我可以创建 table,并且当我 运行 描述时没有错误。但是当我尝试 select 结果时,即使我 select 只有两个非嵌套列,我也会收到一个错误:"java.lang.IllegalArgumentException".

以下是我创建 table 的方式:

DROP TABLE IF EXISTS ga_sessions_20150106;
CREATE EXTERNAL TABLE IF NOT EXISTS ga_sessions_20150106
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/upload/ga_sessions'
TBLPROPERTIES ('avro.schema.url'='/upload/ga_sessions.avsc');
describe ga_sessions_20150106;

这是 avro 架构:

{"type":"record","name":"root","fields":[{"name":"visitorId","type":["long","null"]},{"name":"visitNumber","type":["long","null"]},{"name":"visitId","type":["long","null"]},{"name":"visitStartTime","type":["long","null"]},{"name":"date","type":["string","null"]},{"name":"totals","type":[{"type":"record","name":"totals","fields":[{"name":"visits","type":["long","null"]},{"name":"hits","type":["long","null"]},{"name":"pageviews","type":["long","null"]},{"name":"timeOnSite","type":["long","null"]},{"name":"bounces","type":["long","null"]},{"name":"transactions","type":["long","null"]},{"name":"transactionRevenue","type":["long","null"]},{"name":"newVisits","type":["long","null"]},{"name":"screenviews","type":["long","null"]},{"name":"uniqueScreenviews","type":["long","null"]},{"name":"timeOnScreen","type":["long","null"]},{"name":"totalTransactionRevenue","type":["long","null"]}]},"null"]},{"name":"trafficSource","type":[{"type":"record","name":"trafficSource","fields":[{"name":"referralPath","type":["string","null"]},{"name":"campaign","type":["string","null"]},{"name":"source","type":["string","null"]},{"name":"medium","type":["string","null"]},{"name":"keyword","type":["string","null"]},{"name":"adContent","type":["string","null"]},{"name":"adwordsClickInfo","type":[{"type":"record","name":"adwordsClickInfo","fields":[{"name":"campaignId","type":["long","null"]},{"name":"adGroupId","type":["long","null"]},{"name":"creativeId","type":["long","null"]},{"name":"criteriaId","type":["long","null"]},{"name":"page","type":["long","null"]},{"name":"slot","type":["string","null"]},{"name":"criteriaParameters","type":["string","null"]},{"name":"gclId","type":["string","null"]},{"name":"customerId","type":["long","null"]},{"name":"adNetworkType","type":["string","null"]},{"name":"targetingCriteria","type":[{"type":"record","name":"targetingCriteria","fields":[{"name":"boomUserlistId","type":["long","null"]}]},"null"]}]},"null"]}]},"null"]},{"name":"device","type":[{"type":"record","name":"device","fields":[{"name":"browser","type":["string","null"]},{"name":"browserVersion","type":["string","null"]},{"name":"operatingSystem","type":["string","null"]},{"name":"operatingSystemVersion","type":["string","null"]},{"name":"isMobile","type":["boolean","null"]},{"name":"mobileDeviceBranding","type":["string","null"]},{"name":"flashVersion","type":["string","null"]},{"name":"javaEnabled","type":["boolean","null"]},{"name":"language","type":["string","null"]},{"name":"screenColors","type":["string","null"]},{"name":"screenResolution","type":["string","null"]},{"name":"deviceCategory","type":["string","null"]}]},"null"]},{"name":"geoNetwork","type":[{"type":"record","name":"geoNetwork","fields":[{"name":"continent","type":["string","null"]},{"name":"subContinent","type":["string","null"]},{"name":"country","type":["string","null"]},{"name":"region","type":["string","null"]},{"name":"metro","type":["string","null"]}]},"null"]},{"name":"customDimensions","type":{"type":"array","items":{"type":"record","name":"customDimensions","fields":[{"name":"index","type":["long","null"]},{"name":"value","type":["string","null"]}]}}},{"name":"hits","type":{"type":"array","items":{"type":"record","name":"hits","fields":[{"name":"hitNumber","type":["long","null"]},{"name":"time","type":["long","null"]},{"name":"hour","type":["long","null"]},{"name":"minute","type":["long","null"]},{"name":"isSecure","type":["boolean","null"]},{"name":"isInteraction","type":["boolean","null"]},{"name":"isEntrance","type":["boolean","null"]},{"name":"isExit","type":["boolean","null"]},{"name":"referer","type":["string","null"]},{"name":"page","type":[{"type":"record","name":"page","fields":[{"name":"pagePath","type":["string","null"]},{"name":"hostname","type":["string","null"]},{"name":"pageTitle","type":["string","null"]},{"name":"searchKeyword","type":["string","null"]},{"name":"searchCategory","type":["string","null"]}]},"null"]},{"name":"transaction","type":[{"type":"record","name":"transaction","fields":[{"name":"transactionId","type":["string","null"]},{"name":"transactionRevenue","type":["long","null"]},{"name":"transactionTax","type":["long","null"]},{"name":"transactionShipping","type":["long","null"]},{"name":"affiliation","type":["string","null"]},{"name":"currencyCode","type":["string","null"]},{"name":"localTransactionRevenue","type":["long","null"]},{"name":"localTransactionTax","type":["long","null"]},{"name":"localTransactionShipping","type":["long","null"]},{"name":"transactionCoupon","type":["string","null"]}]},"null"]},{"name":"item","type":[{"type":"record","name":"item","fields":[{"name":"transactionId","type":["string","null"]},{"name":"productName","type":["string","null"]},{"name":"productCategory","type":["string","null"]},{"name":"productSku","type":["string","null"]},{"name":"itemQuantity","type":["long","null"]},{"name":"itemRevenue","type":["long","null"]},{"name":"currencyCode","type":["string","null"]},{"name":"localItemRevenue","type":["long","null"]}]},"null"]},{"name":"contentInfo","type":[{"type":"record","name":"contentInfo","fields":[{"name":"contentDescription","type":["string","null"]}]},"null"]},{"name":"appInfo","type":[{"type":"record","name":"appInfo","fields":[{"name":"name","type":["string","null"]},{"name":"version","type":["string","null"]},{"name":"id","type":["string","null"]},{"name":"installerId","type":["string","null"]},{"name":"appInstallerId","type":["string","null"]},{"name":"appName","type":["string","null"]},{"name":"appVersion","type":["string","null"]},{"name":"appId","type":["string","null"]},{"name":"screenName","type":["string","null"]},{"name":"landingScreenName","type":["string","null"]},{"name":"exitScreenName","type":["string","null"]},{"name":"screenDepth","type":["string","null"]}]},"null"]},{"name":"exceptionInfo","type":[{"type":"record","name":"exceptionInfo","fields":[{"name":"description","type":["string","null"]},{"name":"isFatal","type":["boolean","null"]}]},"null"]},{"name":"eventInfo","type":[{"type":"record","name":"eventInfo","fields":[{"name":"eventCategory","type":["string","null"]},{"name":"eventAction","type":["string","null"]},{"name":"eventLabel","type":["string","null"]},{"name":"eventValue","type":["long","null"]}]},"null"]},{"name":"product","type":{"type":"array","items":{"type":"record","name":"product","fields":[{"name":"productSKU","type":["string","null"]},{"name":"v2ProductName","type":["string","null"]},{"name":"v2ProductCategory","type":["string","null"]},{"name":"productVariant","type":["string","null"]},{"name":"productBrand","type":["string","null"]},{"name":"productRevenue","type":["long","null"]},{"name":"localProductRevenue","type":["long","null"]},{"name":"productPrice","type":["long","null"]},{"name":"localProductPrice","type":["long","null"]},{"name":"productQuantity","type":["long","null"]},{"name":"productRefundAmount","type":["long","null"]},{"name":"localProductRefundAmount","type":["long","null"]},{"name":"isImpression","type":["boolean","null"]},{"name":"customDimensions","type":{"type":"array","items":"customDimensions"}},{"name":"customMetrics","type":{"type":"array","items":{"type":"record","name":"customMetrics","fields":[{"name":"index","type":["long","null"]},{"name":"value","type":["long","null"]}]}}}]}}},{"name":"promotion","type":{"type":"array","items":{"type":"record","name":"promotion","fields":[{"name":"promoId","type":["string","null"]},{"name":"promoName","type":["string","null"]},{"name":"promoCreative","type":["string","null"]},{"name":"promoPosition","type":["string","null"]}]}}},{"name":"promotionActionInfo","type":[{"type":"record","name":"promotionActionInfo","fields":[{"name":"promoIsView","type":["boolean","null"]},{"name":"promoIsClick","type":["boolean","null"]}]},"null"]},{"name":"refund","type":[{"type":"record","name":"refund","fields":[{"name":"refundAmount","type":["long","null"]},{"name":"localRefundAmount","type":["long","null"]}]},"null"]},{"name":"eCommerceAction","type":[{"type":"record","name":"eCommerceAction","fields":[{"name":"action_type","type":["string","null"]},{"name":"step","type":["long","null"]},{"name":"option","type":["string","null"]}]},"null"]},{"name":"experiment","type":{"type":"array","items":{"type":"record","name":"experiment","fields":[{"name":"experimentId","type":["string","null"]},{"name":"combination","type":["string","null"]}]}}},{"name":"customVariables","type":{"type":"array","items":{"type":"record","name":"customVariables","fields":[{"name":"index","type":["long","null"]},{"name":"customVarName","type":["string","null"]},{"name":"customVarValue","type":["string","null"]}]}}},{"name":"customDimensions","type":{"type":"array","items":"customDimensions"}},{"name":"customMetrics","type":{"type":"array","items":"customMetrics"}},{"name":"type","type":["string","null"]},{"name":"social","type":[{"type":"record","name":"social","fields":[{"name":"socialInteractionNetwork","type":["string","null"]},{"name":"socialInteractionAction","type":["string","null"]}]},"null"]}]}}},{"name":"fullVisitorId","type":["string","null"]},{"name":"userId","type":["string","null"]}]}

这是 DESCRIBE 返回的内容:

visitorid               bigint                  from deserializer   
visitnumber             bigint                  from deserializer   
visitid                 bigint                  from deserializer   
visitstarttime          bigint                  from deserializer   
date                    string                  from deserializer   
totals                  struct<visits:bigint,hits:bigint,pageviews:bigint,timeonsite:bigint,bounces:bigint,transactions:bigint,transactionrevenue:bigint,newvisits:bigint,screenviews:bigint,uniquescreenviews:bigint,timeonscreen:bigint,totaltransactionrevenue:bigint>   from deserializer   
trafficsource           struct<referralpath:string,campaign:string,source:string,medium:string,keyword:string,adcontent:string,adwordsclickinfo:struct<campaignid:bigint,adgroupid:bigint,creativeid:bigint,criteriaid:bigint,page:bigint,slot:string,criteriaparameters:string,gclid:string,customerid:bigint,adnetworktype:string,targetingcriteria:struct<boomuserlistid:bigint>>>   from deserializer   
device                  struct<browser:string,browserversion:string,operatingsystem:string,operatingsystemversion:string,ismobile:boolean,mobiledevicebranding:string,flashversion:string,javaenabled:boolean,language:string,screencolors:string,screenresolution:string,devicecategory:string>    from deserializer   
geonetwork              struct<continent:string,subcontinent:string,country:string,region:string,metro:string>  from deserializer   
customdimensions        array<struct<index:bigint,value:string>>    from deserializer   
hits                    array<struct<hitnumber:bigint,time:bigint,hour:bigint,minute:bigint,issecure:boolean,isinteraction:boolean,isentrance:boolean,isexit:boolean,referer:string,page:struct<pagepath:string,hostname:string,pagetitle:string,searchkeyword:string,searchcategory:string>,transaction:struct<transactionid:string,transactionrevenue:bigint,transactiontax:bigint,transactionshipping:bigint,affiliation:string,currencycode:string,localtransactionrevenue:bigint,localtransactiontax:bigint,localtransactionshipping:bigint,transactioncoupon:string>,item:struct<transactionid:string,productname:string,productcategory:string,productsku:string,itemquantity:bigint,itemrevenue:bigint,currencycode:string,localitemrevenue:bigint>,contentinfo:struct<contentdescription:string>,appinfo:struct<name:string,version:string,id:string,installerid:string,appinstallerid:string,appname:string,appversion:string,appid:string,screenname:string,landingscreenname:string,exitscreenname:string,screendepth:string>,exceptioninfo:struct<description:string,isfatal:boolean>,eventinfo:struct<eventcategory:string,eventaction:string,eventlabel:string,eventvalue:bigint>,product:array<struct<productsku:string,v2productname:string,v2productcategory:string,productvariant:string,productbrand:string,productrevenue:bigint,localproductrevenue:bigint,productprice:bigint,localproductprice:bigint,productquantity:bigint,productrefundamount:bigint,localproductrefundamount:bigint,isimpression:boolean,customdimensions:array<struct<index:bigint,value:string>>,custommetrics:array<struct<index:bigint,value:bigint>>>>,promotion:array<struct<promoid:string,promoname:string,promocreative:string,promoposition:string>>,promotionactioninfo:struct<promoisview:boolean,promoisclick:boolean>,refund:struct<refundamount:bigint,localrefundamount:bigint>,ecommerceaction:struct<action_type:string,step:bigint,option:string>,experiment:array<struct<experimentid:string,combination:string>>,customvariables:array<struct<index:bigint,customvarname:string,customvarvalue:string>>,customdimensions:array<struct<index:bigint,value:string>>,custommetrics:array<struct<index:bigint,value:bigint>>,type:string,social:struct<socialinteractionnetwork:string,socialinteractionaction:string>>>   from deserializer   
fullvisitorid           string                  from deserializer   
userid                  string                  from deserializer   

错误(如果需要,我可以 post 更多日志。它在“15 more”之后没有更多详细信息,但您可以看到之前发生的事情。):

Caused by: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at org.apache.avro.io.BinaryDecoder.readBytes(BinaryDecoder.java:288)
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:112)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:498)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:588)
    ... 15 more

好的 -- 此问题已解决。

问题是,当我使用 python 客户端从 google 云存储下载文件时,当我需要使用二进制模式时,我以文本模式(默认)将其写入文件。

我re-downloaded它,re-uploaded它,它起作用了。