How do I get the ngram string array and estfrequency as separate elements in a Hive table using HiveQL?
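I am analyzing my own tweets and have loaded the data into a Hive table using the Hive JSON SerDe. I want to get the frequency of every two-word phrase in my tweets as a table. The output should look something like this: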
phrase frequency
["the","room"] 1248.0
["a","boy"] 1039.0
["rt","to"] 1032.0
["to","ct"] 986.0
Right now I can do this for single-word phrases, and the output I get is:
phrase frequency
["the"] 1248.0
["a"] 1039.0
["rt"] 1032.0
["to"] 986.0
["you"] 828.0
For the single-word output, my code is:
create table ng(new_ar array<struct<ngram:array<string>,estfrequency:double>>);
INSERT OVERWRITE TABLE ng
SELECT context_ngrams(sentences(lower(text)),array(null),100) as word
FROM tweets;
create table wordFreq (ngram array<string>, estfrequency double);
INSERT OVERWRITE TABLE wordFreq
SELECT X.ngram, X.estfrequency
FROM ng LATERAL VIEW explode(new_ar) Z as X;
select * from wordFreq;
How can I modify the code above to get the output I want?
To change your code from 1-grams to 2-grams, change array(null) to array(null,null).
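Applied to the ng table from the question, that change might look like the sketch below. It assumes the tweets table, its text column, and the ng table all exist as in the question; 100 is still the number of top n-grams returned.

```sql
-- Sketch: repopulate ng with the top 100 2-grams instead of 1-grams.
-- Assumes the tweets and ng tables from the question already exist.
INSERT OVERWRITE TABLE ng
SELECT context_ngrams(sentences(lower(text)), array(null, null), 100) AS word
FROM tweets;
```

With a context of two nulls, this should give the same result as ngrams(sentences(lower(text)), 2, 100), which may read more clearly if you never need fixed context words.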
The modification below puts the two words in separate columns. You can concatenate them if you need a single phrase.
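-- If wordFreq from the question already exists, drop it before running this.
-- ngram[0] and ngram[1] are the first and second words of each 2-gram.
create table wordFreq (word1 string, word2 string, estfrequency double);
INSERT OVERWRITE TABLE wordFreq
SELECT X.ngram[0], X.ngram[1], X.estfrequency
FROM ng LATERAL VIEW explode(new_ar) Z as X;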
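If you would rather have one phrase column than two word columns, a sketch of the concatenation could look like this. The table name phraseFreq is hypothetical (it is not in the original post), and passing an array directly to concat_ws assumes a Hive version that supports the array<string> form of that function.

```sql
-- Sketch: one space-separated phrase column instead of word1/word2.
-- phraseFreq is a hypothetical table name chosen for illustration.
CREATE TABLE phraseFreq (phrase string, estfrequency double);
INSERT OVERWRITE TABLE phraseFreq
SELECT concat_ws(' ', X.ngram) AS phrase, X.estfrequency
FROM ng LATERAL VIEW explode(new_ar) Z AS X;
```

If you want the phrase to stay an array, as in the desired output shown in the question, the original wordFreq definition with ngram array<string> should work unchanged once ng holds 2-grams.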