Bug in my Hive UDF
I am trying to write a Hive UDF that checks a column in a Hive table and concatenates a string to its value.
Schema and data of my Hive table, cityTab:
Schema:
id int
name char(30)
rank int
Data:
1 NewYork 10
2 Amsterdam 30
I wrote the following Hive UDF:
public class MyHiveUdf extends UDF {
    private Text result = new Text();

    public Text evaluate(Text text) {
        if (text == null) {
            return null;
        } else {
            String str = text.toString();
            if (str.contains("NewYork")) {
                result.set(text.toString().concat(" America"));
            }
            return result;
        }
    }
}
I added the jar, created a temporary function, and ran it as follows:
ADD jar /home/cloudera/Desktop/HiveStrCon.jar;
create temporary function strcon as 'com.hiveudf.strmnp.MyHiveUdf';
select strcon(name) from cityTab;
But the output shows the data without the new string concatenated:
OK
NewYork
Amsterdam
Time taken: 0.191 seconds, Fetched: 3 row(s)
Can anyone tell me what mistake I am making here?
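I tried your example and it works fine for me with just one small change to the code: when the value does not contain "NewYork", the UDF returns the input text unchanged instead of the reused result object.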
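package com.hiveudf.strmnp;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyHiveUdf extends UDF {
    // Reused across rows to avoid allocating a new Text per call.
    private Text result = new Text();

    public Text evaluate(Text text) {
        if (text == null) {
            return null;
        } else {
            String str = text.toString();
            if (str.contains("NewYork")) {
                // Matching rows get the suffix appended.
                result.set(text.toString().concat(" America"));
                return result;
            }
            // Non-matching rows are passed through untouched.
            return text;
        }
    }
}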
hive> ADD jar /root/HiveStrCon.jar;
Added [/root/HiveStrCon.jar] to class path
Added resources: [/root/HiveStrCon.jar]
hive> create temporary function strcon as 'com.hiveudf.strmnp.MyHiveUdf';
OK
Time taken: 0.005 seconds
hive> select strcon(name) from cityTab;
Query ID = root_20170331132222_690e8d43-381c-4e40-a90b-368397c1df5b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1490796950103_0007, Tracking URL = http://mac127:8088/proxy/application_1490796950103_0007/
Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job -kill job_1490796950103_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-03-31 13:22:42,264 Stage-1 map = 0%, reduce = 0%
2017-03-31 13:22:50,720 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.14 sec
MapReduce Total cumulative CPU time: 2 seconds 140 msec
Ended Job = job_1490796950103_0007
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.14 sec HDFS Read: 3166 HDFS Write: 26 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 140 msec
OK
NewYork America
Amsterdam
Time taken: 19.788 seconds, Fetched: 2 row(s)
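To make the difference between the two versions easier to see without a Hive installation, here is a minimal plain-Java sketch. It is only an illustration under stated assumptions: `ReusedBufferDemo`, `evaluateOriginal`, `evaluateFixed`, `runOriginal`, and `runFixed` are hypothetical names, and a `StringBuilder` stands in for the reused `Text` field. It does not try to reproduce the exact output from the question; it only shows what happens when a reused buffer is returned for rows that never updated it.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedBufferDemo {
    // Stand-in for the UDF's reused Text field (assumption: a mutable buffer shared across rows).
    private final StringBuilder result = new StringBuilder();

    // Mirrors the question's structure: the buffer is updated only on a match,
    // but its content is returned for every row.
    String evaluateOriginal(String text) {
        if (text == null) {
            return null;
        }
        if (text.contains("NewYork")) {
            result.setLength(0);
            result.append(text).append(" America");
        }
        return result.toString();
    }

    // Mirrors the answer's structure: non-matching rows return the input itself.
    String evaluateFixed(String text) {
        if (text == null) {
            return null;
        }
        if (text.contains("NewYork")) {
            result.setLength(0);
            result.append(text).append(" America");
            return result.toString();
        }
        return text;
    }

    // Applies the original logic to each row in order, using one shared instance.
    static List<String> runOriginal(List<String> rows) {
        ReusedBufferDemo udf = new ReusedBufferDemo();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(udf.evaluateOriginal(row));
        }
        return out;
    }

    // Applies the fixed logic to each row in order, using one shared instance.
    static List<String> runFixed(List<String> rows) {
        ReusedBufferDemo udf = new ReusedBufferDemo();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(udf.evaluateFixed(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Amsterdam", "NewYork", "Amsterdam");
        System.out.println(runOriginal(rows)); // [, NewYork America, NewYork America]
        System.out.println(runFixed(rows));    // [Amsterdam, NewYork America, Amsterdam]
    }
}
```

In the original structure, a non-matching row gets whatever the buffer last held: empty before the first match, and the previous match's value afterwards. Returning the input for non-matching rows avoids both cases.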