Hive transform using Python: Unable to initialize custom script

Question

I am trying to test Hive TRANSFORM by using a Python script as the mapper. My Hive script is:

add file  /full/path/to/mapper.py;

set mapred.job.queue.name=queue_name;

use my_database;

select transform(s.year, s.month, s.day, s.hour) 
using 'mapper.py' 
from my_table s limit 10;

My Python mapper script simply tries to echo its input:

#!/usr/local/bin/python
import sys
for line in sys.stdin:
    print line

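I have tried running it with the following combinations:

Removing add file ... from the Hive script and referencing mapper.py directly in the select ... statement
Keeping add file ... and using the full path to the mapper: /path/to/mapper.py
Keeping add file ... and using a relative path to the mapper: ./mapper.py
Adding an AS clause (using 'mapper.py' as line)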

So far, every one of these attempts ends with Hive reporting that it cannot initialize my custom script:

FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Unable to initialize custom script.

I don't understand what "initialization" means here. Is Hive unable to:

find my script (i.e. a path problem)?
find the python executable (i.e. the #! shebang)?
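
One way to tell these two cases apart outside Hive is to inspect the script locally. The sketch below is only an illustration: check_script is a hypothetical helper, not part of Hive, and it inspects the machine it runs on, which may differ from the task nodes where Hive actually launches the script.

```python
import os


def check_script(path):
    """Return a list of local problems that would stop `path` from being run directly."""
    if not os.path.isfile(path):
        return ["script not found: " + path]

    problems = []
    # Running a script by name (using 'mapper.py') needs the execute bit.
    if not os.access(path, os.X_OK):
        problems.append("script is not executable")

    with open(path) as handle:
        first_line = handle.readline().strip()

    # Running a script by name also needs a shebang that points at a real interpreter.
    parts = first_line[2:].split() if first_line.startswith("#!") else []
    if not parts:
        problems.append("no usable #! shebang line")
    elif not os.path.isfile(parts[0]):
        problems.append("shebang interpreter not found: " + parts[0])

    return problems


if __name__ == "__main__":
    import sys
    for problem in check_script(sys.argv[1]):
        print(problem)
```

An empty result means the script could be launched by name on that machine; it says nothing about whether /usr/local/bin/python exists on every node of the cluster.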

I am following the "Custom map/reduce scripts" section of the Hive tutorial.

Answer 1

Solved it by changing my select ... statement to:

add file  /full/path/to/mapper.py;
select transform(s.year, s.month, s.day, s.hour) 
using 'python mapper.py' -- this line changed
from my_table s limit 10;
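
A plausible reading of this fix, not confirmed in the thread, is that naming the interpreter explicitly means the script no longer depends on its execute bit or on the #!/usr/local/bin/python path existing on every task node. Separately, print line in the original mapper writes an extra blank line after each row, because line already ends with a newline. The following is a minimal sketch of an echo mapper that avoids this and runs under both Python 2 and Python 3; echo_rows is a hypothetical name chosen for illustration.

```python
import sys


def echo_rows(source, sink):
    """Copy each input row to the output unchanged."""
    for line in source:
        # By default Hive sends one row per line, with columns separated by tabs.
        # The line already ends with a newline, so write it as-is.
        sink.write(line)


if __name__ == "__main__":
    echo_rows(sys.stdin, sys.stdout)
```

Keeping the loop in a function that takes the input and output streams as arguments makes it possible to test the mapper locally by piping a few tab-separated rows into it before submitting the query.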
