Read tuples from a file in Pig Latin
This example is from https://pig.apache.org/docs/r0.17.0/basic.html:
cat data;
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)
A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
DUMP A;
((3,8,9),(4,5,6))
((1,4,7),(3,7,5))
((2,5,8),(9,5,8))
I created a tp.txt in maria_dev with the same data, i.e.
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)
and read it with:
tp = LOAD 'tp.txt' as (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
But when I run DUMP tp in grunt, I get the following output:
((3,8,9),)
((1,4,7),)
((2,5,8),)
What am I doing wrong?
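By default, LOAD assumes that your fields are tab-delimited. You appear to have used spaces in your text file, so each line is read as a single field and t2 is left empty (null). Without changing the file, you can do the following: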
tp = LOAD 'tp.txt' USING PigStorage(' ') AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
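To see why the delimiter matters, here is a rough Python simulation. It assumes a simplified model of PigStorage that splits each line on the delimiter, maps the pieces onto the schema in order, and fills any missing field with null (None). This is not Pig's actual implementation. The function name split_like_pigstorage is hypothetical.

```python
# Hypothetical, simplified model of how a delimiter-based loader maps a line
# onto the schema declared in the AS clause. Not Pig's real code.
SCHEMA = ["t1", "t2"]  # the two tuple fields declared in the LOAD ... AS clause


def split_like_pigstorage(line: str, delim: str = "\t") -> dict:
    """Split one line on delim and assign the pieces to SCHEMA in order.

    Schema fields with no corresponding piece become None (null), which
    mirrors the empty t2 in the question's DUMP output.
    """
    parts = line.rstrip("\n").split(delim)
    return {name: (parts[i] if i < len(parts) else None)
            for i, name in enumerate(SCHEMA)}


line = "(3,8,9) (4,5,6)"
print(split_like_pigstorage(line))       # default tab delimiter: one field, t2 is None
print(split_like_pigstorage(line, " "))  # like PigStorage(' '): two fields
```

In this simplified model, a space inside a tuple, such as (3, 8, 9), would also be treated as a delimiter. The space-delimited approach therefore relies on the tuples containing no internal spaces, as in this data.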
Alternatively, you can replace the spaces in the text file with tabs and keep the LOAD statement unchanged.
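If you choose to edit the file, the following is a minimal Python sketch of the conversion. It assumes every line looks like the sample data, with tuples separated only by spaces. Only the whitespace between a closing ")" and the next "(" is replaced, so any spaces inside a tuple are left untouched. The helper names spaces_to_tabs and convert_file, and the output file name, are hypothetical.

```python
import re

# Matches one or more spaces between the end of one tuple and the start of the next.
_BETWEEN_TUPLES = re.compile(r"\)[ ]+\(")


def spaces_to_tabs(line: str) -> str:
    """Replace the space(s) separating adjacent tuples with a single tab."""
    return _BETWEEN_TUPLES.sub(")\t(", line)


def convert_file(src: str, dst: str) -> None:
    """Write a copy of src to dst with tab-separated tuple fields."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(spaces_to_tabs(line))
```

For example, convert_file("tp.txt", "tp_tab.txt") writes a tab-delimited copy. Once that copy is in place of the original file, the original LOAD statement without USING PigStorage(' ') should read both tuples.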