Dynamic columns for a bag in Apache Pig
I am new to Apache Pig. Is it possible to create a bag with dynamic columns?
Here is a sample script.
A = LOAD 'student' USING PigStorage() AS (col1:chararray, col2:chararray, col3:chararray ....... );
B = FOREACH A GENERATE col1, col3;
-- or
B = FOREACH A GENERATE col2, col3, col4;
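In short: take a list of columns and create a bag from it. Is this possible, and if so, how?
What I need is a way to select the columns dynamically. For example, if someone runs my script with params='col1,col2,col4', the script should be able to parse that string and use it to pick the required columns.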
A = LOAD 'student' using PigStorage(',');
With this load, relation A contains every column in the student file. You can access the columns by position, like $0, $1, $2:
$0 --> first column
$1 --> second column
$2 --> third column, and so on
I suggest using a MACRO or PARAMETER variables in the script. Here is a MACRO example:
define dividend_analysis (daily, year, daily_symbol, daily_open, daily_close)
returns analyzed {
    -- $year, $daily, $daily_symbol, etc. are replaced with the macro arguments
    divs = load '/user/data/NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
    divisthisyear = filter divs by date matches '.*$year.*';
    dailythisyear = filter $daily by date matches '.*$year.*';
    jnd = join divisthisyear by symbol, dailythisyear by $daily_symbol;
    $analyzed = foreach jnd generate dailythisyear::$daily_symbol, $daily_close - $daily_open;
};
daily = load '/user/data/NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
-- column names are passed in as strings, so the caller chooses which columns the macro uses
results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');
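The PARAMETER route can be sketched with Pig's parameter substitution, which is closer to what the question asks for. This is a minimal sketch, not a tested script: the file name select_cols.pig, the parameter name cols, the comma delimiter, and the four-column schema are all assumptions made for illustration, since the question elides the real schema.

```pig
-- select_cols.pig (hypothetical file name)
-- Fallback value used when no -param cols=... is passed on the command line.
%default cols 'col1,col3'

-- Assumed schema and delimiter; adjust to the real student file.
A = LOAD 'student' USING PigStorage(',')
    AS (col1:chararray, col2:chararray, col3:chararray, col4:chararray);

-- $cols is substituted textually before the script is parsed, so with
-- cols='col1,col2,col4' this line becomes:
--   B = FOREACH A GENERATE col1,col2,col4;
B = FOREACH A GENERATE $cols;

DUMP B;
```

It could then be run as: pig -param cols='col1,col2,col4' -f select_cols.pig. Because the substitution is plain text replacement, the script does not need to parse the string itself; the value only has to be a valid GENERATE expression list.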