How to filter out the first line from a variable in Pig

I loaded a CSV file into a variable as follows:

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');

Here is the output of the first 3 rows:

tmp = limit basketball_players 3;
dump tmp;

("playerID","year","stint","tmID","lgID","GP","GS","minutes","points","oRebounds","dRebounds","rebounds","assists","steals","blocks","turnovers","PF","fgAttempted","fgMade","ftAttempted","ftMade","threeAttempted","threeMade","PostGP","PostGS","PostMinutes","PostPoints","PostoRebounds","PostdRebounds","PostRebounds","PostAssists","PostSteals","PostBlocks","PostTurnovers","PostPF","PostfgAttempted","PostfgMade","PostftAttempted","PostftMade","PostthreeAttempted","PostthreeMade","note")
("abramjo01","1946","1","PIT","NBA","47","0","0","527","0","0","0","35","0","0","0","161","834","202","178","123","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",)
("aubucch01","1946","1","DTF","NBA","30","0","0","65","0","0","0","20","0","0","0","46","91","23","35","19","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",)

As you can see, the first row is the table header. I used the following command to filter out the first row, but it does not work.

grunt> players_raw = filter basketball_players by $1 > 0;
2017-05-06 11:03:36,389 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 6 time(s).

When I dump players_raw, it comes back empty. How can I filter out the first line from the variable?

Use RANK to generate a new column that adds a row number to the dataset. Then use that column to filter out the first row.

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');
ranked = rank basketball_players;
basketball_players_without_header = Filter ranked by (rank_basketball_players > 1);
DUMP basketball_players_without_header;
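
RANK prepends the row number as the first field, named rank_<alias>, so the filtered relation still carries it. A minimal sketch of dropping that helper column afterward, assuming Pig 0.11 or later (where RANK is available) and a single input file so that the header line receives rank 1. The aliases no_header and players are only illustrative:

```pig
basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');

-- RANK with no BY clause numbers the rows; the number becomes field $0
ranked = rank basketball_players;

-- Rank 1 is the header line (assumption: one input file, header first)
no_header = filter ranked by rank_basketball_players > 1;

-- Keep every field from $1 onward, which discards the rank column
players = foreach no_header generate $1 ..;

tmp = limit players 3;
dump tmp;
```

If the data is spread over several files, each with its own header, only one of those headers would get rank 1, so this approach fits the single-file case best.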

Another approach:

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');
basketball_players_without_header = Filter basketball_players by NOT ($0 matches '.*playerID.*');
DUMP basketball_players_without_header;
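
The fields in the dump from the question still carry their double quotes, because PigStorage only splits on the delimiter. That is likely why a numeric comparison returns nothing: a value such as "1946" (quotes included) cannot be cast to an integer, the cast yields null, and a comparison against null is never true. A minimal sketch, assuming the header can be identified by an exact match on the first field and that only the first two fields need typing. The aliases no_header and typed and the field names are illustrative:

```pig
basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(',');

-- Exact comparison on the first field; the quotes are part of the loaded value
no_header = filter basketball_players by (chararray)$0 != '"playerID"';

-- Strip the quotes before casting so that numeric comparisons work
typed = foreach no_header generate
    REPLACE((chararray)$0, '"', '') as playerID,
    (int)REPLACE((chararray)$1, '"', '') as year;

-- A numeric filter now behaves as expected
recent = filter typed by year > 2000;
dump recent;
```

The exact comparison is stricter than the regular expression, which would also drop any data row whose first field happens to contain the text playerID.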