Pig Script: Merging Rows after Join and Group By
Movie table:
id movie genre
1 ABC A|B|C
2 DEF D|A|F
Each movie has multiple genres, separated by the | delimiter.
Ratings table:
user_id movie_id rating
1 1 3.5
1 2 4.5
Result:
The result I want is each user_id together with all of that user's genres:
user_id genres
1 (A|B|C|D|A|F)
Code (my attempt):
genre_data = join movie by id, ratings by movie_id;
genre_data = group genre_data by (user_id);
user1_data = foreach genre_data generate ratings::user_id, movie::genre;
You can achieve this as follows:
genre_data = join movie by id, ratings by movie_id;
genre_data = group genre_data by user_id;
user_data = foreach genre_data {
    genres = foreach genre_data generate movie::genre as genres;
    generate group as user_id, BagToString(genres, '|');
};
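For context, after GROUP each row has only two fields: group (the key) and a bag named after the grouped relation. That is why the attempt in the question cannot reference ratings::user_id and movie::genre at the top level. Below is a minimal end-to-end sketch of the same approach. The file names, the tab delimiter, the field types, the column name title, and the aliases joined and grouped are assumptions made for illustration and do not come from the question.

```pig
-- Assumed inputs: tab-separated files with the columns shown in the question.
-- File names, types, and the column name "title" are hypothetical.
movie   = LOAD 'movies.tsv'  USING PigStorage('\t')
          AS (id:int, title:chararray, genre:chararray);
ratings = LOAD 'ratings.tsv' USING PigStorage('\t')
          AS (user_id:int, movie_id:int, rating:double);

-- Attach each rating to its movie so the genre string sits next to user_id.
joined  = JOIN movie BY id, ratings BY movie_id;

-- One row per user: (group, {bag of joined rows}).
grouped = GROUP joined BY ratings::user_id;

user_data = FOREACH grouped {
    -- Keep only the genre column from the bag of joined rows.
    genres = FOREACH joined GENERATE movie::genre;
    -- Concatenate every genre string in the bag with '|'.
    GENERATE group AS user_id, BagToString(genres, '|') AS genres;
};

DUMP user_data;
```

Distinct alias names are used here only to make each step easier to follow. Bags are unordered, so the order in which the per-movie genre strings are concatenated is not guaranteed.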