Pig Script: Merging Rows after Join and Group By

Movie table:

id  movie  genre
1   ABC    A|B|C
2   DEF    D|A|F

A movie can have multiple genres, separated by the | delimiter.

Ratings table:

user_id  movie_id  rating
1        1         3.5
1        2         4.5

Result:

The result I want is each user_id together with all the genres of the movies that user rated:

user_id  genres
1        (A|B|C|D|A|F)

Code:

genre_data = join movie by id, ratings by movie_id;
genre_data = group genre_data by (user_id);
user1_data = foreach genre_data generate ratings::user_id, movie::genre;

You can achieve this as follows:

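-- Attach each rating to its movie row.
genre_data = join movie by id, ratings by movie_id;
-- One group per user; the joined rows end up in a bag named genre_data.
genre_data = group genre_data by user_id;

user_data = foreach genre_data {
    -- Project only the genre column out of each user's bag.
    genres = foreach genre_data generate movie::genre as genres;
    -- Concatenate the genre strings with '|' as the separator.
    generate group as user_id, BagToString(genres, '|') as genres;
};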
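
For reference, here is a minimal end-to-end sketch of the same approach. It is only an illustration under some assumptions: the file names movies.tsv and ratings.tsv, the tab delimiter, and the column types are hypothetical, and the movie name column is called title here to keep it distinct from the relation name. It also projects the genre column straight out of the bag (joined.movie::genre) instead of using a nested foreach, which should give the same result.

```pig
-- Hypothetical input files and schemas; adjust paths, delimiter, and types to your data.
movie   = LOAD 'movies.tsv'  USING PigStorage('\t')
          AS (id:int, title:chararray, genre:chararray);
ratings = LOAD 'ratings.tsv' USING PigStorage('\t')
          AS (user_id:int, movie_id:int, rating:double);

-- Attach each rating to its movie row, then collect the joined rows per user.
joined  = JOIN movie BY id, ratings BY movie_id;
grouped = GROUP joined BY ratings::user_id;

-- joined.movie::genre is a bag holding one genre string per rated movie;
-- BagToString concatenates those strings with '|' between them.
user_genres = FOREACH grouped GENERATE
    group AS user_id,
    BagToString(joined.movie::genre, '|') AS genres;

DUMP user_genres;
```

Two things to keep in mind: bags in Pig are unordered, so the order of the genres in the output string is not guaranteed, and genres that occur in more than one movie (A in the sample data) appear once per movie, as in the desired result above.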