PIG: using conditions
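I have the following movie database datasets:
Ratings: UserID, MovieID, Rating :: Movies: MovieID, Title :: Users: UserID, Gender, Age
I have joined ratings and users. The goal is to determine the rating for each movieID broken down by gender (F and M), and to include only movies that have at least 20 ratings from F and from M.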
data = JOIN myuser BY user, myrating BY user;
grouped_users = GROUP data BY (movie,gender);
Now, after grouped_users, I need to filter out the movies that have fewer than 20 ratings from male and female users. How can I do that?
grouped_users_twenty = FILTER grouped_users BY SIZE(grouped_users)>=20;
This is my logic. I am getting an error.
data = JOIN myuser BY user, myrating BY user;
grouped_users = foreach (GROUP data BY (movie,gender)) {
generate
group.movie,
group.gender,
SIZE(data) as user_size
;
};
grouped_users_twenty = FILTER grouped_users BY user_size>=20;
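-- Alternative, starting from the plain GROUP result: grouped_users = GROUP data BY (movie,gender);
-- After GROUP, each row has only `group` and the bag `data`, so count the bag rather than `rating`.
grouped_users_twenty = FOREACH grouped_users GENERATE group, COUNT(data) AS rating_count;
final = FILTER grouped_users_twenty BY rating_count >= 20;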
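The filters above keep each (movie, gender) group that has at least 20 ratings, so a movie with 25 ratings from M and 3 from F still appears with its M row. If the intent is to keep only movies where F and M each have at least 20 ratings, one possible approach is to regroup the surviving rows by movie and require that both genders are present. The following is a minimal sketch under assumptions: the relation names per_gender, per_gender_twenty, per_movie and both_genders are made up for illustration, gender is assumed to take only the values F and M, and AVG is only one guess at what "rating per gender" means.

```pig
-- Assumes myuser and myrating are already loaded with the field names
-- used in the question (user, gender, movie, rating).
data = JOIN myuser BY user, myrating BY user;

-- One row per (movie, gender) with its rating count and average rating.
-- AVG is an assumption about the desired per-gender rating.
per_gender = FOREACH (GROUP data BY (movie, gender)) GENERATE
    group.movie      AS movie,
    group.gender     AS gender,
    COUNT(data)      AS rating_count,
    AVG(data.rating) AS avg_rating;

-- Keep only the (movie, gender) rows that have at least 20 ratings.
per_gender_twenty = FILTER per_gender BY rating_count >= 20;

-- Regroup by movie and count how many gender rows survived the filter.
per_movie = FOREACH (GROUP per_gender_twenty BY movie) GENERATE
    FLATTEN(per_gender_twenty),
    COUNT(per_gender_twenty) AS genders_kept;

-- A movie qualifies only if both its F row and its M row survived.
both_genders = FILTER per_movie BY genders_kept == 2;
```

Each qualifying movie then appears in both_genders twice, once for F and once for M, with its count and average rating.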