Epoch time difference in Pig
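I have 3 columns: start_time, end_time and tags. The times are Unix epoch timestamps in seconds, as shown in the example below. I want to find the rows with a 1-hour time difference between the two times.
Example: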
Start_time End_Time Tags
1235000081 1235000501 "Answered"
1235000081 1235000551 "Answered"
If the time difference is less than an hour, I need to get the Tags column.
I want to do this in Pig. Can anyone help?
input.txt
1235000081 1235000501 Answered
1235000081 1235000551 Answered
Pig script:
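-- PigStorage (the default loader) splits on tabs; add USING PigStorage(' ') for a space-separated file
A = Load '/home/kishore/input.txt' as (col1:long, col2:long, col3:chararray);
-- ToDate(long) expects milliseconds since the epoch, so scale the epoch seconds by 1000
B = Foreach A generate ToDate(col1 * 1000) as startdate, ToDate(col2 * 1000) as enddate, col3;
-- GetHour returns the hour-of-day field (0-23), so this compares clock hours, not elapsed time
C = Filter B by GetHour(enddate) - GetHour(startdate) == 1;
Dump C;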
You can adjust the filter to match your condition, using >, < or == as needed.
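Because GetHour only returns the hour-of-day field, subtracting two GetHour values does not measure elapsed time: 23:50 to 00:10 gives -23, and 10:05 to 10:55 gives 0. A minimal sketch of a filter on the actual elapsed time, assuming both columns hold epoch seconds as in input.txt, just subtracts the raw long values and keeps the tags; the aliases `lt_one_hour` and `tags_only` are illustrative names, not part of the original answer.

```pig
-- Assumes a tab-delimited file (PigStorage's default); use USING PigStorage(' ') for spaces
A = LOAD '/home/kishore/input.txt' AS (start_time:long, end_time:long, tags:chararray);
-- Both columns are epoch seconds, so their difference is the elapsed time in seconds
lt_one_hour = FILTER A BY (end_time - start_time) < 3600L;
-- Keep only the Tags column of the matching rows, as the question asks
tags_only = FOREACH lt_one_hour GENERATE tags;
DUMP tags_only;
```

With the two sample rows (420 and 470 seconds apart), both pass the filter, so DUMP prints (Answered) twice. No datetime conversion is needed because the comparison works directly on the long values.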
If you want to keep the date fields as timestamps (Pig datetime values), the solution is:
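data = LOAD '/path/to/your/input' as (Start_Time:long, End_Time:long, Tags:chararray);
-- convert epoch seconds to datetime; new aliases avoid a duplicate-alias clash with the fields projected by *
data_proc = FOREACH data GENERATE *, ToDate(Start_Time * 1000) as Start_DT, ToDate(End_Time * 1000) as End_DT;
-- 'output' is a reserved keyword in Pig, so use a different alias
filtered = FILTER data_proc BY GetHour(End_DT) - GetHour(Start_DT) == 1;
Dump filtered;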
One crucial point: Pig's ToDate UDF expects the timestamp in milliseconds, so multiply the epoch-second fields by 1000 before passing them to it.
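If you prefer to stay with datetime values, Pig's built-in SecondsBetween gives the elapsed time between two datetimes, which avoids the hour-of-day problem of subtracting GetHour results. This is a sketch under one assumption: that SecondsBetween(datetime1, datetime2) returns datetime1 minus datetime2 in seconds, so the later time goes first. The aliases `dated`, `Start_DT`, `End_DT` and `within_hour` are illustrative.

```pig
data = LOAD '/path/to/your/input' AS (Start_Time:long, End_Time:long, Tags:chararray);
-- ToDate expects milliseconds, hence the * 1000 on the epoch-second columns
dated = FOREACH data GENERATE ToDate(Start_Time * 1000) AS Start_DT, ToDate(End_Time * 1000) AS End_DT, Tags;
-- Assumption: SecondsBetween(later, earlier) is positive; swap the arguments if yours is not
within_hour = FILTER dated BY SecondsBetween(End_DT, Start_DT) < 3600L;
DUMP within_hour;
```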