AWK usage issue while archiving HDFS files in Hortonworks Distribution
I am trying to move files older than 3 days from an HDFS directory to an archive folder in HDFS.
AWK script:
hdfs dfs -ls hdfs://companycluster/data/src/purecloud/current | tail -n+2 | xargs -n 8 |
awk '{
DAY_CONV=(60*60*24);
X ="date +%s";X | getline ED;printf("") > "X";close("X");
Y="date -d \""$6"\" +%s";Y | getline SD;printf("") > "Y";close("Y");
DIFF=(ED-SD)/DAY_CONV;
print " SD=",SD" ED=",ED," DIFF=",DIFF," INPUT=",$6;
if ( DIFF -gt 3)
cmd="hdfs dfs -ls " $8;
system(cmd);
}'
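Note: once this script works, the cmd variable will hold an mv command.
Issues:
- The value of variable X is constant
- The value of variable Y is constant
- I cannot get the difference in days between the 2 dates; DIFF holds a fractional value
- The if statement in AWK fails because of an incorrect argument
Input to AWK: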
-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.240220170000
-rw-r--r-- 3 user hdfs 49967359 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.250220170000
-rw-r--r-- 3 user hdfs 28647041 2017-02-27 17:00 hdfs://companycluster/data/src/purecloud/current/Conversation.json.260220170000
-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1305
-rw-r--r-- 3 user hdfs 7050854 2017-03-01 13:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1325
-rw-r--r-- 3 user hdfs 6630106 2017-03-01 13:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1345
-rw-r--r-- 3 user hdfs 6766650 2017-03-01 14:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1405
-rw-r--r-- 3 user hdfs 6486095 2017-03-01 14:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1425
-rw-r--r-- 3 user hdfs 6350705 2017-03-01 14:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1445
-rw-r--r-- 3 user hdfs 6082589 2017-03-01 15:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1505
-rw-r--r-- 3 user hdfs 6417281 2017-03-01 15:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1525
-rw-r--r-- 3 user hdfs 6519949 2017-03-01 15:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1545
-rw-r--r-- 3 user hdfs 6988534 2017-03-01 16:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1605
-rw-r--r-- 3 user hdfs 6734459 2017-03-01 16:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1625
-rw-r--r-- 3 user hdfs 6842766 2017-03-01 16:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1645
-rw-r--r-- 3 user hdfs 6575513 2017-03-01 17:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1705
-rw-r--r-- 3 user hdfs 6574050 2017-03-01 17:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1725
-rw-r--r-- 3 user hdfs 50215096 2017-02-27 18:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1801
-rw-r--r-- 3 user hdfs 50985760 2017-02-27 18:18 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1818
-rw-r--r-- 3 user hdfs 58206776 2017-02-28 00:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0001
-rw-r--r-- 3 user hdfs 58823497 2017-02-28 06:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0601
-rw-r--r-- 3 user hdfs 61591660 2017-02-28 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1201
-rw-r--r-- 3 user hdfs 59703667 2017-03-01 10:40 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1801
-rw-r--r-- 3 user hdfs 59160075 2017-03-01 10:47 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0001
-rw-r--r-- 3 user hdfs 61812121 2017-03-01 10:48 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0601
-rw-r--r-- 3 user hdfs 63804772 2017-03-01 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_1201
Output of AWK (with debug prints):
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27
-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.240220170000
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27
-rw-r--r-- 3 user hdfs 49967359 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.250220170000
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27
-rw-r--r-- 3 user hdfs 28647041 2017-02-27 17:00 hdfs://companycluster/data/src/purecloud/current/Conversation.json.260220170000
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1305
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 7050854 2017-03-01 13:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1325
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6630106 2017-03-01 13:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1345
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6766650 2017-03-01 14:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1405
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6486095 2017-03-01 14:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1425
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6350705 2017-03-01 14:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1445
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6082589 2017-03-01 15:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1505
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6417281 2017-03-01 15:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1525
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6519949 2017-03-01 15:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1545
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6988534 2017-03-01 16:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1605
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6734459 2017-03-01 16:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1625
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6842766 2017-03-01 16:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1645
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 6575513 2017-03-01 17:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1705
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27
-rw-r--r-- 3 user hdfs 50215096 2017-02-27 18:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1801
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27
-rw-r--r-- 3 user hdfs 50985760 2017-02-27 18:18 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1818
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-28
-rw-r--r-- 3 user hdfs 58206776 2017-02-28 00:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0001
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-28
-rw-r--r-- 3 user hdfs 58823497 2017-02-28 06:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0601
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-28
-rw-r--r-- 3 user hdfs 61591660 2017-02-28 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1201
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 59703667 2017-03-01 10:40 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1801
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 59160075 2017-03-01 10:47 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0001
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 61812121 2017-03-01 10:48 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0601
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01
-rw-r--r-- 3 user hdfs 63804772 2017-03-01 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_1201
Distribution information:
- Hortonworks
- Hadoop 2.7.1.2.4.0.0-169
- Linux dh01 aaaaaaaaaaaaa.x86_64 #1 SMP Sun Jul 27 15:55:46 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Any input would be very helpful.
hdfs dfs -ls hdfs://companycluster/data/src/purecloud/current | tail -n+2 | xargs -n 8 \
| awk '
BEGIN {
  # take the time reference (3 days before now)
  # systime() and mktime() are GNU awk (gawk) time functions
  R = systime() - 3 * 86400
}
# for each line
{
  # format used by mktime: "YYYY MM DD HH MM SS [DST]"
  # build the timestamp from the date ($6) and time ($7) fields, adding seconds
  t = $6 " " $7 " 00"; gsub( /[-:]/, " ", t )
  # convert to epoch seconds
  T = mktime( t )
  # if older than the reference time
  if( T < R ) {
    print "Included line: " $0
    # do what you want as action
    cmd = "hdfs dfs -ls " $8
    system( cmd )
  }
  else {
    print "Discarded line: " $0
  }
}'
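Comments:
- The awk code is commented inline
- The input to awk can certainly be optimized (awk can do the job of tail very well, and xargs is surely not mandatory here [I have no hdfs available to test with])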
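As a rough illustration of that last comment, here is a minimal sketch in which awk also skips the "Found N items" header, so neither tail nor xargs is needed. It is not the method from the answer above: it compares only the date field ($6) as a string against a cutoff date, which works because YYYY-MM-DD dates sort lexicographically, and it ignores the time of day. The function name old_paths and the fixed cutoff are made up for the example, and it assumes file paths contain no spaces (so every entry has exactly 8 fields).

```shell
#!/bin/sh
# old_paths CUTOFF: read an "hdfs dfs -ls" style listing on stdin and print
# the path ($8) of every entry whose modification date ($6) is strictly
# before CUTOFF (format YYYY-MM-DD). Hypothetical helper for illustration.
old_paths() {
    # NF == 8 skips the "Found N items" header (3 fields).
    # Appending "" forces a string comparison on both sides.
    awk -v cutoff="$1" 'NF == 8 && ($6 "") < (cutoff "") { print $8 }'
}

# Sample lines copied from the listing in the question, preceded by the
# header line that hdfs dfs -ls prints first.
sample='Found 3 items
-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.240220170000
-rw-r--r-- 3 user hdfs 58206776 2017-02-28 00:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0001
-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1305'

# Prints only the Conversation.json.240220170000 path.
printf '%s\n' "$sample" | old_paths 2017-02-28
```

Against a real cluster the listing would come from hdfs dfs -ls instead of the sample, and the cutoff could be computed with something like date -d '3 days ago' +%Y-%m-%d, assuming GNU date is available. The printed paths can then be fed to whatever move command is intended.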