Packet count in Hadoop (with MapReduce)

Things done so far:


Installed Hadoop by following this link:

http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_4_4.html


Installed hping3 to generate flood requests using:

sudo hping3 -c 10000 -d 120 -S -w 64 -p 8000 --flood --rand-source 192.168.1.12

Installed Snort to log the requests generated above:

sudo snort -ved -h 192.168.1.0/24 -l .

This generates the log file snort.log.1427021231

I can read it with:

sudo snort -r snort.log.1427021231

which gives output of the form:

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639
TCP TTL:64 TOS:0x0 ID:0 IpLen:20 DgmLen:44 DF
***A**S* Seq: 0x6EEE4A6B  Ack: 0x6DF6015B  Win: 0x7210  TcpLen: 24
TCP Options (1) => MSS: 1460
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+


I used

hdfs dfs -put <localsrc> ... <dst>

to copy this log file to HDFS.

Now, things I need help with:

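How do I count the total number of source IP addresses, destination IP addresses, ports, protocols, and timestamps in the log file?

(Do I have to write my own MapReduce program, or is there a library for this?)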


I also found

https://github.com/ssallys/p3

but could not get it to work. I looked at the contents of the JAR file but was unable to run it.

ratan@lenovo:~/Desktop$ hadoop jar ./p3lite.jar p3.pcap.examples.PacketCount

Exception in thread "main" java.lang.ClassNotFoundException: nflow.runner.Runner
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:201)

Thanks.

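After a quick search, this looks like something you may need a custom MapReduce job for.

The algorithm would look something like the following pseudocode: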

Parse the file line by line (or every n lines if a log record spans more than one line).

In the mapper, use a regex to work out whether something is a source IP, destination IP, etc.

Output these as key/value pairs of the form <Type, count>
    Type is the kind of text that was matched (e.g. source IP)
    count is the number of times it was matched in the record

Have the reducer sum all of the values from the mappers to get global totals for each type of information you want.

Write the result to a file in the desired format.
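
As a rough illustration of the steps above, here is a minimal sketch written for Hadoop Streaming in Python rather than as a Java job. That choice is my assumption, not something the question requires. It also assumes the input is the text printed by `snort -r snort.log.1427021231`, saved to a file and put into HDFS, since the snort.log.* file itself appears to be a binary capture. The script name packet_count.py, the key names (timestamp, src_ip, and so on) and the regexes are made up for this example and only cover the record layout shown in the question.

```python
import re
import sys
from collections import Counter

# First line of a record, e.g.
# 03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639
# Ports are optional because some protocols (e.g. ICMP) have none.
HEADER_RE = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<src_port>\d+))?\s+->\s+"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<dst_port>\d+))?"
)
# Second line of a record, e.g. "TCP TTL:64 TOS:0x0 ..."
PROTO_RE = re.compile(r"^(?P<protocol>[A-Z0-9]+)\s+TTL:")


def map_line(line):
    """Mapper step: return (type, 1) pairs for one line of text."""
    line = line.strip()
    header = HEADER_RE.match(line)
    if header:
        # One pair per field that is actually present on the line.
        return [(k, 1) for k, v in header.groupdict().items() if v is not None]
    if PROTO_RE.match(line):
        return [("protocol", 1)]
    return []  # separator lines, flag lines, option lines, etc.


def reduce_pairs(pairs):
    """Reducer step: sum the counts for each type."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_mapper(lines, out):
    """Streaming mapper: text lines in, 'type<TAB>count' lines out."""
    for line in lines:
        for key, count in map_line(line):
            out.write(f"{key}\t{count}\n")


def run_reducer(lines, out):
    """Streaming reducer: 'type<TAB>count' lines in, totals out."""
    def parse(rows):
        for row in rows:
            key, _, count = row.rstrip("\n").partition("\t")
            if key and count:
                yield key, int(count)
    for key, total in sorted(reduce_pairs(parse(lines)).items()):
        out.write(f"{key}\t{total}\n")


# Run as "packet_count.py map" or "packet_count.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    if sys.argv[1] == "map":
        run_mapper(sys.stdin, sys.stdout)
    else:
        run_reducer(sys.stdin, sys.stdout)
```

To try it locally before submitting a job, the same script can be chained through a shell pipe: `python3 packet_count.py map < packets.txt | sort | python3 packet_count.py reduce`, where packets.txt is a hypothetical file holding the `snort -r` text output. This counts total occurrences per type, as in the pseudocode. If you want counts per distinct value instead (for example, packets per source IP), the mapper could emit a key such as `src_ip:192.168.1.12` and the reducer would stay the same.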