Parsing CSV using Python

Question

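I have the following CSV file with three fields, Vulnerability Title, Vulnerability Severity Level, and Asset IP Address, which give the name of the vulnerability, its severity level, and the IP address that has the vulnerability. I am trying to print a report that lists each vulnerability in one column, its severity next to it, and in the last column the list of IP addresses that have that vulnerability.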

Vulnerability Title Vulnerability Severity Level    Asset IP Address
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.164
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.50.82
TLS/SSL Server Supports Weak Cipher Algorithms  6   10.103.65.164
Weak Cryptographic Key  3   10.103.64.10
Unencrypted Telnet Service Available    4   10.10.30.81
Unencrypted Telnet Service Available    4   10.10.50.82
TLS/SSL Server Supports Anonymous Cipher Suites with no Key Authentication  6   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.100
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.10.30.81

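I would like to create a new CSV file that uses the Vulnerability Title column as the key, has a second column called Vulnerability Severity Level, and a last column containing all the IP addresses that have the vulnerability. This is my attempt so far: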

import csv
from pprint import pprint
from collections import defaultdict
import glob
x= glob.glob("/root/*.csv")

d = defaultdict()
n = defaultdict()
for items in x:
        with open(items, 'rb') as f:
                reader = csv.DictReader(f, delimiter=',')
                for row in reader:
                        a = row["Vulnerability Title"]
                        b = row["Vulnerability Severity Level"], row["Asset IP Address"]
                        c = row["Asset IP Address"]
        #               d = row["Vulnerability Proof"]
                        d.setdefault(a, []).append(b)
        f.close()
pprint(d)
with open('results/ipaddress.csv', 'wb') as csv_file:
        writer = csv.writer(csv_file)
        for key, value in d.items():
                for x,y in value:
                        n.setdefault(y, []).append(x)
#                       print x
                        writer.writerow([key,n])

with open('results/ipaddress2.csv', 'wb') as csv2_file:
        writer = csv.writer(csv2_file)
        for key, value in d.items():
             n.setdefault(value, []).append(key)
             writer.writerow([key,n])

Since I am not explaining this very well, let me try to simplify.

Suppose I have the following CSV:

Car model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James
Chevy   Green   Tom

I am trying to turn it into this CSV:

Car model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James,Tom

Both solutions are correct. Here is my final script:

import csv
import pandas as pd

df = pd.read_csv('test.csv', names=['Vulnerability Title', 'Vulnerability Severity Level','Asset IP Address'])
#print df
grouped = df.groupby(['Vulnerability Title','Vulnerability Severity Level'])

groups = grouped.groups
#print groups
new_data = [k + (v['Asset IP Address'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['Vulnerability Title' ,'Vulnerability Severity Level', 'Asset IP Address'])

print(new_df)
new_df.to_csv('final.csv')

Thanks

Answer 1

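Here is an answer based on your car example. Essentially, I build a dictionary keyed by car make whose values are two-element tuples: the first element is the color and the second is the list of owners: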

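import csv

car_dict = {}
# Python 3: open in text mode; newline='' lets the csv module handle line endings
with open('<file_to_read>', newline='') as fi:
    reader = csv.reader(fi)
    for f in reader:
        if f[0] in car_dict:
            # make already seen: add this owner to its list
            car_dict[f[0]][1].append(f[2])
        else:
            # first occurrence: store the color and start the owner list
            car_dict[f[0]] = (f[1], [f[2]])

with open('<file_to_write>', 'w') as ou:
    for k in car_dict:
        # one tab-separated line per make; owners are joined with commas
        out_string = '{}\t{}\t{}\n'.format(k, car_dict[k][0], ','.join(car_dict[k][1]))
        ou.write(out_string)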
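
The same idea carries over to the vulnerability data from the question. The following is only a sketch and not part of the original answer: it assumes a comma-separated input with the header row shown in the question, the helper name `group_ips` is made up for illustration, and an in-memory sample stands in for the real file. It reads with `csv.DictReader` so the header is not treated as data, and it skips IPs that repeat for the same vulnerability.

```python
import csv
import io


def group_ips(rows):
    """Group rows by (title, severity) and collect unique IPs in first-seen order.

    `rows` is any iterable of dicts keyed by the question's column names.
    """
    grouped = {}
    for row in rows:
        key = (row["Vulnerability Title"], row["Vulnerability Severity Level"])
        ips = grouped.setdefault(key, [])
        ip = row["Asset IP Address"]
        if ip not in ips:  # the sample data repeats some title/IP pairs
            ips.append(ip)
    return grouped


# Small in-memory sample standing in for the real file (assumed layout).
sample = io.StringIO(
    "Vulnerability Title,Vulnerability Severity Level,Asset IP Address\n"
    "Weak Cryptographic Key,3,10.103.64.10\n"
    "Unencrypted Telnet Service Available,4,10.10.30.81\n"
    "Unencrypted Telnet Service Available,4,10.10.50.82\n"
    "Unencrypted Telnet Service Available,4,10.10.30.81\n"
)
grouped = group_ips(csv.DictReader(sample))

# Write one row per vulnerability; csv.writer quotes the cell that contains commas.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Vulnerability Title", "Vulnerability Severity Level", "Asset IP Address"])
for (title, severity), ips in grouped.items():
    writer.writerow([title, severity, ",".join(ips)])
print(out.getvalue())
```

To use this on a real file, the `io.StringIO` objects would be replaced by files opened with `open(path, newline='')` for reading and `open(path, 'w', newline='')` for writing.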

Answer 2

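When working with structured data, especially large data sets, I would suggest using pandas.

For your problem, I will use the pandas groupby feature as an example solution. Suppose you have this data: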

data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'], 
        ['vt2', 4, '10.0.10.10']]

Manipulating this data with pandas is straightforward:

import pandas as pd

df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])
grouped = df.groupby(['title', 'level'])

Then

groups = grouped.groups

is almost the dictionary you need.

print(groups)
{('vt1', 3): [0, 1], ('vt2', 4): [2]}

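Here [0, 1] are the row labels. In fact, you can iterate over these groups and apply any operation you want. For example, if you want to save them to a CSV file: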

new_data = [k + (v['ip'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['title', 'level', 'ips'])

Now let's see what new_df looks like:

  title  level                   ips
0   vt1      3  [10.0.0.1, 10.0.0.2]
1   vt2      4          [10.0.10.10]

This is what you need. Finally, save it to a file:

new_df.to_csv(filename)
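
One caveat: the `ips` column holds Python lists, so `to_csv` writes each cell as the list's text form, brackets and quotes included. If the goal is a plain comma-separated cell like `James,Tom` in the question, a variation along the following lines may fit better. This is a sketch that reuses the sample `data` from above and is not part of the original answer; the names `joined` and `csv_text` are chosen here for illustration.

```python
import pandas as pd

data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'],
        ['vt2', 4, '10.0.10.10']]
df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])

# Join each group's IPs into a single comma-separated string.
joined = (
    df.groupby(['title', 'level'])['ip']
      .agg(','.join)
      .reset_index(name='ips')
)

# index=False keeps the 0, 1, ... row labels out of the output;
# with no path given, to_csv returns the CSV text instead of writing a file.
csv_text = joined.to_csv(index=False)
print(csv_text)
```

Passing a file name as the first argument to `to_csv` writes the same content to disk.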

I strongly recommend learning pandas for data manipulation. You may find it easier and cleaner.
