Parsing CSV using Python
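I have the following CSV file with three fields: Vulnerability Title, Vulnerability Severity Level, and Asset IP Address. It shows the name of each vulnerability, its severity level, and the IP address that has the vulnerability.
I am trying to print a report that lists the vulnerability in one column, the severity next to it, and, in the last column, the list of IP addresses that have that vulnerability.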
Vulnerability Title Vulnerability Severity Level Asset IP Address
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.103.65.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.103.65.164
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566) 4 10.10.50.82
TLS/SSL Server Supports Weak Cipher Algorithms 6 10.103.65.164
Weak Cryptographic Key 3 10.103.64.10
Unencrypted Telnet Service Available 4 10.10.30.81
Unencrypted Telnet Service Available 4 10.10.50.82
TLS/SSL Server Supports Anonymous Cipher Suites with no Key Authentication 6 10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.103.65.10
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.103.65.100
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers 3 10.10.30.81
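I want to create a new CSV file that uses the Vulnerability Title column as the key, has a second column named Vulnerability Severity Level, and has a last column containing all the IP addresses that have the vulnerability.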
import csv
from pprint import pprint
from collections import defaultdict
import glob
x= glob.glob("/root/*.csv")
d = defaultdict()
n = defaultdict()
for items in x:
    with open(items, 'rb') as f:
        reader = csv.DictReader(f, delimiter=',')
        for row in reader:
            a = row["Vulnerability Title"]
            b = row["Vulnerability Severity Level"], row["Asset IP Address"]
            c = row["Asset IP Address"]
            # d = row["Vulnerability Proof"]
            d.setdefault(a, []).append(b)
    f.close()
pprint(d)
with open('results/ipaddress.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in d.items():
        for x,y in value:
            n.setdefault(y, []).append(x)
            # print x
            writer.writerow([key,n])
with open('results/ipaddress2.csv', 'wb') as csv2_file:
    writer = csv.writer(csv2_file)
    for key, value in d.items():
        n.setdefault(value, []).append(key)
        writer.writerow([key,n])
Since I could not explain this very well, let me try to simplify it.
Suppose I have the following CSV:
Car model owner
Honda Blue James
Toyota Blue Tom
Chevy Green James
Chevy Green Tom
I am trying to turn it into this CSV:
Car model owner
Honda Blue James
Toyota Blue Tom
Chevy Green James,Tom
Both solutions are correct.
Here is my final script as well:
import pandas as pd
# names= supplies the column labels; this assumes test.csv has no header row
# (if it has one, also pass header=0 so the header is not read as data)
df = pd.read_csv('test.csv', names=['Vulnerability Title', 'Vulnerability Severity Level', 'Asset IP Address'])
# Group the rows that share the same title and severity level
grouped = df.groupby(['Vulnerability Title', 'Vulnerability Severity Level'])
# k is the (title, level) tuple; append the list of IP addresses in that group
new_data = [k + (v['Asset IP Address'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['Vulnerability Title', 'Vulnerability Severity Level', 'Asset IP Address'])
print(new_df)
new_df.to_csv('final.csv')
Thanks
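Here is an answer based on your car example. Essentially, I am creating a dictionary keyed by the car make, whose values are two-element tuples: the first element is the color and the second is the list of owners.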
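import csv
car_dict = {}
# Python 3: open CSV files in text mode with newline='' (Python 2 used 'rb')
with open('<file_to_read>', newline='') as fi:
    reader = csv.reader(fi)
    for f in reader:
        if f[0] in car_dict:
            # Make already seen: add this owner to its list
            car_dict[f[0]][1].append(f[2])
        else:
            # First occurrence: store (color, [owner])
            car_dict[f[0]] = (f[1], [f[2]])
with open('<file_to_write>', 'w') as ou:
    for k in car_dict:
        # One tab-separated line per make, owners joined by commas
        out_string = '{}\t{}\t{}\n'.format(k, car_dict[k][0], ','.join(car_dict[k][1]))
        ou.write(out_string)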
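The same dictionary approach could be applied to the vulnerability file from the question. The following is only a sketch under some assumptions: the file is comma-delimited with the three header names shown in the question, repeated IP addresses should be listed once, and the helper name `group_ips` and the in-memory `sample` text (a few rows taken from the question's data) are made up for illustration.

```python
import csv
import io


def group_ips(rows):
    """Map (title, severity) to a list of unique IPs in first-seen order."""
    grouped = {}
    for row in rows:
        key = (row["Vulnerability Title"], row["Vulnerability Severity Level"])
        ips = grouped.setdefault(key, [])
        ip = row["Asset IP Address"]
        if ip not in ips:  # skip IPs already recorded for this vulnerability
            ips.append(ip)
    return grouped


# Assumed comma-delimited layout; io.StringIO stands in for a real file
sample = """Vulnerability Title,Vulnerability Severity Level,Asset IP Address
TLS/SSL Server Supports The Use of Static Key Ciphers,3,10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers,3,10.103.65.10
TLS/SSL Server Supports The Use of Static Key Ciphers,3,10.103.64.10
Unencrypted Telnet Service Available,4,10.10.30.81
Unencrypted Telnet Service Available,4,10.10.50.82
"""

grouped = group_ips(csv.DictReader(io.StringIO(sample)))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Vulnerability Title", "Vulnerability Severity Level", "Asset IP Address"])
for (title, severity), ips in grouped.items():
    # csv.writer quotes the joined field because it contains commas
    writer.writerow([title, severity, ",".join(ips)])

report = out.getvalue()
print(report)
```

To use this on real files, the `io.StringIO` objects would be replaced by `open(path, newline='')` for reading and `open(path, 'w', newline='')` for writing.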
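When manipulating structured data, especially large data sets, I would suggest using pandas.
For your problem, I will give an example of the pandas groupby feature as a solution. Suppose you have the data: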
data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'],
['vt2', 4, '10.0.10.10']]
Use pandas to manipulate the data:
import pandas as pd
df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])
grouped = df.groupby(['title', 'level'])
Then
groups = grouped.groups
is almost the dictionary you need.
print(groups)
{('vt1', 3): [0, 1], ('vt2', 4): [2]}
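Here [0, 1] are the row labels. In fact, you can iterate over these groups to apply any operation you want. For example, if you want to save them to a CSV file: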
new_data = [k + (v['ip'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['title', 'level', 'ips'])
Now let's see what new_df is:
title level ips
0 vt1 3 [10.0.0.1, 10.0.0.2]
1 vt2 4 [10.0.10.10]
This is what you need. Finally, save it to a file:
new_df.to_csv(filename)
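One thing to keep in mind is that a column holding Python lists is written to the CSV as the text of the list, brackets and quotes included. If a plain comma-separated string like "James,Tom" in the car example is preferred, the groups can be aggregated into strings instead. This is a minimal sketch, not the only way to do it: the helper `join_unique` and the extra duplicate row in `data` are assumptions added for illustration, and `io.StringIO` stands in for a real output file.

```python
import io

import pandas as pd

# Same sample as above, plus one repeated row to show de-duplication
data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'],
        ['vt1', 3, '10.0.0.1'], ['vt2', 4, '10.0.10.10']]
df = pd.DataFrame(data, columns=['title', 'level', 'ip'])


def join_unique(ips):
    # dict.fromkeys drops repeats while keeping first-seen order
    return ','.join(dict.fromkeys(ips))


result = (df.groupby(['title', 'level'], sort=False)['ip']
            .agg(join_unique)
            .reset_index()
            .rename(columns={'ip': 'ips'}))

buffer = io.StringIO()
result.to_csv(buffer, index=False)  # index=False omits the 0, 1, ... column
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name instead of `buffer` to `to_csv` would write the same content to disk.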
I strongly recommend learning pandas for data manipulation. You may find it easier and cleaner.