Python: group values in a 2D list from CSV
I have the following CSV:
BBCP1,Grey,2140,805EC0FFFFE2,0000000066
BBCP1,Test,2150,805EC0FFFFE2,0000000066
BBCP1,Test,2151,805EC0FFFFE1,0000000066
BBCP1,Centre,2141,805EC0FFFFE3,000000077
BBCP1,Yellow,2142,805EC0FFFFE3,000000077
BBCP1,Purple,2143,805EC0FFFFE3,000000077
BBCP1,Green,2144,805EC0FFFFE3,000000077
BBCP1,Pink,2145,805EC0FFFFE3,000000077
I am reading this data using:
data = list(csv.reader(open(csvFile)))
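I want to convert this data into a 2D array or equivalent, grouped by the value in the 4th column (the MAC address), while preserving the order the rows have in the original list. So it would look like: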
[(BBCP1,Grey,2140,805EC0FFFFE2,0000000066),(BBCP1,Test,2150,805EC0FFFFE2,0000000066)],
[(BBCP1,Test,2151,805EC0FFFFE1,0000000066)],
[(BBCP1,Centre,2141,805EC0FFFFE3,000000077),
(BBCP1,Yellow,2142,805EC0FFFFE3,000000077),
(BBCP1,Purple,2143,805EC0FFFFE3,000000077),
(BBCP1,Green,2144,805EC0FFFFE3,000000077),
(BBCP1,Pink,2145,805EC0FFFFE3,000000077)]
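Hopefully I have shown the array correctly and it makes sense.
I then need to loop over the array to output the data to a file. I'm pretty sure I can do that with nested for loops.
Thanks in advance for any help.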
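Use collections.defaultdict to group the data (itertools.groupby only groups consecutive rows, so in general it would require sorting first, which is inefficient and destroys the original order), then print the sorted dictionary values (the sort isn't really necessary, it's just there to make the output stable):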
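import csv
import collections

# txt is any iterable of CSV lines, e.g. an open file or text.splitlines()
d = collections.defaultdict(list)
for row in csv.reader(txt):
    mac_address = row[3]  # 4th column holds the MAC address
    d[mac_address].append(row)

print(sorted(d.values()))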
which results in:
[[['BBCP1', 'Centre', '2141', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Yellow', '2142', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Purple', '2143', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Green', '2144', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Pink', '2145', '805EC0FFFFE3', '000000077']],
[['BBCP1', 'Grey', '2140', '805EC0FFFFE2', '0000000066'],
['BBCP1', 'Test', '2150', '805EC0FFFFE2', '0000000066']],
[['BBCP1', 'Test', '2151', '805EC0FFFFE1', '0000000066']]]
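Since the question asks for the groups to stay in their original order, the sort can probably just be dropped: on Python 3.7+ a dict (and so a defaultdict) keeps keys in insertion order. A minimal sketch, assuming the sample rows are inlined as `txt` so it runs on its own (the names `groups` and `ordered` are only illustrative):

```python
import csv
from collections import defaultdict

# Sample rows from the question, inlined so the sketch is self-contained.
txt = """\
BBCP1,Grey,2140,805EC0FFFFE2,0000000066
BBCP1,Test,2150,805EC0FFFFE2,0000000066
BBCP1,Test,2151,805EC0FFFFE1,0000000066
BBCP1,Centre,2141,805EC0FFFFE3,000000077
BBCP1,Yellow,2142,805EC0FFFFE3,000000077
BBCP1,Purple,2143,805EC0FFFFE3,000000077
BBCP1,Green,2144,805EC0FFFFE3,000000077
BBCP1,Pink,2145,805EC0FFFFE3,000000077
""".splitlines()

groups = defaultdict(list)
for row in csv.reader(txt):
    groups[row[3]].append(row)  # column index 3 holds the MAC address

# Keys come back in the order each MAC address was first seen (Python 3.7+),
# so no sorting is involved.
ordered = list(groups.values())
for group in ordered:
    print(group[0][3], [row[1] for row in group])
```

The groups come out as E2, E1, E3, matching the layout requested in the question, and the rows inside each group keep their file order.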
To sort by key (the MAC address) instead:
values = [v for _,v in sorted(d.items())]
which yields:
[[['BBCP1', 'Test', '2151', '805EC0FFFFE1', '0000000066']],
[['BBCP1', 'Grey', '2140', '805EC0FFFFE2', '0000000066'],
['BBCP1', 'Test', '2150', '805EC0FFFFE2', '0000000066']],
[['BBCP1', 'Centre', '2141', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Yellow', '2142', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Purple', '2143', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Green', '2144', '805EC0FFFFE3', '000000077'],
['BBCP1', 'Pink', '2145', '805EC0FFFFE3', '000000077']]]
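To illustrate the remark about groupby: itertools.groupby only merges rows that sit next to each other, so a key that shows up in two separate runs produces two groups unless the input is sorted first. A small sketch with made-up rows (the shortened names and MAC values here are hypothetical, not from the question's file):

```python
from itertools import groupby

# Hypothetical rows where the MAC "E2" appears in two separate runs.
rows = [
    ["Grey", "E2"],
    ["Test", "E1"],
    ["Pink", "E2"],
]

def mac(row):
    return row[1]

# Unsorted input: "E2" is split in two because its rows are not adjacent.
unsorted_groups = [(k, [r[0] for r in g]) for k, g in groupby(rows, key=mac)]
print(unsorted_groups)  # [('E2', ['Grey']), ('E1', ['Test']), ('E2', ['Pink'])]

# Sorting first merges them, but the original group order is lost.
by_mac = sorted(rows, key=mac)
sorted_groups = [(k, [r[0] for r in g]) for k, g in groupby(by_mac, key=mac)]
print(sorted_groups)  # [('E1', ['Test']), ('E2', ['Grey', 'Pink'])]
```

In the question's sample file the rows for each MAC address happen to be adjacent already, so groupby would work there without sorting, but the dict-based approach does not depend on that.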
Hi, I used pandas and groupby to solve the problem. Hope this helps!
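import pandas as pd

# read_csv infers numeric dtypes, so the leading zeros in column E are dropped;
# pass dtype=str to keep them as text
data = pd.read_csv('data.txt', header=None)
data.columns = ['A', 'B', 'C', 'D', 'E']  # arbitrary column names

def check(group):
    # convert each row of the group to a plain list
    data_item = []
    for index, item in group.iterrows():
        data_item.append(item.tolist())
    return data_item

# sort=False keeps the groups in order of first appearance
grouped_data = data.groupby('D', sort=False).apply(check)
for group in grouped_data:
    print(group)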
Output (order preserved):
[['BBCP1', 'Grey', 2140, '805EC0FFFFE2', 66], ['BBCP1', 'Test', 2150, '805EC0FFFFE2', 66]]
[['BBCP1', 'Test', 2151, '805EC0FFFFE1', 66]]
[['BBCP1', 'Centre', 2141, '805EC0FFFFE3', 77], ['BBCP1', 'Yellow', 2142, '805EC0FFFFE3', 77], ['BBCP1', 'Purple', 2143, '805EC0FFFFE3', 77], ['BBCP1', 'Green', 2144, '805EC0FFFFE3', 77], ['BBCP1', 'Pink', 2145, '805EC0FFFFE3', 77]]
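The question also mentions looping over the groups to write them to a file. Here is one way that nested loop might look, using only the standard library. The output layout (a `# <mac>` line before each group) and the helper name `write_groups` are assumptions for illustration, since the question doesn't say what the file should look like:

```python
import csv
import io

# Groups in the shape produced by the answers above:
# a list of groups, each group a list of rows.
grouped = [
    [["BBCP1", "Grey", "2140", "805EC0FFFFE2", "0000000066"],
     ["BBCP1", "Test", "2150", "805EC0FFFFE2", "0000000066"]],
    [["BBCP1", "Test", "2151", "805EC0FFFFE1", "0000000066"]],
]

def write_groups(groups, out):
    """Write each group's rows as CSV, preceded by a '# <mac>' header line."""
    writer = csv.writer(out, lineterminator="\n")
    for group in groups:          # outer loop: one MAC address per group
        out.write(f"# {group[0][3]}\n")
        for row in group:         # inner loop: rows sharing that MAC address
            writer.writerow(row)

buffer = io.StringIO()  # stands in for open("out.txt", "w", newline="")
write_groups(grouped, buffer)
print(buffer.getvalue())
```

Swapping the StringIO for a real file opened with `newline=""` should be enough to write to disk.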