Turn Python list from CSV with multi-value fields into a Python nested list, sort nested list values and export to CSV

I have used the Python csv module to turn a CSV with multi-value fields into a Python list. The output contains fields that hold several related values.

['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive']
['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B']
['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C']
['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B']

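I would like to turn the Vehicles, Vehicle Class and Driver_ID fields into nested lists, so that I can sort each sub-list in the Vehicles field (row[1]) and make sure the vehicles always appear in alphabetical order within their sub-list, while Vehicle Class and Driver_ID keep their correct respective order. The header and the rows' sub-lists would then be arranged like this: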

['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive']
['ABC', 'AB0134, GF0158, ZYG098', 'B2, C3, A1', 'Jane Doe, Abraham Lincoln, John Doe', '20150301', 'A', 'B']
['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C']
['ABC', 'AB0134, XAZ012, YZ089', 'B2, C1, A2', 'Jane Doe, John Adams, Thomas Jefferson', '20150302', 'A', 'B']

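So in the output above, each sub-group/list of vehicles is sorted alphabetically, and Vehicle Class and Driver_ID are re-arranged to keep their original relationship with their respective vehicles (i.e. Driver_ID John Doe drives vehicle ZYG098, which is Vehicle Class A1, so these items move within their sub-lists to reflect that ZYG098 is now last, not first). If this can be done, how would you export the resulting nested list back to CSV with the original headers?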
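The relationship described above amounts to sorting three parallel lists together: zip them into (vehicle, class, driver) triples, sort the triples by vehicle, and unzip them again. A minimal sketch, using the first data row's values as hypothetical inputs that have already been split into lists:

```python
# Hypothetical inputs: the first data row's multi-value fields, already split.
vehicles = ["ZYG098", "AB0134", "GF0158"]
classes = ["A1", "B2", "C3"]
drivers = ["John Doe", "Jane Doe", "Abraham Lincoln"]

# Zip the three parallel lists into (vehicle, class, driver) triples,
# sort the triples by vehicle only, then unzip them back into three tuples.
triples = sorted(zip(vehicles, classes, drivers), key=lambda t: t[0])
sorted_vehicles, sorted_classes, sorted_drivers = zip(*triples)

print(sorted_vehicles)  # ('AB0134', 'GF0158', 'ZYG098')
print(sorted_classes)   # ('B2', 'C3', 'A1')
print(sorted_drivers)   # ('Jane Doe', 'Abraham Lincoln', 'John Doe')
```

Because each class and driver travels inside the same triple as its vehicle, the pairing cannot get out of step during the sort.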

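Sorry if this is simple or absurd; I'm just starting to learn Python. If a nested list isn't the best option, I'm open to any other solution (for a dictionary I would need to join fields to create keys, since there is no unique key without combining Route_Date). If anyone has a solid resource covering the various CSV use cases in Python, a recommendation would be great.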
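On the dictionary idea: a Python tuple such as (Route, Date) is hashable and can be used directly as a dictionary key, so the two fields don't have to be concatenated into one string. A minimal sketch, assuming each (Route, Date) pair is unique as described above; the inline CSV text and the name by_route_date are hypothetical:

```python
import csv
import io

# Hypothetical CSV text standing in for the real file; layout follows the question.
data = """Route,Vehicles,Vehicle Class,Driver_ID,Date,Start,Arrive
ABC,"ZYG098, AB0134, GF0158","A1, B2, C3","John Doe, Jane Doe, Abraham Lincoln",20150301,A,B
AC,ZGA123,C3,George Washington,20150301,A,C
"""

# DictReader maps each row to a dict keyed by the header names.
reader = csv.DictReader(io.StringIO(data))

# A (Route, Date) tuple serves as a composite key without joining strings.
by_route_date = {}
for row in reader:
    by_route_date[(row["Route"], row["Date"])] = row

print(by_route_date[("AC", "20150301")]["Driver_ID"])  # George Washington
```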

Thanks in advance for your patience and help.

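To convert to a nested, column-wise format (each field becomes one tuple), transpose the list with zip; in Python 3 wrap it in list(), because zip returns a lazy iterator:

nested = list(zip(*lst))

zip is also its own inverse, so transposing again gives back the rows (as tuples):

orig = list(zip(*nested))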
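A small check of the transpose round trip, on a hypothetical two-column sample taken from the question's data:

```python
# Small sample: a header row plus two data rows (values from the question).
lst = [
    ["Route", "Date"],
    ["ABC", "20150301"],
    ["AC", "20150301"],
]

# zip(*lst) transposes rows into columns; list() materialises the
# iterator that zip returns in Python 3.
nested = list(zip(*lst))
print(nested)  # [('Route', 'ABC', 'AC'), ('Date', '20150301', '20150301')]

# Transposing again restores the rows; zip yields tuples, so convert
# each one back to a list before comparing with the original.
orig = [list(row) for row in zip(*nested)]
print(orig == lst)  # True
```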

But perhaps what you actually want is:

import operator

sort = sorted(lst[1:], key=operator.itemgetter(1))

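This gives you a new list of the data rows sorted by the field at index 1 (Vehicles). Note that it sorts whole rows; it does not reorder the values inside each cell. You haven't changed the format of the data, so you should be able to dump it to CSV without modification, although you will need to add the original headers from lst[0].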
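A minimal sketch of that dump step, assuming a trimmed three-column version of the data (the column subset is hypothetical, chosen to keep the example short) and an in-memory buffer in place of a real file:

```python
import csv
import io
import operator

# Trimmed, hypothetical version of the question's list: header first.
lst = [
    ["Route", "Vehicles", "Date"],
    ["ABC", "ZYG098, AB0134, GF0158", "20150301"],
    ["AC", "ZGA123", "20150301"],
]

# Sort the data rows (everything after the header) by the Vehicles field.
rows = sorted(lst[1:], key=operator.itemgetter(1))

# io.StringIO stands in for a file; with a real file in Python 3 use
# open("out.csv", "w", newline="").
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(lst[0])  # original header first
writer.writerows(rows)   # then the sorted rows
print(buf.getvalue())
```

csv.writer quotes the Vehicles cell automatically because it contains commas, so the multi-value field survives the round trip.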

Now that we're finally on the same page: it takes a bit of work, but this does what you want:

import csv


l = [['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive'],
     ['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B'],
     ['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C'],
     ['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B']]

# transpose the original list: rows become columns, columns become rows
it = zip(*l)

# get each column separately as an iterator so we can pop the first
# element (the header) off each column efficiently
route, veh, veh_c, d_id, date, start, arrive = (iter(col) for col in it)

# get all headers to write later
headers = [next(route), next(veh), next(veh_c), next(d_id),
           next(date), next(start), next(arrive)]

srt_veh = []
key_inds = []

# sort vehicle elements and keep a record of their old indexes so the
# subelements in Vehicle Class and Driver_ID can be rearranged to match;
# each piece is stripped first so leading spaces don't affect the sort
for x in veh:
    parts = [p.strip() for p in x.split(",")]
    order = sorted(range(len(parts)), key=parts.__getitem__)
    key_inds.append(order)
    srt_veh.append(", ".join(parts[i] for i in order))

srt_veh_cls = []

# reorder vehicle class using the old index of each element in vehicles,
# then rejoin the split elements
for ind, ele in enumerate(veh_c):
    spl = ele.split(",")
    srt_veh_cls.append(", ".join(spl[i].strip() for i in key_inds[ind]))

srt_dr_id = []

# reorder driver ids the same way, based on the old vehicle indexes
for ind, ele in enumerate(d_id):
    spl = ele.split(",")
    srt_dr_id.append(", ".join(spl[i].strip() for i in key_inds[ind]))

# transpose again for writing
zipped = zip(route, srt_veh, srt_veh_cls, srt_dr_id, date, start, arrive)

Finally, write it out with csv.writer's writerow and writerows:

# newline="" stops the csv module from producing blank lines between rows on Windows
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f)
    wr.writerow(headers)
    wr.writerows(zipped)

Output:

Route,Vehicles,Vehicle Class,Driver_ID,Date,Start,Arrive
ABC,"AB0134, GF0158, ZYG098","B2, C3, A1","Jane Doe, Abraham Lincoln, John Doe",20150301,A,B
AC,ZGA123,C3,George Washington,20150301,A,C
ABC,"AB0134, XAZ012, YZ089","B2, C1, A2","Jane Doe, John Adams, Thomas Jefferson",20150302,A,B
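A more compact row-wise variant is also possible: instead of transposing, handle each row on its own by zipping its three multi-value fields into triples, sorting by vehicle, and unzipping. This is a sketch of an alternative, not the answer's method; the helper names split_field and sort_row are hypothetical, and it assumes the three fields in a row always contain the same number of comma-separated items (zip silently truncates otherwise).

```python
import csv
import io

l = [["Route", "Vehicles", "Vehicle Class", "Driver_ID", "Date", "Start", "Arrive"],
     ["ABC", "ZYG098, AB0134, GF0158", "A1, B2, C3", "John Doe, Jane Doe, Abraham Lincoln", "20150301", "A", "B"],
     ["AC", "ZGA123", "C3", "George Washington", "20150301", "A", "C"],
     ["ABC", "XAZ012, AB0134, YZ089", "C1, B2, A2 ", "John Adams, Jane Doe, Thomas Jefferson", "20150302", "A", "B"]]


def split_field(value):
    """Split a comma-separated cell into stripped parts."""
    return [part.strip() for part in value.split(",")]


def sort_row(row):
    """Sort Vehicles (index 1) and reorder Vehicle Class (2) and Driver_ID (3) to match."""
    vehicles, classes, drivers = (split_field(row[i]) for i in (1, 2, 3))
    # keep each vehicle with its class and driver while sorting by vehicle
    triples = sorted(zip(vehicles, classes, drivers), key=lambda t: t[0])
    vehicles, classes, drivers = (", ".join(col) for col in zip(*triples))
    return [row[0], vehicles, classes, drivers] + row[4:]


# io.StringIO stands in for open("out.csv", "w", newline="")
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(l[0])                          # original headers
writer.writerows(sort_row(row) for row in l[1:])
print(buf.getvalue())
```

For real files, the rows would come from csv.reader over open("in.csv", newline="") instead of the hard-coded list l.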

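For Python 2, replace zip with itertools.izip (and map, where it is used, with itertools.imap), and open the output file in "wb" mode, since the Python 2 csv module expects a binary file:

from itertools import izip, imap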

You could compress this further and do a few things to shorten the code, but I don't think that would help readability.