Turn a Python list read from a CSV with multi-value fields into a nested list, sort the nested values, and export back to CSV
I have used the Python csv module to convert a CSV with multi-value fields into a Python list. The output contains fields that hold several related values.
['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive']
['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B']
['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C']
['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B']
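I would like to turn the Vehicles, Vehicle Class and Driver_ID fields into nested lists, so that when I sort each sub-list in Vehicles (row[1]) the vehicles always appear in alphabetical order within the sub-list, while Vehicle Class and Driver_ID stay in the matching order. The header and the rows would then be arranged like this: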
['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive']
['ABC', 'AB0134, GF0158, ZYG098', 'B2, C3, A1', 'Jane Doe, Abraham Lincoln, John Doe', '20150301', 'A', 'B']
['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C']
['ABC', 'AB0134, YZ089, XAZ012', 'B2, A2, C1', 'Jane Doe, Thomas Jefferson, John Adams', '20150302', 'A', 'B']
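So, in the output above, each Vehicles sub-group/list is sorted alphabetically, and Vehicle Class and Driver_ID are re-arranged to keep their original relationship to their respective vehicle (i.e. Driver_ID John Doe drives vehicle ZYG098, which is Vehicle Class A1, so those items move within their sub-lists to reflect that ZYG098 is now last rather than first). If this can be done, how would you export the resulting nested list back to CSV with the original headers?
Apologies if this is simple or absurd; I have only just started learning Python. If nested lists are not the best choice, I am open to any other solution (with a dictionary, I would need to join fields to create a key, since there is no unique key without combining Route and Date). If anyone knows a reliable resource covering the various CSV use cases in Python, suggestions would be welcome.
Thanks in advance for your patience and help.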
To convert to the nested format you describe:
nested = zip(*lst)
And zip is its own inverse:
orig = zip(*nested)
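As a small illustration of the two snippets above, here is a sketch that assumes Python 3, where zip returns a one-shot iterator and so has to be wrapped in list() before it can be reused. The sample rows are shortened from the question's data purely for the example.

```python
lst = [
    ["Route", "Vehicles", "Date"],
    ["ABC", "ZYG098, AB0134", "20150301"],
    ["AC", "ZGA123", "20150301"],
]

# Transpose: each tuple in `nested` is one column of `lst`.
nested = list(zip(*lst))

# Transposing again gives the rows back (zip yields tuples, so convert to lists).
orig = [list(row) for row in zip(*nested)]
```

Note that transposing only regroups whole fields by column; it does not split the comma-separated values inside a field.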
But perhaps what you actually want is:
import operator
sort = sorted(lst[1:], key=operator.itemgetter(1))
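This gives you a new list with the data rows sorted by column 1 (Vehicles). In this case you have not changed the format of the data, so you should be able to dump it to CSV without modification, although you will need to add the original headers from lst[0].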
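A minimal sketch of that idea, assuming Python 3: sort the data rows on the Vehicles string and write them out with the header from lst[0]. The io.StringIO buffer is only a stand-in for an open file so the example is self-contained, and the names header, rows, sorted_rows and csv_text are made up for the illustration.

```python
import csv
import io
import operator

lst = [
    ["Route", "Vehicles", "Vehicle Class", "Driver_ID", "Date", "Start", "Arrive"],
    ["ABC", "ZYG098, AB0134, GF0158", "A1, B2, C3",
     "John Doe, Jane Doe, Abraham Lincoln", "20150301", "A", "B"],
    ["AC", "ZGA123", "C3", "George Washington", "20150301", "A", "C"],
]

header, rows = lst[0], lst[1:]

# Sort whole rows by the raw string in column 1 (Vehicles).
sorted_rows = sorted(rows, key=operator.itemgetter(1))

buf = io.StringIO()  # assumption: stands in for a real file object
writer = csv.writer(buf)
writer.writerow(header)        # original headers first
writer.writerows(sorted_rows)  # then the sorted data rows
csv_text = buf.getvalue()
```

This reorders the rows relative to each other; the values inside each multi-value field keep their original order.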
Now that we are finally on the same page: it takes a bit of work, but this does what you want:
import csv

l = [['Route', 'Vehicles', 'Vehicle Class', 'Driver_ID', 'Date', 'Start', 'Arrive'],
     ['ABC', 'ZYG098, AB0134, GF0158', 'A1, B2, C3', 'John Doe, Jane Doe, Abraham Lincoln', '20150301', 'A', 'B'],
     ['AC', 'ZGA123', 'C3', 'George Washington', '20150301', 'A', 'C'],
     ['ABC', 'XAZ012, AB0134, YZ089', 'C1, B2, A2 ', 'John Adams, Jane Doe, Thomas Jefferson', '20150302', 'A', 'B']]

# transpose the original list: rows become columns, columns become rows
it = zip(*l)

# get each column separately, wrapping it in iter so the first element
# (the header) can be popped off efficiently
route, veh, veh_c, d_id, date, start, arrive = (
    iter(next(it)), iter(next(it)), iter(next(it)), iter(next(it)),
    iter(next(it)), iter(next(it)), iter(next(it)))

# get all headers to write later
headers = (next(route), next(veh), next(veh_c), next(d_id),
           next(date), next(start), next(arrive))

srt_veh = []
key_inds = []
# sort vehicle elements and keep a record of the old indexes
# so subelements in Vehicle Class and Driver_ID can be rearranged to match
for x in veh:
    parts = x.split(",")
    # note: pieces after the first keep their leading space, and the sort
    # compares the pieces exactly as split
    srt = sorted(parts)
    key_inds.append([parts.index(w) for w in srt])
    srt_veh.append(",".join(srt).strip())

srt_veh_cls = []
# reorder vehicle class based on the old index of elements in vehicles
# and rejoin the split elements
for ind, ele in enumerate(veh_c):
    spl = ele.split(",")
    srt_veh_cls.append(",".join([spl[i].strip() for i in key_inds[ind]]))

srt_dr_id = []
# reorder driver ids based on the old index of elements in vehicles
# and join the subelements again after splitting and reordering
for ind, ele in enumerate(d_id):
    spl = ele.split(",")
    srt_dr_id.append(",".join([spl[i].strip() for i in key_inds[ind]]))

# transpose again for writing
zipped = zip(route, srt_veh, srt_veh_cls, srt_dr_id, date, start, arrive)
Finally, write everything out with the csv writer's writerows:
with open("out.csv", "w") as f:
    wr = csv.writer(f)
    wr.writerow(headers)
    wr.writerows(zipped)
Output:
Route,Vehicles,Vehicle Class,Driver_ID,Date,Start,Arrive
ABC,"AB0134, GF0158,ZYG098","B2,C3,A1","Jane Doe,Abraham Lincoln,John Doe",20150301,A,B
AC,ZGA123,C3,George Washington,20150301,A,C
ABC,"AB0134, YZ089,XAZ012","B2,A2,C1","Jane Doe,Thomas Jefferson,John Adams",20150302,A,B
For Python 2, replace zip with itertools.izip (and map, if you use it, with itertools.imap):
from itertools import izip, imap
You could condense this further and do a few things to shorten the code, but I don't think that would help readability.
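For comparison, here is a row-by-row sketch of the same idea, assuming Python 3. It is not the code above: it splits the three linked fields, zips the pieces into (vehicle, class, driver) groups, sorts the groups by vehicle, and rejoins them. The helper name sort_linked_fields and the MULTI column tuple are hypothetical, and io.StringIO stands in for a real file. Because every piece is stripped before sorting, the order is strictly alphabetical, so XAZ012 lands before YZ089 in the last row, unlike the output shown above.

```python
import csv
import io
import operator

rows = [
    ["Route", "Vehicles", "Vehicle Class", "Driver_ID", "Date", "Start", "Arrive"],
    ["ABC", "ZYG098, AB0134, GF0158", "A1, B2, C3",
     "John Doe, Jane Doe, Abraham Lincoln", "20150301", "A", "B"],
    ["AC", "ZGA123", "C3", "George Washington", "20150301", "A", "C"],
    ["ABC", "XAZ012, AB0134, YZ089", "C1, B2, A2 ",
     "John Adams, Jane Doe, Thomas Jefferson", "20150302", "A", "B"],
]

MULTI = (1, 2, 3)  # hypothetical: indexes of Vehicles, Vehicle Class, Driver_ID


def sort_linked_fields(row, cols=MULTI, sep=","):
    """Return a copy of row with the fields in cols reordered together.

    The first column in cols is the sort key; the others follow it.
    Assumes every field in cols holds the same number of pieces
    (zip would silently drop any extras).
    """
    # Split each multi-value field and strip whitespace around every piece.
    split = [[piece.strip() for piece in row[c].split(sep)] for c in cols]
    # Group the i-th piece of each field, e.g. ("ZYG098", "A1", "John Doe"),
    # then sort the groups by their first item (the vehicle).
    groups = sorted(zip(*split), key=operator.itemgetter(0))
    out = list(row)
    # Un-zip the sorted groups back into one joined string per column.
    for c, values in zip(cols, zip(*groups)):
        out[c] = (sep + " ").join(values)
    return out


header, data = rows[0], rows[1:]
result = [sort_linked_fields(r) for r in data]

buf = io.StringIO()  # assumption: stands in for a real file object
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(result)
```

To write a real file instead of the buffer, the csv module documentation recommends opening it with newline="" on Python 3, for example open("out.csv", "w", newline="").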