Transpose a CSV in Python with different row lengths

I have many CSV files with variable row lengths. For example, the following:

Time,0,8,18,46,132,163,224,238,267,303
X,0,14,14,14,15,16,17,15,15,15
Time,0,4,13,22,32,41,50,59,69,78,87,97,106,115,125,127,137,146,155,165,174,183,192,202,211,220,230,239,248,258,267,277,289,298,308
Y,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Time,0,4,13,22,32,41,50,59,69,78,87,97,106,115,125,127,137,146,155,165,174,183,192,202,211,220,230,239,248,258,267,277,289,298,308
Z,0,1,2,1,1,1,1,1,1,2,2,1,0,1,1,2,2,2,2,2,1,1,2,2,2,1,1,1,1,1,2,2,2,2,2
Time,0,308
W,0,0

becomes:

Time,X,Time,Y,Time,Z,Time,W
0,0,0,0,0,0,0,0
8,14,4,0,4,1,308,0

A lot of data is lost; only the first 2 values of each row are kept.

I want to transpose this CSV in Python. I have the following program:

import csv
import os
from itertools import izip
import sys

try:
    filename = sys.argv[1]
except IndexError:
    print 'Please add a filename'
    exit(-1)
with open(os.path.splitext(filename)[0] + '_t.csv', 'wb') as outfile, open(filename, 'rb') as infile:
    a = izip(*csv.reader(infile))
    csv.writer(outfile).writerows(a)

However, it seems to trim a lot of data: the file shrinks from 20 KB to 6 KB, and only the minimum row length is kept.

Any ideas on how to avoid losing any data?

Here is an approach that does not use itertools.izip:

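import csv

with open('transpose.csv') as infile, \
        open('out.csv', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    while True:
        try:
            # Rows come in pairs: a Time row followed by its value row.
            index = next(reader)
            data = next(reader)
        except StopIteration:
            # No complete pair left, so stop.
            break
        # Write each pair as two columns; zip stops at the shorter of the two rows.
        writer.writerows(zip(index, data))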

Given your input, this snippet produces the following out.csv:

Time,X
568,0
573,0
577,1
581,1
585,0
590,2
594,0
599,0
603,0
Time,Y
590,0
594,3
599,3
603,0
Time,Z
599,0
603,1

Is this what you want?


Update

This modified example should match your updated question:

import csv
from itertools import zip_longest  # izip_longest in Python 2

with open('transpose.csv') as infile, \
        open('out.csv', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    writer.writerows(zip_longest(*reader, fillvalue=0))

Set fillvalue to whatever you want missing values to be replaced with.
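
To illustrate the effect of fillvalue, here is a small in-memory sketch. The two-row sample string is made up for the demonstration (it is not the file from the question), and fillvalue='' is just one possible choice that leaves padded cells empty rather than writing a literal 0.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical two-row sample with unequal row lengths.
sample = "Time,0,8,18\nX,0,14\n"

rows = list(csv.reader(io.StringIO(sample)))

# fillvalue='' pads the shorter row with empty strings,
# so the missing cell is written as an empty field.
transposed = list(zip_longest(*rows, fillvalue=''))

out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(transposed)
print(out.getvalue())
```

With fillvalue=0 the last line of the printed output would end in 0 instead of an empty field, which may be hard to tell apart from a real measurement of 0.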

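izip zips according to the shortest iterable, so from each row you only get as many values as the shortest row has.

You should use izip_longest instead. It zips according to the longest iterable and puts None wherever a value is missing.

Example: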
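
The difference can be seen on two short lists. This is a minimal Python 3 sketch, where the built-in zip truncates the same way izip does in Python 2 and izip_longest is named zip_longest; the two lists are shortened stand-ins for rows like those in the question.

```python
from itertools import zip_longest

# Shortened stand-ins for a long row and a short row.
time_row = ['Time', '0', '8', '18']
w_row = ['W', '0']

# zip stops as soon as the shortest input is exhausted.
short = list(zip(time_row, w_row))

# zip_longest keeps going and pads the shorter input with None.
long_ = list(zip_longest(time_row, w_row))

print(short)
print(long_)
```

csv.writer writes None as an empty string, which is why the padded positions show up as empty cells in the transposed file.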

import csv
import os
from itertools import izip_longest
import sys

try:
    filename = sys.argv[1]
except IndexError:
    print 'Please add a filename'
    exit(-1)
with open(os.path.splitext(filename)[0] + '_t.csv', 'wb') as outfile, open(filename, 'rb') as infile:
    a = izip_longest(*csv.reader(infile))
    csv.writer(outfile).writerows(a)

The result I get from this:

Time,X,Time,Y,Time,Z,Time,W
0,0,0,0,0,0,0,0
8,14,4,0,4,1,308,0
18,14,13,1,13,2,,
46,14,22,1,22,1,,
132,15,32,1,32,1,,
163,16,41,1,41,1,,
224,17,50,1,50,1,,
238,15,59,1,59,1,,
267,15,69,1,69,1,,
303,15,78,1,78,2,,
,,87,1,87,2,,
,,97,1,97,1,,
,,106,1,106,0,,
,,115,1,115,1,,
,,125,1,125,1,,
,,127,1,127,2,,
,,137,1,137,2,,
,,146,1,146,2,,
,,155,1,155,2,,
,,165,1,165,2,,
,,174,1,174,1,,
,,183,1,183,1,,
,,192,1,192,2,,
,,202,1,202,2,,
,,211,1,211,2,,
,,220,1,220,1,,
,,230,1,230,1,,
,,239,1,239,1,,
,,248,1,248,1,,
,,258,1,258,1,,
,,267,1,267,2,,
,,277,1,277,2,,
,,289,1,289,2,,
,,298,1,298,2,,
,,308,1,308,2,,
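
For Python 3, the same idea might look roughly like the sketch below. The function name transpose_csv is hypothetical, and opening both files with newline='' follows the csv module's recommendation for Python 3 rather than anything in the script above.

```python
import csv
from itertools import zip_longest


def transpose_csv(src, dst):
    """Transpose the CSV at src into dst, padding short rows with empty cells."""
    # newline='' lets the csv module handle line endings itself (Python 3 only).
    with open(src, newline='') as infile, \
            open(dst, 'w', newline='') as outfile:
        # zip_longest runs to the longest row, so no values are dropped.
        columns = zip_longest(*csv.reader(infile), fillvalue='')
        csv.writer(outfile).writerows(columns)
```

Calling transpose_csv('data.csv', 'data_t.csv') (file names chosen only for illustration) would write one output row per input column, with empty cells wherever a shorter input row has run out.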