unpack_from() does not work with big files
I am trying to use Python 3 to read data from some fixed-width-format files (from here).
It works fine if I only preselect a few lines, but if I want to go
through the whole file (about 1000 lines, each with 611 blocks of 4 characters = 2444 characters), Python tells me that struct.Struct(bytes).unpack_from(bytes) requires
a buffer of at least 2444 bytes
, and at the moment I don't see why it wouldn't have a buffer that big.
I am running this on 64-bit Linux with 4 GB of RAM and 20 GB of swap, in case that is relevant.
The code snippet looks like this:
#edit
import struct
"""rowMask is 611 times 4s, just to prevent you from counting it... """
rowMask="4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s4s"
def readUsableFields(filename, stdPath):
    usableFields = []
    with open(stdPath + filename, "r") as f:
        count_line = 0
        for line in f:
            count_col = 0
            fields = struct.Struct(bytes(rowMask, "UTF-8")).unpack_from(bytes(line, "UTF-8"))
            for field in fields:
                if field != -999:
                    usableFields.append([count_line, count_col])
                count_col += 1
            count_line += 1
    return usableFields
I have also looked at this and this, but neither of them answers my question.
Some help would be great, and if my question is a duplicate (I did not find one), please tell me.
Since many fixed-width files have a footer (or a header), the code
will fail on the footer because its length is probably not correct.
Therefore you have to check for the correct line length:
rowMask="4s"*611
def readUsableFields(filename,stdPath):
usableFields=[]
with open(stdPath+filename,"r") as f:
count_line=0
for line in f:
count_col=0
# len(line) = 611 * 4 +1
# as there is a trailing '[=10=]'
if(len(line)!=2445):
continue
fields=struct.Struct(bytes(rowMask,"UTF-8")).unpack_from(bytes(line,"UTF-8"))
for field in fields:
if(field!=-999):
usableFields.append([count_line,count_col])
count_col+=1
count_line+=1
f.close()
return usableFields
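To see why the length check matters, here is a minimal sketch (the footer text is made up) showing that unpack_from() raises exactly the error from the question as soon as a line is shorter than the 2444 bytes the format describes:

import struct

rowMask = "4s" * 611
lineStruct = struct.Struct(bytes(rowMask, "UTF-8"))

data_line = "x" * 2444 + "\n"      # a full data row: 611 blocks of 4 characters
footer_line = "END OF DATA\n"      # hypothetical short footer line

# a full row works: unpack_from() returns a tuple of 611 bytes objects
print(len(lineStruct.unpack_from(bytes(data_line, "UTF-8"))))

# a short row fails
try:
    lineStruct.unpack_from(bytes(footer_line, "UTF-8"))
except struct.error as e:
    print(e)   # e.g. "unpack_from requires a buffer of at least 2444 bytes"

With the length check in place, only real data rows ever reach unpack_from(), so the buffer is always large enough.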