How to yield 10000 lines of a file at a time in Python?

Question

I'm reading a very large file, so I want to read it lazily in Python, 10000 lines at a time:

import csv

def read_file():
    jd_records = []
    line_num = 0
    file_name = "test.csv"
    with open(file=file_name, mode='rt') as inf:
        has_header = csv.Sniffer().has_header(inf.read(1024))
        inf.seek(0)
        incsv = csv.reader(inf, delimiter=",")

        if has_header:
            next(incsv)

        while True:
            row = next(incsv)

            jd_records.append(row)

            line_num += 1

            if not line_num % 10000:
                yield jd_records

The problem with this approach is that I can't yield the last batch of data: for example, if I have 15555 lines, the last 5555 are never yielded.

Answer 1

There are several problems with the posted code:

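1. A yield jd_records is needed after the loop to emit the remaining records. Its absence is what causes the problem described in the question.
2. The jd_records list needs to be reset after the yield inside the loop. Without this, the list keeps growing, so each yield re-emits all of the earlier records as well. Rebinding it with jd_records = [] is safer than clearing it in place with del jd_records[:], because clearing in place also empties the list the caller has just received.
3. A bare next(iterator) raises StopIteration once the last element has been read. You need to wrap it in try/except or (better) use a for loop. On Python 3.7 and later, a StopIteration that escapes a generator body is converted into a RuntimeError (PEP 479).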
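The StopIteration point can be seen with two tiny generators. This is a minimal sketch with hypothetical names (with_bare_next, with_for_loop), assuming Python 3.7 or later; on older versions the escaping StopIteration instead ended the generator silently, which is consistent with the behavior described in the question.

```python
# Hypothetical helpers contrasting a bare next() with a for loop inside a generator.

def with_bare_next(items):
    it = iter(items)
    while True:
        # Once `it` is exhausted, next() raises StopIteration. Since Python 3.7
        # (PEP 479) a StopIteration leaving a generator body becomes RuntimeError.
        yield next(it)


def with_for_loop(items):
    # The for loop catches StopIteration itself and simply stops iterating.
    for item in items:
        yield item


print(list(with_for_loop([1, 2, 3])))  # [1, 2, 3]

try:
    list(with_bare_next([1, 2, 3]))
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: generator raised StopIteration
```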

For example:

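import csv

def read_file():
    jd_records = []
    line_num = 0
    file_name = "test.csv"
    # newline='' is what the csv module documentation recommends for csv.reader
    with open(file=file_name, mode='rt', newline='') as inf:
        has_header = csv.Sniffer().has_header(inf.read(1024))
        inf.seek(0)
        incsv = csv.reader(inf, delimiter=",")

        if has_header:
            next(incsv)  # skip the header row

        for row in incsv:  # the for loop handles StopIteration itself
            jd_records.append(row)
            line_num += 1
            if not line_num % 10000:
                yield jd_records
                jd_records = []  # start a fresh list; the caller keeps the old one

        # emit the remaining (fewer than 10000) records, if any
        if jd_records:
            yield jd_records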
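An alternative (not what the answer above uses) is to let itertools.islice cut the reader into chunks, so no manual line counter is needed. This is a minimal sketch, assuming the caller passes in an already-open text file; the names read_in_chunks, read_csv_in_chunks and chunk_size are hypothetical. Because every chunk is a newly built list, the final partial chunk comes out without a separate check, and callers may safely keep earlier chunks.

```python
import csv
import io
from itertools import islice


def read_in_chunks(rows, chunk_size=10000):
    """Yield successive lists of at most chunk_size items from the iterable rows."""
    it = iter(rows)
    while True:
        # islice takes up to chunk_size items; list() builds a fresh list each time.
        chunk = list(islice(it, chunk_size))
        if not chunk:  # the underlying iterator is exhausted
            return
        yield chunk


def read_csv_in_chunks(file_obj, chunk_size=10000):
    """Hypothetical variant of read_file() that takes an open text file object."""
    has_header = csv.Sniffer().has_header(file_obj.read(1024))
    file_obj.seek(0)
    reader = csv.reader(file_obj, delimiter=",")
    if has_header:
        next(reader, None)  # skip the header; the default avoids StopIteration
    yield from read_in_chunks(reader, chunk_size)


print([len(c) for c in read_in_chunks(range(15555))])  # [10000, 5555]

data = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10))
print([len(c) for c in read_csv_in_chunks(io.StringIO(data), chunk_size=4)])  # [4, 4, 2]
```

With a real file this would be used as `with open("test.csv", newline="") as inf:` followed by `for chunk in read_csv_in_chunks(inf): ...`. On Python 3.12 and later, itertools.batched(reader, 10000) does much the same job, but it yields tuples rather than lists.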
