How to open more than 19 files in parallel (Python)?

Question

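I have a project that reads data and then, depending on each row, writes to more than 23 CSV files in parallel. For example, if a row is about temperature it should be written to temperature.csv, if it is about humidity, to humid.csv, and so on.

I tried the following: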

with open('Results\GHCN_Daily\MetLocations.csv','wb+') as locations, \
            open('Results\GHCN_Daily\Tmax.csv','wb+')as tmax_d, \
            open('Results\GHCN_Daily\Tmin.csv','wb+')as tmin_d, \
            open('Results\GHCN_Daily\Snow.csv', 'wb+')as snow_d, \
            .
            .
            # total of 23 'open' statements
            .

            open('Results\GHCN_Daily\SnowDepth.csv','wb+')as snwd_d, \
            open('Results\GHCN_Daily\Cloud.csv', 'wb+')as cloud_d, \
            open('Results\GHCN_Daily\Evap.csv', 'wb+')as evap_d, \

I get the following error:

SystemError: too many statically nested blocks python

I searched for this error and found this post, which says:

You will encounter this error when you nest blocks more than 20. This is a design decision of Python interpreter to restrict it to 20.

But the open statements I wrote open the files in parallel; they are not nested.

What am I doing wrong, and how can I solve this?

Thanks in advance.

Answer 1

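If the data is not very large, why not read all of it in, group it by category (for example, put all the temperature data in one group), and then write each group to its corresponding file in one go?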
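
A minimal sketch of this idea, assuming each input row is a (category, value) pair and that one output file per category is wanted. The function name write_grouped, the row layout, and the output directory argument are hypothetical and not taken from the question:

```python
import csv
import os
from collections import defaultdict


def write_grouped(rows, out_dir):
    """Group (category, value) rows in memory, then write one CSV per category."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append([value])

    paths = {}
    for category, values in groups.items():
        path = os.path.join(out_dir, category + '.csv')
        # Only one file is open at any time, so the nesting limit never applies.
        with open(path, 'w', newline='') as handle:
            csv.writer(handle).writerows(values)
        paths[category] = path
    return paths
```

The trade-off is memory: every row is held in the groups dict until the input has been read completely.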

Answer 2

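I would keep a list of the possible files, possible_files = ['humidity', 'temperature', ...],
and build a dict that maps each possible file to its file path and a DataFrame, for example:

import pandas as pd

main_dic = {}

for file in possible_files:
    # Each entry needs its own inner dict before its keys can be assigned.
    main_dic[file] = {
        'path': '%s.csv' % file,
        'data': pd.DataFrame([], columns=['value', 'other_column', 'another_column']),  # ... plus any further columns
    }

After that, I would read whatever document you are getting the values from and store them in the DataFrame of the matching dict entry.

Once that is done, just save the data to CSV, for example:

for file in main_dic:
    main_dic[file]['data'].to_csv(main_dic[file]['path'], index=False)

Hope this helps.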

Answer 3

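Each open is a nested context; Python syntax merely lets you put them in a comma-separated list. contextlib.ExitStack is a context container that lets you push as many contexts as you like onto a stack and exits each of them when you are done. So you can do: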

import contextlib

# Raw strings keep the backslashes in the Windows paths literal.
files_to_process = (
    (r'Results\GHCN_Daily\MetLocations.csv', 'locations'),
    (r'Results\GHCN_Daily\Tmax.csv', 'tmax_d'),
    (r'Results\GHCN_Daily\Tmin.csv', 'tmin_d'),
    # ...
)

with contextlib.ExitStack() as stack:
    # Open for writing; the stack closes every file when the block exits.
    files = {varname: stack.enter_context(open(filename, 'w'))
        for filename, varname in files_to_process}
    # and for instance...
    files['locations'].write('my location\n')

If you find dict access less tidy than attribute access, you can create a simple container class:

class SimpleNamespace:

    def __init__(self, name_val_pairs):
        self.__dict__.update(name_val_pairs)

with contextlib.ExitStack() as stack:
    # Open for writing; file objects have write(), not writeline().
    files = SimpleNamespace(((varname, stack.enter_context(open(filename, 'w')))
        for filename, varname in files_to_process))
    # and for instance...
    files.locations.write('my location\n')
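
To tie this back to the question, the sketch below routes each row to the CSV writer for its category while ExitStack keeps all the files open. The function name route_rows, the (category, value) row layout, and the category.csv naming are assumptions made for illustration:

```python
import contextlib
import csv
import os


def route_rows(rows, categories, out_dir):
    """Write each (category, value) row to <out_dir>/<category>.csv."""
    with contextlib.ExitStack() as stack:
        writers = {}
        for category in categories:
            path = os.path.join(out_dir, category + '.csv')
            handle = stack.enter_context(open(path, 'w', newline=''))
            writers[category] = csv.writer(handle)
        for category, value in rows:
            writers[category].writerow([value])
    # Leaving the with block closes every file, even if a write raised.
```

Because the files are entered in a loop rather than listed in a single with statement, the number of open files is no longer tied to the limit on statically nested blocks.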

Answer 4

Opening more than 20 files this way works fine.

# your list of file names
file_names = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u']
fh = []  # list of file handles
for f in file_names:
    file_name = f + '.txt'
    fh.append(open(file_name, 'w'))

# do what you need here
print("done")

for f in fh:
    f.close()

I am not sure you really need to do this, though.
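
One caveat with a plain list of handles: if the processing step raises, the closing loop is never reached. A minimal sketch of the same approach wrapped in try/finally follows; the helper name write_to_all and its arguments are hypothetical:

```python
import os


def write_to_all(file_names, out_dir, text):
    """Open one .txt file per name, write text to each, and always close them."""
    handles = []
    try:
        for name in file_names:
            handles.append(open(os.path.join(out_dir, name + '.txt'), 'w'))
        for handle in handles:
            handle.write(text)
    finally:
        # Runs even if open() or write() raised, so no handle is leaked.
        for handle in handles:
            handle.close()
    return handles
```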
