How to use with open to filter data files in Python and create a new file?

Question

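I have a large amount of CSV data that I am trying to filter using with open.

I know I can use FINDSTR on the command line, but I would like to use Python to create a new, filtered file, or alternatively get a pandas DataFrame as output.

Here is my code: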

outfile = open('my_file2.csv', 'a')
with open('my_file1.csv', 'r') as f:
    for lines in f:
        if '31/10/2018' in lines:
            print(lines)
        outfile.write(lines)

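The problem is that the resulting output file is identical to the input file: no filtering is applied (the file sizes are the same).

Thanks, everyone.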

Answer 1

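The problem with your code is the indentation of the last line. outfile.write(lines) should be inside the if statement, so that only lines containing '31/10/2018' are written. As posted, it sits at the level of the for loop body and therefore runs for every line.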

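# Open both files in one with statement so the output file is closed (and flushed) too
with open('my_file1.csv', 'r') as f, open('my_file2.csv', 'a') as outfile:
    for lines in f:
        if '31/10/2018' in lines:
            print(lines)
            # Now inside the if: only matching lines are written
            outfile.write(lines)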
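If you also want to keep the header row and avoid appending to an old output file on every run, the same idea can be wrapped in a small function. This is only a sketch: filter_lines and its parameters are names I made up for illustration, and it assumes the first line of the file is a header.

```python
def filter_lines(src_path, dst_path, needle, keep_header=True):
    """Copy lines containing `needle` from src_path to dst_path.

    Returns the number of matching data lines written.
    """
    matches = 0
    # 'w' truncates the output file, so re-running does not append duplicates.
    # newline='' leaves the original line endings untouched.
    with open(src_path, 'r', newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        if keep_header:
            # Assumption: the first line is a header row; copy it unconditionally
            dst.write(src.readline())
        for line in src:
            if needle in line:
                dst.write(line)
                matches += 1
    return matches

# Example call (file names taken from the question):
# filter_lines('my_file1.csv', 'my_file2.csv', '31/10/2018')
```

Because the file is read one line at a time, memory use stays small even for big inputs. Note that this is still a plain substring match, so it will also keep lines where '31/10/2018' appears in some other column.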

To filter with pandas and create a DataFrame, do the following:

import pandas as pd
import datetime

# I assume here that the date is in a separate column, named 'Date'.
# dayfirst=True because the dates in the question look like DD/MM/YYYY.
df = pd.read_csv('my_file1.csv', parse_dates=['Date'], dayfirst=True)

# Filter on October 31st 2018
df_filter = df[df['Date'].dt.date == datetime.date(2018, 10, 31)]

# Output to csv
df_filter.to_csv('my_file2.csv', index=False)

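(For very large CSV files, have a look at the 'chunksize' parameter of pd.read_csv(), which reads the file in pieces instead of loading it all at once.)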
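A rough sketch of the chunked variant, in case the file does not fit in memory. The function name filter_csv_in_chunks, the default chunk size, and the 'Date' column name are my assumptions, not something from the question; it also assumes every chunk parses cleanly as day-first dates.

```python
import pandas as pd

def filter_csv_in_chunks(src_path, dst_path, day, date_col='Date',
                         chunksize=100_000):
    """Stream src_path chunk by chunk and write rows whose date_col falls on `day`."""
    target = pd.Timestamp(day)
    first = True
    # With chunksize set, read_csv returns an iterator of DataFrames
    reader = pd.read_csv(src_path, parse_dates=[date_col], dayfirst=True,
                         chunksize=chunksize)
    for chunk in reader:
        # normalize() drops the time of day so only the calendar date is compared
        matched = chunk[chunk[date_col].dt.normalize() == target]
        # First chunk creates the file and writes the header; later chunks append
        matched.to_csv(dst_path, mode='w' if first else 'a',
                       header=first, index=False)
        first = False

# Example call (file names taken from the question):
# filter_csv_in_chunks('my_file1.csv', 'my_file2.csv', '2018-10-31')
```

Only one chunk is held in memory at a time. The dates in the output are written in pandas' default ISO format (2018-10-31) rather than the original 31/10/2018 style.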

If you want to stick with with open(....) as f:, you can do it like this:

import pandas as pd

filtered_list = []
with open('my_file1.csv', 'r') as f:
    for lines in f:
        if '31/10/2018' in lines:
            print(lines)
            # Strip the trailing newline, then split the line by comma into a list
            line_data = lines.rstrip('\n').split(',')
            filtered_list.append(line_data)

# Convert to dataframe and export as csv
df = pd.DataFrame(filtered_list)
df.to_csv('my_file2.csv', index=False)
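Splitting on ',' by hand breaks as soon as a field contains a quoted comma, and the DataFrame above ends up with numbered columns because the header line is skipped. A possible alternative, sketched with the standard csv module: rows_to_dataframe is a hypothetical helper, and I am again assuming a header row with a 'Date' column.

```python
import csv
import pandas as pd

def rows_to_dataframe(src_path, needle, date_col='Date'):
    """Return a DataFrame of the rows whose date_col contains `needle`."""
    with open(src_path, 'r', newline='') as f:
        reader = csv.reader(f)      # handles quoted fields with embedded commas
        header = next(reader)       # assumption: the first row holds column names
        idx = header.index(date_col)
        # Check only the date column, not the whole line; skip blank rows
        rows = [row for row in reader if row and needle in row[idx]]
    return pd.DataFrame(rows, columns=header)

# Example call (file names taken from the question):
# df = rows_to_dataframe('my_file1.csv', '31/10/2018')
# df.to_csv('my_file2.csv', index=False)
```

All values come back as strings here, since nothing is type-converted along the way; convert the columns you need afterwards.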

python

csv

data-warehouse

bigdata

pandas