Python combine CSVs, remove header and remove blanks
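I'm very new to Python and am trying to figure out the following:
I have multiple CSV files (monthly files) that I want to combine into a single yearly file. Each monthly file has a header, so I'm trying to keep the first header and drop the rest. The script below does that, but it leaves 10 blank lines between each month.
Does anyone know what I can add to remove the blank lines?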
```python
import shutil
import glob

#import csv files from folder
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters
with open('someoutputfile.csv', 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)
        print(fname + " has been imported.")
```
Thanks in advance!
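Assuming your data fits in memory, I'd suggest reading each file with pandas, concatenating the DataFrames, and filtering from there. Blank rows will likely show up as NaN.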
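To see what "blank rows show up as NaN" means in practice, here is a small sketch using two made-up monthly files held in memory (the data is invented for illustration). `pd.read_csv` already skips completely empty lines by default (`skip_blank_lines=True`); rows made only of delimiters, such as `,`, are what come through as all-NaN rows and need `dropna(how='all')`.

```python
import io

import pandas as pd

# Two hypothetical monthly files, each with its own header.
# jan ends with an empty line; feb contains a row of bare commas.
jan = "date,price\n2016-01-04,10.5\n2016-01-05,10.7\n\n"
feb = "date,price\n2016-02-01,11.0\n,\n2016-02-02,11.2\n"

# read_csv treats each file's first line as the header, so headers never become data rows
frames = [pd.read_csv(io.StringIO(text)) for text in (jan, feb)]
df = pd.concat(frames, ignore_index=True)
before = len(df)  # the empty line was skipped, but the comma-only row is an all-NaN row

# Drop rows where every column is NaN
df = df.dropna(how="all")
print(before, len(df))
```

The same read, concatenate, `dropna` sequence applied to the real monthly files gives the script that follows.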
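```python
import glob

import pandas as pd

path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()

# read_csv consumes each file's header row, so repeated headers never become
# data rows. DataFrame.append was removed in pandas 2.0, so collect the
# frames in a list and concatenate them once.
frames = [pd.read_csv(fname) for fname in allFiles]
df = pd.concat(frames, ignore_index=True)

# Drop rows in which every column is NaN (blank rows)
df = df.dropna(how='all')

# Write to file; index=False avoids writing an extra unnamed index column
df.to_csv('someoutputfile.csv', index=False)
```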
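If you'd rather keep the question's streaming approach and avoid pandas, one option is to copy line by line instead of using `shutil.copyfileobj`, skipping any line that is empty after stripping whitespace. This is only a sketch: `merge_csv_streams`, `merge_csv_files` and `_terminated` are hypothetical names, it assumes no quoted field contains an embedded newline (a line-based filter would split such a field apart), and it does not drop comma-only rows such as `,,,`, which the `dropna(how='all')` approach above does handle.

```python
import io


def _terminated(line):
    """Return line with a trailing newline, adding one if it is missing."""
    return line if line.endswith("\n") else line + "\n"


def merge_csv_streams(infiles, outfile):
    """Copy CSV text streams into outfile, keeping only the first stream's
    header and skipping blank (empty or whitespace-only) lines.

    Assumption: no quoted field spans several lines."""
    for i, infile in enumerate(infiles):
        header = infile.readline()
        if i == 0 and header.strip():
            outfile.write(_terminated(header))
        for line in infile:
            if line.strip():  # skip blank / whitespace-only lines
                # A file whose last line lacks "\n" would otherwise run
                # into the first data line of the next file.
                outfile.write(_terminated(line))


def merge_csv_files(paths, out_path):
    """Hypothetical file-based wrapper mirroring the question's script."""
    def opened():
        for fname in paths:
            with open(fname, newline="") as infile:
                yield infile
            print(fname + " has been imported.")

    with open(out_path, "w", newline="") as outfile:
        merge_csv_streams(opened(), outfile)


# In-memory demo with made-up data: trailing blank lines, a whitespace-only
# line, and a missing final newline.
jan = io.StringIO("date,price\n2016-01-04,10.5\n\n\n")
feb = io.StringIO("date,price\n2016-02-01,11.0\n   \n2016-02-02,11.2")
merged = io.StringIO()
merge_csv_streams([jan, feb], merged)
print(merged.getvalue())
```

For the files in the question, something like `merge_csv_files(sorted(glob.glob(path + "/*.csv")), 'someoutputfile.csv')` (after `import glob`, with `path` as defined in the question's script) should produce the merged file. Opening in text mode with `newline=""` leaves the original line endings untouched.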