How to read multiple csv in python and get one csv as output
I have already asked how to solve this with pandas. Now I need a non-pandas version.
My code:
import glob
import os
## path
path = r'C:/x/x/Desktop/xxx/'
all_files = glob.glob(os.path.join(path, '*.csv'))
## column
column_headers = ['Date', 'Time', 'Duration', 'IP', 'Request']
## open only one csv. -- I want to read here not only 1 file --
## my approach:
## with open(all_files) as log, ....
with open('log.csv') as log, open('out355.csv', 'w') as out:
    out.write(';'.join(column_headers)+'\n')
    while True:
        try:
            lines = [next(log).strip('\n').split(' ',4) for i in range(6)][3:]
            out.write(';'.join(lines[1][:2]+[l[4] for l in lines])+'\n')
        except StopIteration:
            break
Since I am new to Python, I cannot easily adapt my running code myself, so I would be glad to get the complete code.
Thanks!
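You are close. You need to read each *.csv file and concatenate them. So open a new output file and use glob to read each csv file. When you do this, make sure every csv file ends with a trailing newline; otherwise the last line of file_x and the first data row of file_x+1 end up on the same line.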
from glob import glob
with open('combined.csv', 'w') as combinedFile:  # 'w' so that re-running does not append a second copy
    combinedFile.write('a,b,c,d,e\n') # Headers
    for eachFile in glob('*.csv'):
        if eachFile == 'combined.csv':
            continue  # Skip the output file itself
        with open(eachFile, 'r') as inFile:
            next(inFile, None)  # So that you don't read 1st line of every file if it contains the headers.
            for line in inFile:
                combinedFile.write(line)
Sample run:
a.csv
a,b,c,d,e
1,2,3,4,5
6,7,8,9,10
b.csv
a,b,c,d,e
11,12,13,14,15
16,17,18,19,20
combined.csv
a,b,c,d,e
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
16,17,18,19,20
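The trailing-newline caveat can also be handled in code instead of checking each file by hand. The following is a minimal sketch, not the answer's original method: the function name `merge_csv_files` and its parameters are hypothetical, and it assumes every input file starts with exactly one header line.

```python
def merge_csv_files(paths, out_path, header):
    """Concatenate the data rows of several CSV files into out_path.

    Assumption: each input file starts with one header line, which is skipped.
    """
    with open(out_path, 'w') as combined:
        combined.write(header + '\n')
        for path in paths:
            if path == out_path:
                continue  # never read the output file back in
            with open(path) as src:
                next(src, None)  # skip this file's header line (no error if the file is empty)
                for line in src:
                    # add the newline if the last row of a file lacks one
                    combined.write(line if line.endswith('\n') else line + '\n')
```

With the sample files above, a call such as `merge_csv_files(['a.csv', 'b.csv'], 'combined.csv', 'a,b,c,d,e')` should produce the same combined.csv, even if a.csv had no newline after its last row.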
Something along these lines should work:
with open('out355.csv', 'w') as out:
    for csvfile in all_files:
        with open(csvfile) as log:
            out.write(...)
            # .. the rest of your script ..
This should work:
import glob
import os
## path
path = r'C:/x/x/Desktop/xxx/'
all_files = glob.glob(os.path.join(path, '*.csv'))
## column
column_headers = ['Date', 'Time', 'Duration', 'IP', 'Request']
## the with blocks close the output and every input file
with open('out355.csv', 'w') as out:
    out.write(';'.join(column_headers)+'\n')
    for file_ in all_files:
        with open(file_) as log:
            while True:
                try:
                    lines = [next(log).strip('\n').split(' ',4) for i in range(6)][3:]
                    out.write(';'.join(lines[1][:2]+[l[4] for l in lines])+'\n')
                except StopIteration:
                    break
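The code in this thread processes each log in blocks of six lines and stops when `next` raises `StopIteration`. As a hedged alternative sketch, the same loop can be written with `itertools.islice`, which reads one block at a time and makes the end-of-file check explicit. The function name `convert_logs` and the constant `BLOCK_SIZE` are hypothetical; the field layout (five space-separated fields per line, last three lines of each block used) is taken from the question's code.

```python
from itertools import islice

COLUMN_HEADERS = ['Date', 'Time', 'Duration', 'IP', 'Request']
BLOCK_SIZE = 6  # lines per log record, as in the question's range(6)


def convert_logs(paths, out_path):
    """Write one ';'-separated row per complete six-line block in each log."""
    with open(out_path, 'w') as out:
        out.write(';'.join(COLUMN_HEADERS) + '\n')
        for path in paths:
            with open(path) as log:
                while True:
                    block = list(islice(log, BLOCK_SIZE))
                    if len(block) < BLOCK_SIZE:
                        break  # end of file; an incomplete trailing block is dropped
                    # keep the last three lines, each split into five fields
                    lines = [l.strip('\n').split(' ', 4) for l in block][3:]
                    # first two fields of the middle line, then the fifth field of all three
                    out.write(';'.join(lines[1][:2] + [l[4] for l in lines]) + '\n')
```

It could be called as `convert_logs(all_files, 'out355.csv')` with the `all_files` list built by `glob.glob` above. Like the original loop, it discards a trailing block that has fewer than six lines.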