Join Multiple Files Dictionary
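I have a master table with some fields. I want to join it with a bunch of other CSVs.
The current data looks like this:
File 1:
Key Attrib1 Attrib2 Attrib3 Attrib4
File 2:
Key Attrib5
File 3:
Key Attrib6
I want my final output to look like:
Key Attrib1 Attrib2 Attrib3 Attrib4 Attrib5 Attrib6, etc.
Not all files contain all keys.
Current code: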
import csv

master = "in.csv"
file1 = "file.csv"
file2 = "file2.csv"
prime = list()
D1 = {}
with open(master) as f:
    for k in csv.reader(f):
        prime.append(k[0])
for k in prime:
    with open(file1,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = dict((row[0],row[1]) for rows in rd)
    with open(file2,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = D1+dict((row[0],row[1]) for rows in rd)
The idea here is to open all three files and write them into a new .csv file. My general idea of how to join the CSV files is this:
import glob
import csv

output_name = 'filename.csv'
# get all the files in the current directory that end with .csv,
# skipping the output file in case it is left over from an earlier run
csv_files = [name for name in glob.glob('*.csv') if name != output_name]

# create the new csv file, which will be your output
with open(output_name, 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=',')
    for csv_file in csv_files:
        with open(csv_file, newline='') as infile:
            reader = csv.reader(infile, delimiter=',')
            for row in reader:
                writer.writerow(row)
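You will have to manipulate the exact composition of "row" so that it matches how your data works (creating empty columns for data that lacks the columns you need).
A possible solution is to create a tuple format for each file, with blank positions wherever you need them. Writing the tuple to a row would work like this: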
for row in reader:
    # glob returns file names with their extension
    if csv_file == 'file1.csv':
        # '' represents a blank field in column
        data_to_write = (row[0], row[1], '', row[2])
    elif csv_file == 'file2.csv':
        data_to_write = ('', row[0], row[1], row[2])
    else:
        continue  # no layout defined for this file
    writer.writerow(data_to_write)
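The blank-column idea above can be generalized so that each file's fields start at a fixed column offset. This is only a rough sketch: `pad_row` and the `layout` mapping are hypothetical names, not part of the original answer, and the offsets are made up for illustration. Note that this approach stacks rows from different files under one another; it does not merge rows that share a key.

```python
def pad_row(row, offset, total_width):
    """Place `row` at column `offset` in a row of `total_width` blank fields."""
    padded = [''] * total_width
    padded[offset:offset + len(row)] = row
    return padded

# Hypothetical layout: the column at which each file's fields start.
layout = {'file1.csv': 0, 'file2.csv': 2}

print(pad_row(['k1', 'a'], layout['file1.csv'], 4))
print(pad_row(['b', 'c'], layout['file2.csv'], 4))
```

Inside the loop, `writer.writerow(pad_row(row, layout[csv_file], total_width))` would replace the per-file `if`/`elif` chain, assuming every file name has an entry in `layout`.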
I think this comes close to what you want, if it is not exactly it:
import csv

master = "in.csv"
filelist = "file.csv", "file2.csv"
joined = "joined.csv"

dict1 = {}
# Python 3: open csv files in text mode with newline=''
# (the original Python 2 version used 'rb'/'wb' and dict.iteritems())
with open(master, 'r', newline='') as csvfile:
    for row in csv.reader(csvfile):
        key = row[0]
        dict1[key] = row[1:]  # note this does not check for duplicate keys

for filename in filelist:
    with open(filename, 'r', newline='') as csvfile:
        seen = set()
        for row in csv.reader(csvfile):
            key = row[0]
            if key in dict1:
                if key in seen:
                    print('Error: duplicate key %r in file %r - ignored' %
                          (key, filename))
                else:
                    dict1[key].append(row[1])
                    seen.add(key)
            else:  # key not in master
                pass  # ignore
        # add null entry for any keys not present in this file
        for key in dict1:
            if key not in seen:
                dict1[key].append(None)

# write the data in the merged dictionary into a new csv file
with open(joined, 'w', newline='') as newcsvfile:
    csv.writer(newcsvfile).writerows(
        ([key]+attrlist) for key, attrlist in sorted(dict1.items()))
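To try the same key-based join without touching any files, here is a minimal sketch that works on in-memory rows. The function name `left_join` and the sample data are assumptions made up for illustration. It also differs from the code above in two ways: it keeps the master's row order instead of sorting, and it copies every attribute column of a secondary file rather than only the first.

```python
import csv
import io

def left_join(master_rows, other_tables):
    """Join each table in `other_tables` onto `master_rows` by first column.

    Keys missing from a table get blank fields, keys absent from the
    master are ignored, and the first occurrence of a duplicate key wins.
    """
    merged = {row[0]: list(row[1:]) for row in master_rows}
    for table in other_tables:
        # number of attribute columns this table contributes
        width = max((len(r) - 1 for r in table), default=0)
        lookup = {}
        for row in table:
            lookup.setdefault(row[0], row[1:])
        for key, attrs in merged.items():
            found = list(lookup.get(key, []))
            attrs.extend(found + [''] * (width - len(found)))
    return [[key] + attrs for key, attrs in merged.items()]

# Hypothetical sample data standing in for the three files.
master = list(csv.reader(io.StringIO("k1,a1,a2\nk2,b1,b2\n")))
file2 = list(csv.reader(io.StringIO("k1,x\nk3,z\n")))
file3 = list(csv.reader(io.StringIO("k2,y\n")))

for row in left_join(master, [file2, file3]):
    print(row)
```

Here `k3` is dropped because it is not in the master, and each master key gets a blank field for any file that does not mention it.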