CSV 字典,比较 2 个 CSV 文件并根据匹配值替换值

CSV Dictionary, comparing 2 CSV files and replacing values based on matching values

我正在尝试比较 2 个 CSV 文件,如果其中一列的值匹配,那么我 将需要用第二个 csv 文件的值替换另一个 csv 文件的值。

示例:

Book1.csv:

Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A1,B1,C1,D1,E1,F1,G1,H1,I1,J1,
A2,B2,C2,D2,E2,F2,G2,H2,I2,J2,1
A3,B3,C3,D3,E3,F3,G3,H3,I3,J3,
A4,B4,C4,D4,E4,F4,G4,H4,I4,J4,
A5,B5,C5,D5,E1,F5,G5,H5,I5,J5,1
A6,B6,C6,D6,E6,F6,G6,H6,I6,J6,1
A7,B7,C7,D7,E7,F7,G7,H7,I7,J7,
A8,B8,C8,D8,E8,F8,G8,H8,I8,J8,1
A9,B9,C9,D9,E9,F9,G9,H9,I9,J9,

Book2.csv:

Oscar,Papa,Lima
A1_x,B1,K2
A2,B2,X2
A3_x,B3,L2
A4_x,B4,K2
A5,B5,J2
A6,B6,A2
A7_x,B7,AS
A8,B8,S3
A9_x,B9,S1

如果 Book2.csv 列“Oscar”的值等于 Book1.csv“Alfa”的值(A2==A2A3!=A3A4==A4 ),然后 Book2.csv 的“Lima”值将覆盖 Book1.csv 的“Beta”值中的任何内容。

所以 Beta_New 的输出看起来像这样(看起来像这样是因为我在下面的代码中切换并通过字典调用列:

xtest_file.csv:

Beta_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
X2,E2,F2,H2,I2,C2
J2,E5,F5,H5,I5,C5
A2,E6,F6,H6,I6,C6
S3,E8,F8,H8,I8,C8

到目前为止我的代码:

import csv

fieldnames_dict = {
    'Beta': 'Beta_New',
    'Echo': 'Echo_New',
    'Foxtrot': 'Foxtrot_New_ALL',
    'Hotel': 'Hotel_New',
    'India': 'India_New',
    'Charlie': 'Charlie_New'
}

1_open_cd_csv = open("book1.csv", "r", encoding="utf-8", errors='ignore')
1_reader_cd_csv = csv.DictReader(1_open_cd_csv, delimiter=',', quotechar='"')
1_header_csv = next(1_reader_cd_csv)

2_open_cd_csv = open("book2.csv", "r", encoding="utf-8", errors='ignore')
2_reader_cd_csv = csv.DictReader(2_open_cd_csv, delimiter=',', quotechar='"')
2_header_csv = next(2_reader_cd_csv)

open_output_test_csv = open(xtest_file.csv, "w", encoding="utf-8", errors='ignore')
writer_output_test_csv = csv.DictWriter(open_output_test_csv, delimiter=',',
                                        quotechar='"', quoting=csv.QUOTE_ALL,
                                        fieldnames=list(fieldnames_dict.values()))
writer_output_test_csv.writeheader()

for row_in in 1_reader_cd_csv:
    if row_in['Kilo'] == "1":
        row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
        writer.writerow(row_out)
        #if 2_reader_cd_csv[Oscar]'s values = 1_reader_cd_csv[Alfa]'s values:
            #then the output for "Beta_New" = the values for 2_reader_cd_csv[Lima]

或者像这样:

for row_in in 1_reader_cd_csv:
    alfa_match = 2_reader_cd_csv['Oscar'] matches with = row_in['Alfa']
    if alfa_match != none and row_in['Kilo'] == "1":
        row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
        #then the output for "Beta_New" = the values for 2_reader_cd_csv[Lima]
        writer.writerow(row_out)

If Book2.csv column "Oscar"'s values are equal to Book1.csv "Alfa"'s values (A2==A2, A3!=A3, A4==A4), then Book2.csv's "Lima" values will overwrite whatever is in Book1.csv's "Beta" values.

您可能想 pandas 试试看:

import pandas as pd

#read csvs as dataframes
df1 = pd.read_csv("book1.csv")
df2 = pd.read_csv("book2.csv")

#replace 'Beta' in first df with the value in 'Lima' where 'Alfa' matches 'Oscar'. 
df1['Beta'].where(df1['Alfa'] != df2['Oscar'], df2['Lima'], inplace=True)
#store as csv
df1.to_csv('new_file.csv')

输出 df1:

Alfa Beta Charlie Delta Echo Foxtrot Golf Hotel India Juliett Kilo
0 A1 B1 C1 D1 E1 F1 G1 H1 I1 J1 nan
1 A2 X2 C2 D2 E2 F2 G2 H2 I2 J2 1
2 A3 B3 C3 D3 E3 F3 G3 H3 I3 J3 nan
3 A4 B4 C4 D4 E4 F4 G4 H4 I4 J4 nan
4 A5 J2 C5 D5 E1 F5 G5 H5 I5 J5 1
5 A6 A2 C6 D6 E6 F6 G6 H6 I6 J6 1
6 A7 B7 C7 D7 E7 F7 G7 H7 I7 J7 nan
7 A8 S3 C8 D8 E8 F8 G8 H8 I8 J8 1
8 A9 B9 C9 D9 E9 F9 G9 H9 I9 J9 nan

将您的代码分成每个部分的较小块:

  1. 读取“book1.csv”行,其中 Kilo="1" 到 file1
  2. 读取行“book2.csv”到file2
  3. 根据 Oscar
  4. file2 值替换 file1 中“Beta”的值
  5. 写回新的 csv 文件
with open("book1.csv", "r") as infile:
    reader = csv.DictReader(infile)
    file1 = [row for row in reader if row["Kilo"]=='1']

with open("book2.csv") as infile:
    reader = csv.DictReader(infile)
    file2 = [row for row in reader]

output = list()
oscars = [row["Oscar"] for row in file2]
for row in file1:
    if row["Alfa"] in oscars:
        row["Beta"] = [r["Lima"] for r in file2 if r["Oscar"]==row["Alfa"]][0]
    output.append({new:row[old] for old,new in fieldnames_dict.items()})

with open("output.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile,fieldnames=list(fieldnames_dict.values()))
    writer.writeheader()
    for row in output:
        writer.writerow(row)
output.csv:
Beta_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
X2,E2,F2,H2,I2,C2
J2,E1,F5,H5,I5,C5
A2,E6,F6,H6,I6,C6
S3,E8,F8,H8,I8,C8