CSV Dictionary: comparing 2 CSV files and replacing values based on matching values
I am trying to compare 2 CSV files. If the values in one column match, I need to replace a value in the first CSV file with a value from the second CSV file.
Example:
Book1.csv:
Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A1,B1,C1,D1,E1,F1,G1,H1,I1,J1,
A2,B2,C2,D2,E2,F2,G2,H2,I2,J2,1
A3,B3,C3,D3,E3,F3,G3,H3,I3,J3,
A4,B4,C4,D4,E4,F4,G4,H4,I4,J4,
A5,B5,C5,D5,E1,F5,G5,H5,I5,J5,1
A6,B6,C6,D6,E6,F6,G6,H6,I6,J6,1
A7,B7,C7,D7,E7,F7,G7,H7,I7,J7,
A8,B8,C8,D8,E8,F8,G8,H8,I8,J8,1
A9,B9,C9,D9,E9,F9,G9,H9,I9,J9,
Book2.csv:
Oscar,Papa,Lima
A1_x,B1,K2
A2,B2,X2
A3_x,B3,L2
A4_x,B4,K2
A5,B5,J2
A6,B6,A2
A7_x,B7,AS
A8,B8,S3
A9_x,B9,S1
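If a value in Book2.csv's "Oscar" column equals a value in Book1.csv's "Alfa" column (A2 == A2, A3 != A3_x, A5 == A5), then Book2.csv's "Lima" value overwrites whatever is in Book1.csv's "Beta" value.
So the output for Beta_New would look like this (it looks this way because the code below selects and renames the columns through a dictionary):
xtest_file.csv: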
Beta_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
X2,E2,F2,H2,I2,C2
J2,E5,F5,H5,I5,C5
A2,E6,F6,H6,I6,C6
S3,E8,F8,H8,I8,C8
My code so far:
import csv

fieldnames_dict = {
    'Beta': 'Beta_New',
    'Echo': 'Echo_New',
    'Foxtrot': 'Foxtrot_New_ALL',
    'Hotel': 'Hotel_New',
    'India': 'India_New',
    'Charlie': 'Charlie_New'
}

# Python identifiers cannot start with a digit, so the file number goes last.
# DictReader consumes the header row itself, so no next() call is needed.
open_cd_csv_1 = open("book1.csv", "r", encoding="utf-8", errors='ignore', newline='')
reader_cd_csv_1 = csv.DictReader(open_cd_csv_1, delimiter=',', quotechar='"')
open_cd_csv_2 = open("book2.csv", "r", encoding="utf-8", errors='ignore', newline='')
reader_cd_csv_2 = csv.DictReader(open_cd_csv_2, delimiter=',', quotechar='"')

open_output_test_csv = open("xtest_file.csv", "w", encoding="utf-8", errors='ignore', newline='')
writer_output_test_csv = csv.DictWriter(open_output_test_csv, delimiter=',',
                                        quotechar='"', quoting=csv.QUOTE_ALL,
                                        fieldnames=list(fieldnames_dict.values()))
writer_output_test_csv.writeheader()

for row_in in reader_cd_csv_1:
    if row_in['Kilo'] == "1":
        row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
        writer_output_test_csv.writerow(row_out)
        # if reader_cd_csv_2['Oscar'] value == reader_cd_csv_1['Alfa'] value:
        #     then the output for "Beta_New" = the value of reader_cd_csv_2['Lima']

Or something like this, in place of the loop above:
# Oscar -> Lima lookup, built once from the second file
lima_by_oscar = {row['Oscar']: row['Lima'] for row in reader_cd_csv_2}

for row_in in reader_cd_csv_1:
    alfa_match = lima_by_oscar.get(row_in['Alfa'])
    if alfa_match is not None and row_in['Kilo'] == "1":
        row_out = {new: row_in[old] for old, new in fieldnames_dict.items()}
        # the output for "Beta_New" = the matching value of 'Lima'
        row_out['Beta_New'] = alfa_match
        writer_output_test_csv.writerow(row_out)
You might want to give pandas a try:
import pandas as pd

# read csvs as dataframes
df1 = pd.read_csv("book1.csv")
df2 = pd.read_csv("book2.csv")

# replace 'Beta' in first df with the value in 'Lima' where 'Alfa' matches 'Oscar'
# (rows are compared by position, so both files need the same row order and length)
df1['Beta'] = df1['Beta'].where(df1['Alfa'] != df2['Oscar'], df2['Lima'])

# store as csv
df1.to_csv('new_file.csv')
Output df1:
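| | Alfa | Beta | Charlie | Delta | Echo | Foxtrot | Golf | Hotel | India | Juliett | Kilo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A1 | B1 | C1 | D1 | E1 | F1 | G1 | H1 | I1 | J1 | nan |
| 1 | A2 | X2 | C2 | D2 | E2 | F2 | G2 | H2 | I2 | J2 | 1 |
| 2 | A3 | B3 | C3 | D3 | E3 | F3 | G3 | H3 | I3 | J3 | nan |
| 3 | A4 | B4 | C4 | D4 | E4 | F4 | G4 | H4 | I4 | J4 | nan |
| 4 | A5 | J2 | C5 | D5 | E1 | F5 | G5 | H5 | I5 | J5 | 1 |
| 5 | A6 | A2 | C6 | D6 | E6 | F6 | G6 | H6 | I6 | J6 | 1 |
| 6 | A7 | B7 | C7 | D7 | E7 | F7 | G7 | H7 | I7 | J7 | nan |
| 7 | A8 | S3 | C8 | D8 | E8 | F8 | G8 | H8 | I8 | J8 | 1 |
| 8 | A9 | B9 | C9 | D9 | E9 | F9 | G9 | H9 | I9 | J9 | nan |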
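Note that the comparison `df1['Alfa'] != df2['Oscar']` checks the two columns row by row, so it only gives the intended result when both files list their rows in the same order. The following pure-Python sketch illustrates the difference between matching by position and matching by key. The small lists and the function names `replace_by_position` and `replace_by_key` are made up for illustration and are not part of the answer above.

```python
# Three rows of Book1 (Alfa and Beta columns only).
alfa = ["A1", "A2", "A3"]
beta = ["B1", "B2", "B3"]

# The same Book2 rows (Oscar, Lima) in two different orders.
book2_sorted = [("A1_x", "K2"), ("A2", "X2"), ("A3_x", "L2")]
book2_shuffled = [("A2", "X2"), ("A3_x", "L2"), ("A1_x", "K2")]


def replace_by_position(alfa, beta, book2):
    """Row i of Book1 is only compared with row i of Book2."""
    return [lima if a == oscar else b
            for a, b, (oscar, lima) in zip(alfa, beta, book2)]


def replace_by_key(alfa, beta, book2):
    """Each Alfa value is looked up among all Oscar values."""
    lima_by_oscar = dict(book2)
    return [lima_by_oscar.get(a, b) for a, b in zip(alfa, beta)]


print(replace_by_position(alfa, beta, book2_sorted))    # ['B1', 'X2', 'B3']
print(replace_by_position(alfa, beta, book2_shuffled))  # ['B1', 'B2', 'B3']
print(replace_by_key(alfa, beta, book2_shuffled))       # ['B1', 'X2', 'B3']
```

With the shuffled rows, the positional version misses the A2 match, while the key-based version still finds it. In pandas, a key-based variant could probably be written by mapping `df1['Alfa']` through `df2.set_index('Oscar')['Lima']` with `Series.map` and filling the gaps with the original `Beta` values, assuming the Oscar values are unique.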
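Break your code into smaller chunks, one for each part:
- Read the rows of "book1.csv" where Kilo == "1" into file1
- Read the rows of "book2.csv" into file2
- Replace the "Beta" values in file1 with the "Lima" values from file2 where "Oscar" matches "Alfa"
- Write the result back to a new csv file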
import csv

# fieldnames_dict is the old-name -> new-name mapping defined in the question

# Read the rows of book1.csv where Kilo == "1"
with open("book1.csv", "r", newline="") as infile:
    reader = csv.DictReader(infile)
    file1 = [row for row in reader if row["Kilo"] == '1']

# Read all rows of book2.csv
with open("book2.csv", newline="") as infile:
    reader = csv.DictReader(infile)
    file2 = [row for row in reader]

output = list()
oscars = [row["Oscar"] for row in file2]
for row in file1:
    if row["Alfa"] in oscars:
        # Take the Lima value of the first book2 row whose Oscar matches
        row["Beta"] = [r["Lima"] for r in file2 if r["Oscar"] == row["Alfa"]][0]
    output.append({new: row[old] for old, new in fieldnames_dict.items()})

# Write the renamed columns to a new csv file
with open("output.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=list(fieldnames_dict.values()))
    writer.writeheader()
    for row in output:
        writer.writerow(row)
output.csv:
Beta_New,Echo_New,Foxtrot_New_ALL,Hotel_New,India_New,Charlie_New
X2,E2,F2,H2,I2,C2
J2,E1,F5,H5,I5,C5
A2,E6,F6,H6,I6,C6
S3,E8,F8,H8,I8,C8
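The list scan used for the replacement above walks through file2 once for every row of file1. For larger files, a dictionary keyed by "Oscar" may be a better fit. The following is only a sketch of that variation: it inlines a few sample rows through `io.StringIO` so it runs on its own, and the names `build_output`, `lima_by_oscar`, `BOOK1`, `BOOK2` and `FIELDNAMES` are illustrative assumptions rather than part of the original answer.

```python
import csv
import io

# A few sample rows copied from the question.
BOOK1 = """Alfa,Beta,Charlie,Delta,Echo,Foxtrot,Golf,Hotel,India,Juliett,Kilo
A1,B1,C1,D1,E1,F1,G1,H1,I1,J1,
A2,B2,C2,D2,E2,F2,G2,H2,I2,J2,1
A5,B5,C5,D5,E1,F5,G5,H5,I5,J5,1
"""
BOOK2 = """Oscar,Papa,Lima
A1_x,B1,K2
A2,B2,X2
A5,B5,J2
"""

# Old column name -> new column name, as in the question.
FIELDNAMES = {
    "Beta": "Beta_New",
    "Echo": "Echo_New",
    "Foxtrot": "Foxtrot_New_ALL",
    "Hotel": "Hotel_New",
    "India": "India_New",
    "Charlie": "Charlie_New",
}


def build_output(book1_file, book2_file, fieldnames=FIELDNAMES):
    """Return the renamed Kilo == "1" rows of book1 with Beta replaced by the matching Lima."""
    # Oscar -> Lima; setdefault keeps the first row when an Oscar value repeats,
    # which mirrors the [0] in the list-based version.
    lima_by_oscar = {}
    for row in csv.DictReader(book2_file):
        lima_by_oscar.setdefault(row["Oscar"], row["Lima"])

    output = []
    for row in csv.DictReader(book1_file):
        if row["Kilo"] != "1":
            continue
        # Keep the original Beta when no Oscar value matches this Alfa.
        row["Beta"] = lima_by_oscar.get(row["Alfa"], row["Beta"])
        output.append({new: row[old] for old, new in fieldnames.items()})
    return output


rows = build_output(io.StringIO(BOOK1), io.StringIO(BOOK2))
for r in rows:
    print(",".join(r.values()))
# X2,E2,F2,H2,I2,C2
# J2,E1,F5,H5,I5,C5
```

To use it on real files, the two `io.StringIO` objects can be swapped for files opened with `open(..., newline="")`, and the returned rows can be written with `csv.DictWriter` as in the answer above.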