How to compare two CSV files in Python and flag the differences?

I'm new to Python, please help me. I have two CSV files that I need to compare, and I want to output the differences: changed data, deleted data, and added data. Here is my example:

file 1:
Sn  Name    Subject   Marks
1   Ram     Maths     85
2   sita    Engilsh   66
3   vishnu  science   50
4   balaji  social    60

file 2:
Sn  Name    Subject   Marks
1   Ram     computer  85   # subject has changed
2   sita    Engilsh   66
3   vishnu  science   90   # marks have changed
4   balaji  social    60
5   kishor  chem      99   # new line added

Output: I need to get something like this:

Changed Items:
1   Ram     computer  85
3   vishnu  science   90
Added item:
5   kishor  chem      99
Deleted item:
.................

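I imported csv and compared the files line by line in a for loop with readlines, but I am not getting the desired output. Flagging the items that were added and deleted between file 1 and file 2 (both csv files) is what confuses me. Please suggest efficient code.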
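Since the question mentions the csv module, here is a minimal standard-library sketch of the keyed comparison. It assumes that Sn identifies the same record in both files and that the files are comma-separated with a header row; the sample data is inlined with io.StringIO so the example runs on its own, and the helper names load and diff_rows are hypothetical.

```python
import csv
import io

# Sample data from the question, written as comma-separated text
# (assumption: the real files use commas and the header Sn,Name,Subject,Marks)
FILE1 = """Sn,Name,Subject,Marks
1,Ram,Maths,85
2,sita,Engilsh,66
3,vishnu,science,50
4,balaji,social,60
"""
FILE2 = """Sn,Name,Subject,Marks
1,Ram,computer,85
2,sita,Engilsh,66
3,vishnu,science,90
4,balaji,social,60
5,kishor,chem,99
"""


def load(text, key="Sn"):
    """Read CSV text into a dict mapping the key column to the full row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_rows(old, new):
    """Return (changed, added, deleted); changed rows carry the new values."""
    changed = [new[k] for k in new if k in old and new[k] != old[k]]
    added = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]
    return changed, added, deleted


changed, added, deleted = diff_rows(load(FILE1), load(FILE2))
for title, rows in (("Changed Items:", changed),
                    ("Added item:", added),
                    ("Deleted item:", deleted)):
    print(title)
    for row in rows:
        print(" ".join(row.values()))
```

With real files, open each one with open('file1.csv', newline='') and pass the file object to csv.DictReader instead of wrapping a string in io.StringIO.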

The idea here is to flatten your dataframes with melt so that each value can be compared individually:

import numpy as np
import pandas as pd

# Load your csv files (adjust the read_csv options to match your files)
df1 = pd.read_csv('file1.csv', ...)
df2 = pd.read_csv('file2.csv', ...)

# Select the columns to compare (optional; whether to keep 'Sn' depends on
# whether it identifies the same row in both files)
cols = ['Name', 'Subject', 'Marks']

# Flatten the dataframes: one row per (Name, Item) pair
out1 = df1[cols].melt('Name', var_name='Item', value_name='Old')
out2 = df2[cols].melt('Name', var_name='Item', value_name='New')
out = pd.merge(out1, out2, on=['Name', 'Item'], how='outer')

# Flag the state of each item. np.select uses the first condition that is True,
# so the missing-value checks must come before the inequality check
condlist = [out['Old'].isna(),
            out['New'].isna(),
            out['Old'] != out['New']]

out['State'] = np.select(condlist, choicelist=['added', 'deleted', 'changed'],
                         default='unchanged')

Output:

>>> out
     Name     Item      Old       New      State
0     Ram  Subject    Maths  computer    changed
1    sita  Subject  Engilsh   Engilsh  unchanged
2  vishnu  Subject  science   science  unchanged
3  balaji  Subject   social    social  unchanged
4     Ram    Marks       85        85  unchanged
5    sita    Marks       66        66  unchanged
6  vishnu    Marks       50        90    changed
7  balaji    Marks       60        60  unchanged
8  kishor  Subject      NaN      chem      added
9  kishor    Marks      NaN        99      added
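The State column depends on the order of condlist: for each row, np.select returns the choice belonging to the first condition that is True, and a comparison such as NaN != 'chem' evaluates to True. The small sketch below uses made-up three-element Series (not the question's data) to show the effect of putting the inequality check first versus last.

```python
import numpy as np
import pandas as pd

# Made-up example values: row 0 changed, row 1 added, row 2 deleted
old = pd.Series(["Maths", np.nan, "social"], dtype=object)
new = pd.Series(["computer", "chem", np.nan], dtype=object)

# Inequality first: NaN != value is True, so every row is labelled 'changed'
wrong = np.select([old != new, old.isna(), new.isna()],
                  ["changed", "added", "deleted"], default="unchanged")

# Missing-value checks first: added/deleted rows are caught before the inequality
right = np.select([old.isna(), new.isna(), old != new],
                  ["added", "deleted", "changed"], default="unchanged")

print(wrong.tolist())  # ['changed', 'changed', 'changed']
print(right.tolist())  # ['changed', 'added', 'deleted']
```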
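
Alternatively, compare the rows position by position:

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Compare rows pairwise by position (assumes both files list the rows in the same order)
flag = True
for old_row, new_row in zip(df1.values, df2.values):
    if not (old_row == new_row).all():  # at least one column differs
        if flag:
            print("Changed Items:")
            flag = False
        print(new_row)

# Rows past the end of the shorter file were added (file 2 longer) or deleted (file 1 longer)
count = min(len(df1), len(df2))
if len(df2) > count:
    print("Newly added:")
    print(df2.iloc[count:].values)
if len(df1) > count:
    print("Deleted:")
    print(df1.iloc[count:].values)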
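If the Sn column identifies the same record in both files (an assumption; the question does not say so), a keyed outer merge with indicator=True is another way to produce the report in the layout the question asks for. This is a different technique from both answers above; the sample data is inlined with io.StringIO so the sketch runs on its own, and the helper name show is hypothetical.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs on its own
# (assumption: the real files are comma-separated with a header row)
file1 = io.StringIO("Sn,Name,Subject,Marks\n1,Ram,Maths,85\n2,sita,Engilsh,66\n"
                    "3,vishnu,science,50\n4,balaji,social,60\n")
file2 = io.StringIO("Sn,Name,Subject,Marks\n1,Ram,computer,85\n2,sita,Engilsh,66\n"
                    "3,vishnu,science,90\n4,balaji,social,60\n5,kishor,chem,99\n")
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

# Full outer join on the key; indicator=True adds a '_merge' column whose
# value is 'left_only', 'right_only' or 'both' for every key
merged = df1.merge(df2, on="Sn", how="outer",
                   suffixes=("_old", "_new"), indicator=True)

value_cols = ["Name", "Subject", "Marks"]
old_cols = [f"{c}_old" for c in value_cols]
new_cols = [f"{c}_new" for c in value_cols]

# Keys present in both files count as changed if any value column differs
both = merged[merged["_merge"] == "both"]
differs = pd.concat([both[o] != both[n] for o, n in zip(old_cols, new_cols)],
                    axis=1).any(axis=1)

changed = both.loc[differs, ["Sn"] + new_cols]
added = merged.loc[merged["_merge"] == "right_only", ["Sn"] + new_cols]
deleted = merged.loc[merged["_merge"] == "left_only", ["Sn"] + old_cols]


def show(title, frame):
    """Print a heading followed by one space-separated line per row."""
    print(title)
    for row in frame.itertuples(index=False):
        print(*row)


show("Changed Items:", changed)
show("Added item:", added)
show("Deleted item:", deleted)
```

Unlike the melt answer, this keeps one row per record, which matches the requested output; unlike the positional loop, it does not assume both files list the rows in the same order.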