Deleting specific rows from a CSV file using pandas

I'm relatively new to pandas and am trying to delete every row in file XYZ that also exists in file ABC.

Code:

import pandas as pd

# Reads two CSV files
clm1 = pd.read_csv('ABC.csv')
clm2 = pd.read_csv('XYZ.csv')

# Prints file length
print('Main file clm2: ' + str(len(clm2['image_url'])))
print('Referral file clm1: ' + str(len(clm1['Input.image_url'])))

for index1 in clm1.index:
    for index2 in clm2.index:
        if clm2['image_url'][index2] == clm1['Input.image_url'][index1]:
            print("Entered into deletion condition!!")

            print(clm2['image_url'][index2])
            print(clm1['Input.image_url'][index1])
            print('\n \n')

            clm2.drop(clm2['image_url'][index2], axis=0, inplace=True)
            print('Deleted!!')

print('Main file clm2: ' + str(len(clm2['image_url'])))

After entering the deletion condition, the following lines print correctly:

            print(clm2['image_url'][index2])
            print(clm1['Input.image_url'][index1])
            print('\n \n')

But the following line raises an error:

clm2.drop(clm2['image_url'][index2], axis=0, inplace=True)

The error says:

  File "compare_delete_imagelinks.py", line 19, in <module>
    clm2.drop(clm2['image_url'][index2], axis=0, inplace=False)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/frame.py", line 3940, in drop
    errors=errors)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/generic.py", line 3780, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/generic.py", line 3812, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/Users/AjayB/anaconda3/envs/MyDjangoEnv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4965, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['https://Xxxxxxx.216PPU~V.JPG'] not found in axis"
(MyDjangoEnv) SL-SP-LAP-0384:scripts AjayB$ 

How can I resolve this?

If your csv files look like the following, this should work:

XYZ.csv:

name,value
a,1
b,2
c,3
d,4
e,5
f,6

ABC.csv:

name,value
a,1
b,2
c,3
d,4

Code:

import pandas as pd

# Use the 'name' column as the index so rows can be dropped by label
xyz = pd.read_csv("XYZ.csv", index_col='name')
abc = pd.read_csv("ABC.csv", index_col='name')

# Drop every row of xyz whose label also appears in abc
for i in abc.index:
    if i in xyz.index:
        xyz.drop(i, axis=0, inplace=True)

print(xyz)
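
For context on the error in the question: DataFrame.drop with axis=0 treats its first argument as index labels, so passing a URL string taken from the image_url column raises "not found in axis" unless that URL happens to be an index label. Inside the original loop, passing index2 instead of clm2['image_url'][index2] would likely be the smallest fix. The loops can also be avoided entirely with a boolean mask. The sketch below assumes the column names from the question (image_url and Input.image_url) and uses small made-up frames in place of the real CSV files, so the URLs here are purely illustrative:

```python
import pandas as pd

# Hypothetical in-memory stand-ins for XYZ.csv (main file) and ABC.csv (referral file).
clm2 = pd.DataFrame({"image_url": ["u1", "u2", "u3", "u4"]})
clm1 = pd.DataFrame({"Input.image_url": ["u2", "u4", "u9"]})

# Boolean mask: True where a main-file URL also appears in the referral file.
in_referral = clm2["image_url"].isin(clm1["Input.image_url"])

# Keep only the rows that are NOT in the referral file, then renumber the index.
remaining = clm2[~in_referral].reset_index(drop=True)

print(remaining)
```

With real files, the two frames would come from pd.read_csv as in the question, and the result could be written back with remaining.to_csv(..., index=False) if the file on disk needs to change.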