Checking whether each image file (.JPG) in folder "A" has an annotation file (.XML) in folder "B"

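I have a very large image dataset, and the images and their annotations are saved in two separate folders, but not every image has an annotation file. How can I write Python code that checks the image files (.JPG) in folder "A", deletes an image if there is no annotation file (.xml) with the same name for that image, and does nothing if the annotation file exists?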

I wrote the following code after @Gabip's comment below:

How can I improve this code?

Try this:

from os import listdir, remove
from os.path import isfile, join, splitext

images_path = "full/path/to/folder_a"
annotations_path = "full/path/to/folder_b"


# return the names of all files in full_path whose extension is ext (case-insensitive)
def get_files_names_with_extension(full_path, ext):
    return [f for f in listdir(full_path) if isfile(join(full_path, f)) and f.lower().endswith(".{}".format(ext))]


images = get_files_names_with_extension(images_path, "jpg")
# splitext strips only the final extension, so "img.001.xml" becomes "img.001"
annotations = {splitext(f)[0] for f in get_files_names_with_extension(annotations_path, "xml")}

# delete every image that has no annotation file with the same base name
for img in images:
    if splitext(img)[0] not in annotations:
        remove(join(images_path, img))
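
Because deleting files cannot be undone, it may be worth listing the candidates before removing anything. The following is only a minimal sketch of that idea, not part of the answer above: the function names `find_unannotated_images` and `delete_unannotated_images` and the `dry_run` parameter are made up for illustration, and it assumes both folders are flat (no subfolders).

```python
from pathlib import Path


def find_unannotated_images(images_dir, annotations_dir,
                            image_ext=".jpg", annotation_ext=".xml"):
    """Return image paths in images_dir with no same-named annotation file."""
    # Path.stem drops only the final suffix, so "img.001.xml" -> "img.001".
    annotated = {
        p.stem
        for p in Path(annotations_dir).iterdir()
        if p.is_file() and p.suffix.lower() == annotation_ext
    }
    return sorted(
        p
        for p in Path(images_dir).iterdir()
        if p.is_file() and p.suffix.lower() == image_ext and p.stem not in annotated
    )


def delete_unannotated_images(images_dir, annotations_dir, dry_run=True):
    """Report images without annotations; delete them only if dry_run is False."""
    orphans = find_unannotated_images(images_dir, annotations_dir)
    for path in orphans:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return orphans
```

Calling `delete_unannotated_images(images_path, annotations_path)` first only prints what would be removed; passing `dry_run=False` afterwards performs the deletion.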

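I ran into the same problem. I made a few adjustments to your suggestion. Now:

  • the number of images and XMLs is displayed
  • images are compared against XMLs
  • XMLs are compared against images
  • instead of deleting the inconsistencies, it builds lists with the names of the missing files

Check: (IMG x XML) and (XML x IMG)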

from os import listdir
from os.path import isfile, join, splitext

images_path = "full/path/to/folder_a"
annotations_path = "full/path/to/folder_b"


# function created to return a list of all files in the "full_path" directory with an "ext" extension
def get_files_names_with_extension(full_path, ext):
    return [f for f in listdir(full_path) if isfile(join(full_path, f)) and f.lower().endswith(".{}".format(ext))]


# use the function to retrieve the NAME of IMGs and XMLs WITHOUT EXTENSION (makes the comparison easier)
images = {splitext(f)[0] for f in get_files_names_with_extension(images_path, "jpg")}
annotations = {splitext(f)[0] for f in get_files_names_with_extension(annotations_path, "xml")}
print('='*30)
print(f'number of IMGs = {len(images)}')
print(f'number of XMLs = {len(annotations)}')

# create a list of all IMGs looking for the one that does not have the corresponding XML
print('='*30)
list_error_img = []
for img in images:
    if img not in annotations:
        list_error_img.append(img)
if not list_error_img:
    print("OK, every IMG has its XML")
else:
    print("ERROR: IMGs that do not have XML")
    print(list_error_img)

# creates a list of all XMLs looking for what does not have the corresponding IMG
print('='*30)
list_error_xml = []
for ann in annotations:
    if ann not in images:
        list_error_xml.append(ann)
if not list_error_xml:
    print("OK, every XML has its IMG")
else:
    print("ERROR: XMLs that do not have IMG")
    print(list_error_xml)
print('='*30)
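
Since `images` and `annotations` are already sets, the two loops above could probably be replaced by set differences. A small sketch of that idea, where `find_mismatches` is a hypothetical helper name and the sample names are invented:

```python
def find_mismatches(images, annotations):
    """Return (names with no XML, names with no IMG) as two sorted lists."""
    images, annotations = set(images), set(annotations)
    # a - b keeps the elements of a that are not in b
    return sorted(images - annotations), sorted(annotations - images)


# invented sample names, only to show the behaviour
missing_xml, missing_img = find_mismatches({"a", "b", "c"}, {"b", "c", "d"})
print(missing_xml)  # ['a']
print(missing_img)  # ['d']
```

The sorted lists also make the report stable between runs, which plain iteration over a set does not guarantee.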