How to detect partially corrupted images in JPG/JPEG format
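In a very large image dataset, we have some corrupted images like the one shown below. These images can be opened without any problem, but gray, corrupted regions are visible to the eye. How can such images be detected?
I have already written a detection script in Matlab. It filters out most of the corrupted images, but some are missed. The main idea of my script is to look for a binary string that the corrupted images have in common. However, some corrupted images do not contain this common binary string, so they are not filtered out.
My Matlab code: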
FOLDER1 = './'; % query data
query_folder1 = FOLDER1;
query_pt1 = dir(strcat(query_folder1, '*.jpg'));
nFile1 = length(query_pt1); % file number
BROKEN_MARK = '00455114';
SIZE = 4; % single size
THRESH = 3;
for i = 1:nFile1
    img_dir = strcat(FOLDER1, query_pt1(i).name());
    fid = fopen(img_dir);
    im1_stats = dir(img_dir);
    file_length = im1_stats.bytes;
    pos = -4;
    epost = -200;
    count = 0;
    while abs(pos) <= ceil(file_length)
        fseek(fid, pos, 'eof');
        temp = fread(fid, 1, 'single');
        str = num2hex(single(temp));
        if(strcmp(str, BROKEN_MARK))
            %fprintf('%s\n', img_dir);
            if(count >= THRESH)
                copyfile(img_dir, 'candidates/');
                break;
            else
                count = count + 1;
            end
        else
            count = 0;
            pos = pos - 1;
        end
    end
    fclose(fid);
end
Can anyone suggest ideas for detecting all of the corrupted images? Code in Python, C++, Matlab, or a bash script would also be welcome. Thank you.
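If you can view them, they are technically not corrupted; they are "corrupted" only in your perception. Personally, I would count all the gray pixels and, if their percentage exceeds a given amount, treat the image as "corrupted", then check it manually and delete it.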
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from PIL import Image

percentage = 10          # Corrupted area in per cent to detect
color = (128, 128, 128)  # Corruption color, tuple

# Convert to RGB so every pixel compares as a 3-tuple
im = Image.open("corrupted.jpg").convert("RGB")
pixels = list(im.getdata())
width, height = im.size
# Split the flat pixel list into rows
pixels = [pixels[i * width:(i + 1) * width] for i in range(height)]

gray = 0
other = 0
for data in pixels:
    for pix in data:
        if pix == color:
            gray += 1
        else:
            other += 1

# Integer percentage of pixels that match the corruption color
corruption_area = gray * 100 // (gray + other)
if corruption_area >= percentage:
    print('Corruption:', corruption_area, '%')
else:
    print('OK')
The output for the image below is:
Corruption: 18 %
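An exact comparison against (128, 128, 128) may miss fill pixels that are off by a level or two, and a dataset needs a loop over files rather than a single image. The following is only a minimal sketch of such a variant, not part of the original answer: the names `gray_fraction`, `scan_folder`, `tol`, and `threshold` are hypothetical, and the tolerance of 2 levels and the 10% threshold are assumptions to tune on your own data. Pillow is imported only inside `scan_folder`, so the counting logic can be tried on plain lists of RGB tuples.

```python
from pathlib import Path


def gray_fraction(pixels, color=(128, 128, 128), tol=2):
    """Return the fraction of pixels within `tol` of `color` on every channel."""
    total = 0
    gray = 0
    for pix in pixels:
        total += 1
        if all(abs(c - t) <= tol for c, t in zip(pix, color)):
            gray += 1
    return gray / total if total else 0.0


def scan_folder(folder, threshold=0.10):
    """Yield (path, fraction) for each .jpg in `folder` at or above `threshold`."""
    from PIL import Image, ImageFile  # Pillow, imported lazily

    # Assumption: truncated files should be decoded as far as possible
    # instead of raising an error during loading.
    ImageFile.LOAD_TRUNCATED_IMAGES = True
    for path in sorted(Path(folder).glob("*.jpg")):
        with Image.open(path) as im:
            frac = gray_fraction(im.convert("RGB").getdata())
        if frac >= threshold:
            yield path, frac
```

Iterating over `scan_folder("./")` yields the candidate files with their gray fraction, which can then be checked by hand as suggested above. Images that legitimately contain large mid-gray areas will also be reported, so the result is a candidate list rather than a verdict.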
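I used the corrupted images that had already been detected to analyze the corrupted regions, and then used those results to detect other possibly corrupted images. Here is the Matlab code: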
clear; clc;
FOLDER1 = './';
query_folder1 = FOLDER1; % Some corrupted samples are here.
query_pt1 = dir(strcat(query_folder1, '*.jpg'));
nFile1 = length(query_pt1); % file number
OFF = 10;
for i = 1:nFile1
    img_dir = strcat(FOLDER1, query_pt1(i).name());
    img = imread(img_dir);
    [x y ~] = size(img);
    img_part = img(x-OFF:x, y-OFF:y, :); % Get samples from right-bottom corner
    hist(i, :) = rgbhist_fast(img_part, 4); % get RGB histogram of sample part of corrupted region
end
mean_hist = mean(hist); % Use average of RGB histogram of samples for standard of corruption
FOLDER2 = '~/data/logo_data/930k_iautocrop/'; % Main big dataset
query_folder2 = FOLDER2;
query_pt2 = dir(strcat(query_folder2, '*.jpg'));
nFile2 = length(query_pt2); % file number
for i = 1:nFile2
    if(mod(i, 100) == 0)
        fprintf('%d\n', i);
    end
    img_dir = strcat(FOLDER2, query_pt2(i).name());
    img = imread(img_dir);
    [x y ~] = size(img);
    img_part = img(x-OFF:x, y-OFF:y, :);
    temp_hist = rgbhist_fast(img_part, 4);
    dist(i) = sqrt(sum((mean_hist - temp_hist').^2, 2)); % get corrupted similarity
    %imshow(img_part);
end
[v ix] = sort(dist, 'ascend'); % To find most corrupted images. The images on the top of the list have high corruption probability
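For readers who prefer Python, the same idea can be sketched as follows. This is an illustration under assumptions, not a port of the code above: `rgbhist_fast` is not shown in the post, so the sketch assumes it returns a normalised joint RGB histogram with 4 bins per channel, and the names `rgb_hist`, `mean_hist`, `distance`, and `rank_by_corruption` are hypothetical. The functions take plain lists of RGB tuples (for example, the pixels of the bottom-right patch of each image).

```python
import math


def rgb_hist(pixels, bins=4):
    """Normalised joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    n = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins-1
        ri, gi, bi = (c * bins // 256 for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1
        n += 1
    return [h / n for h in hist] if n else hist


def mean_hist(hists):
    """Element-wise mean of several histograms of equal length."""
    return [sum(col) / len(hists) for col in zip(*hists)]


def distance(a, b):
    """Euclidean distance between two histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_corruption(reference, patches):
    """Sort image names by distance of their patch histogram to `reference`.

    `patches` maps an image name to the pixel list of its sampled patch.
    Names at the front of the result are closest to the corrupted samples.
    """
    scored = {name: distance(reference, rgb_hist(p)) for name, p in patches.items()}
    return sorted(scored, key=scored.get)
```

Here `reference` would be `mean_hist` applied to the histograms of the known corrupted samples. As in the Matlab version, the ranking only looks at one corner patch, so corruption elsewhere in the image would not be caught by it.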