分割草书字符(阿拉伯语 OCR)
Segmenting cursive character (Arabic OCR)
我想将阿拉伯语单词分割成单个字符。基于 histogram/profile,我假设我可以通过 cut/segment 字符基于它的基线(它具有相似的像素值)进行分割过程。
但是,不幸的是,我仍然坚持构建适当的代码,以使其正常工作。
% Original Code by Soumyadeep Sinha
% Saving each single segmented character as one file
function [segm] = trysegment (a)
myFolder = 'D:. Thesis FINISH!!!\Data set\trial';
level = graythresh (a);
bw = im2bw (a, level);
b = imcomplement (bw);
i= padarray(b,[0 10]);
verticalProjection = sum(i, 1);
set(gcf, 'Name', 'Trying Segmentation for Cursive', 'NumberTitle', 'Off')
subplot(2, 2, 1);imshow(i);
subplot(2,2,3);
plot(verticalProjection, 'b-'); %histogram show by this code
% hist(reshape(input,[],3),1:max(input(:)));
grid on;
% % t = verticalProjection;
% % t(t==0) = inf;
% % mayukh = min(t)
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > 0;
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);
% Extract each region
y=1;
for k = 1 : length(startingColumns)
% Get sub image of just one character...
subImage = i(:, startingColumns(k):endingColumns(k));
% se = strel('rectangle',[2 4]);
% dil = imdilate(subImage, se);
th = bwmorph(subImage,'thin',Inf);
n = imresize (th, [64 NaN], 'bilinear');
figure, imshow (n);
[L,num] = bwlabeln(n);
for z= 1 : num
bw= ismember(L, z);
% Construct filename for this particular image.
baseFileName = sprintf('char %d.png', y);
y=y+1;
% Prepend the folder to make the full file name.
fullFileName = fullfile(myFolder, baseFileName);
% Do the write to disk.
imwrite(bw, fullFileName);
% subplot(2,2,4);
% pause(2);
% imshow(bw);
end
% y=y+1;
end;
segm = (n);
文字图片如下:
为什么代码不起作用?
你有其他代码的建议吗?
或建议的算法使其工作,对草书字符进行良好的分割?
先谢谢了。
从发布的代码中替换此代码部分
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > 0;
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);
新代码部分
threshold=max(verticalProjection)/3;
thresholdedProjection=verticalProjection > threshold;
count=0;
startingColumnsIndex=0;
for i=1:length(thresholdedProjection)
if thresholdedProjection(i)
if(count>0)
startingColumnsIndex=startingColumnsIndex+1;
startingColumns(startingColumnsIndex)= i-floor(count/2);
count=0;
end
else
count=count+1;
end
end
endingColumns=[startingColumns(2:end)-1 i-floor(count/2)];
其余代码无需更改。
我想将阿拉伯语单词分割成单个字符。基于 histogram/profile,我假设我可以通过 cut/segment 字符基于它的基线(它具有相似的像素值)进行分割过程。 但是,不幸的是,我仍然坚持构建适当的代码,以使其正常工作。
% Original Code by Soumyadeep Sinha
% Saving each single segmented character as one file
function [segm] = trysegment (a)
myFolder = 'D:. Thesis FINISH!!!\Data set\trial';
level = graythresh (a);
bw = im2bw (a, level);
b = imcomplement (bw);
i= padarray(b,[0 10]);
verticalProjection = sum(i, 1);
set(gcf, 'Name', 'Trying Segmentation for Cursive', 'NumberTitle', 'Off')
subplot(2, 2, 1);imshow(i);
subplot(2,2,3);
plot(verticalProjection, 'b-'); %histogram show by this code
% hist(reshape(input,[],3),1:max(input(:)));
grid on;
% % t = verticalProjection;
% % t(t==0) = inf;
% % mayukh = min(t)
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > 0;
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);
% Extract each region
y=1;
for k = 1 : length(startingColumns)
% Get sub image of just one character...
subImage = i(:, startingColumns(k):endingColumns(k));
% se = strel('rectangle',[2 4]);
% dil = imdilate(subImage, se);
th = bwmorph(subImage,'thin',Inf);
n = imresize (th, [64 NaN], 'bilinear');
figure, imshow (n);
[L,num] = bwlabeln(n);
for z= 1 : num
bw= ismember(L, z);
% Construct filename for this particular image.
baseFileName = sprintf('char %d.png', y);
y=y+1;
% Prepend the folder to make the full file name.
fullFileName = fullfile(myFolder, baseFileName);
% Do the write to disk.
imwrite(bw, fullFileName);
% subplot(2,2,4);
% pause(2);
% imshow(bw);
end
% y=y+1;
end;
segm = (n);
文字图片如下:
为什么代码不起作用? 你有其他代码的建议吗? 或建议的算法使其工作,对草书字符进行良好的分割?
先谢谢了。
从发布的代码中替换此代码部分
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > 0;
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);
新代码部分
threshold=max(verticalProjection)/3;
thresholdedProjection=verticalProjection > threshold;
count=0;
startingColumnsIndex=0;
for i=1:length(thresholdedProjection)
if thresholdedProjection(i)
if(count>0)
startingColumnsIndex=startingColumnsIndex+1;
startingColumns(startingColumnsIndex)= i-floor(count/2);
count=0;
end
else
count=count+1;
end
end
endingColumns=[startingColumns(2:end)-1 i-floor(count/2)];
其余代码无需更改。