Length of the LBP histogram differs from image to image

Question

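I am running the LBP algorithm to classify images by their texture features. The classifier is LinearSVC from the sklearn.svm package.

Computing the histograms and fitting the SVM already works, but sometimes the length of the histogram depends on the image.

Here is an example: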

from skimage import feature
from scipy.stats import itemfreq
from sklearn.svm import LinearSVC
import numpy as np
import cv2
import cvutils
import csv
import os

def __get_hist(image, radius):
    NumPoint = radius*8
    lbp = feature.local_binary_pattern(image, NumPoint, radius, method="uniform")
    x = itemfreq(lbp.ravel())
    hist = x[:,1]/sum(x[:,1])
    return hist
def get_trainHist_list(train_txt):
    train_dic = {}
    with open(train_txt, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter = ' ')
        for row in reader:
            train_dic[row[0]] = int(row[1])

    hist_list=[]
    key_list=[]
    label_list=[]
    for key, label in train_dic.items():
        img = cv2.imread("D:/Python36/images/texture/%s" %key, cv2.IMREAD_GRAYSCALE)
        key_list.append(key)
        label_list.append(label)
        hist_list.append(__get_hist(img,3))
    bundle = [np.array(key_list), np.array(label_list), np.array(hist_list)]
    return bundle

train_txt = 'D:/Python36/images/class_train.txt'
train_hist = get_trainHist_list(train_txt)
model = LinearSVC(C=100.0, random_state=42)
model.fit(train_hist[2], train_hist[1])
for i in train_hist[2]:
    print(len(i))

test_img = cv2.imread("D:/Python36/images/texture_test/flat-3.png", cv2.IMREAD_GRAYSCALE)
hist= np.array(__get_hist(test_img, 3))
print(len(hist))
prediction = model.predict([hist])
print(prediction)

Result:

26
26
26
26
26
26
25
Traceback (most recent call last):
  File "D:\Python36\texture.py", line 44, in <module>
    prediction = model.predict([hist])
  File "D:\Python36\lib\site-packages\sklearn\linear_model\base.py", line 324, in predict
    scores = self.decision_function(X)
  File "D:\Python36\lib\site-packages\sklearn\linear_model\base.py", line 305, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 25 features per sample; expecting 26

As you can see, the histogram length is 26 for every training image but 25 for test_img. Because of this mismatch, predict on the SVM fails.

My guess is that some bins of the test_img histogram are empty and that those empty bins get skipped. (I am not sure.)
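That guess can be checked with a small sketch. It assumes only NumPy and uses made-up code arrays (`codes_full`, `codes_missing`) in place of real LBP output. `np.unique(..., return_counts=True)` stands in for `itemfreq` here, on the understanding that both count only the values that actually occur.

```python
import numpy as np

# Hypothetical LBP code arrays; real ones would come from
# skimage.feature.local_binary_pattern.
codes_full = np.array([0, 1, 2, 3, 3, 2])      # all four codes 0..3 occur
codes_missing = np.array([0, 1, 3, 3, 3, 1])   # code 2 never occurs

def counts_of_present_values(codes):
    """Count only the values that occur in `codes`."""
    _, counts = np.unique(codes, return_counts=True)
    return counts

print(len(counts_of_present_values(codes_full)))     # 4
print(len(counts_of_present_values(codes_missing)))  # 3
```

The second vector is one entry shorter simply because one code is absent, which is the same kind of 26 versus 25 mismatch seen above.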

Does anyone have a solution?

Answer 1

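There are 59 different uniform LBPs for a neighbourhood of 8 points. This should be the dimension of your feature vector, but it is not, because you are using itemfreq to compute the histogram (as a side note, itemfreq is deprecated). The length of a histogram obtained through itemfreq is the number of different uniform LBPs that actually occur in the image. If some uniform LBPs are not present in the image, the resulting histogram will have fewer bins than expected. Note that 59 is the count for 8 points with the non-rotation-invariant variant (method="nri_uniform" in scikit-image); with the settings in the question (24 points, method="uniform") the codes take P + 2 = 26 values, which matches the length 26 of the training histograms, so the fixed length there would be 26. The issue is easily fixed by using bincount with a fixed minlength, as the following toy example shows: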

import numpy as np
from skimage import feature
from scipy.stats import itemfreq

lbp = np.array([[0, 0, 0, 0],
                [1, 1, 1, 1],
                [8, 8, 9, 9]])

hi = itemfreq(lbp.ravel())[:, 1]  # wrong approach
hb = np.bincount(lbp.ravel(), minlength=59)  # proposed method

The output looks like this:

In [815]: hi
Out[815]: array([4, 4, 2, 2], dtype=int64)

In [816]: hb
Out[816]: 
array([4, 4, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0], dtype=int64)
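Applied to the settings in the question, the same idea might look like the sketch below. It is a hedged adaptation, not the original poster's code. The helper name `fixed_length_hist` is made up. It assumes `method="uniform"` with `n_points = 8 * radius`, for which scikit-image yields codes from 0 to `n_points + 1`, hence `n_points + 2` bins (26 for radius 3). The cast to `int` is there because `local_binary_pattern` returns a float array, while `np.bincount` needs non-negative integers.

```python
import numpy as np

def fixed_length_hist(lbp, n_points):
    """Normalised histogram with one bin per possible uniform LBP code.

    lbp      : array of LBP codes, e.g. the output of
               skimage.feature.local_binary_pattern(image, n_points, radius,
                                                    method="uniform")
    n_points : number of sampling points P; codes are assumed to lie in
               0 .. P + 1, so the histogram has P + 2 bins.
    """
    codes = np.asarray(lbp).astype(int).ravel()
    counts = np.bincount(codes, minlength=n_points + 2)
    return counts / counts.sum()

# Made-up stand-in for an LBP image with radius 3 (P = 24). Most codes
# are absent, yet the histogram still has 26 entries.
demo = np.array([[0.0, 0.0, 24.0], [25.0, 25.0, 25.0]])
hist = fixed_length_hist(demo, n_points=24)
print(len(hist))  # 26
```

Using such a helper in place of `__get_hist` for both the training and the test images should give every sample the same 26 features, so `model.predict` would no longer see a shape mismatch.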

