SVM with Python and CPLEX: loading the quadratic part of the objective function

Question

''In general, you will get better performance creating batches of linear constraints rather than creating them one at a time. I am just wondering whether this still holds for a huge problem.'' - a clever programmer.

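To be specific, I have a (35k x 40) dataset and I want to train an SVM on it. I need to generate the Gram matrix of this dataset, which is fine, but passing the coefficients to CPLEX is a mess and takes hours. Here is my code: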

    import numpy as np

    nn = 35000
    XXt = np.random.rand(nn, nn)  # the Gram matrix of the dataset
    yy = np.random.rand(nn)       # the label vector of the dataset

    temp = ((yy*XXt).T)*yy
    xg, yg = np.meshgrid(range(nn), range(nn))
    indici = np.dstack([yg, xg])

    # collect [i, j, c_ij] for every entry on or above the diagonal
    quadratic_part = []
    for ii in range(nn):
        for indd in indici[ii][ii:]:
            quadratic_part.append([indd[0], indd[1], temp[indd[0], indd[1]]])

'quadratic_part' is a list of the form [i, j, c_ij], where c_ij is the coefficient stored in temp. It will be passed to the function 'objective.set_quadratic_coefficients()' of the CPLEX Python API.

Is there a smarter way to do this?

P.S. I may run into memory problems, so instead of storing the whole list 'quadratic_part', it might be better to call 'objective.set_quadratic_coefficients()' several times... do you see what I mean?!

Answer 1

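Under the hood, objective.set_quadratic makes use of the CPXXcopyquad function in the C Callable Library, whereas objective.set_quadratic_coefficients uses CPXXchgqpcoef. The former copies the whole Q matrix in one call, while the latter changes individual coefficients, which is why it gets slow when there are many of them.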
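To make the difference between the two calls concrete, here is a minimal sketch of the input each one takes, using a tiny symmetric 2x2 matrix. The helper names `dense_rows`, `symmetric_triples`, and `load_both_ways` are made up for illustration; only `set_quadratic` and `set_quadratic_coefficients` come from the CPLEX Python API. As far as I understand the API, an off-diagonal triple (i, j, v) sets both the (i, j) and the (j, i) entry, so passing only the upper triangle should describe the same Q as the full rows.

```python
def dense_rows(q):
    """One [indices, values] pair per row: the shape set_quadratic takes."""
    n = len(q)
    return [[list(range(n)), [float(v) for v in q[i]]] for i in range(n)]


def symmetric_triples(q):
    """(i, j, q_ij) for entries on or above the diagonal: the shape
    set_quadratic_coefficients takes."""
    n = len(q)
    return [(i, j, float(q[i][j])) for i in range(n) for j in range(i, n)]


def load_both_ways(q):
    """Hypothetical helper; needs the cplex package to be installed.

    Returns two models that should end up with the same Q matrix.
    """
    import cplex  # imported here so the helpers above work without CPLEX

    n = len(q)

    bulk = cplex.Cplex()
    bulk.variables.add(lb=[0.0] * n)
    bulk.objective.set_quadratic(dense_rows(q))  # whole matrix, one call

    piecewise = cplex.Cplex()
    piecewise.variables.add(lb=[0.0] * n)
    # individual coefficients, passed as a sequence of triples
    piecewise.objective.set_quadratic_coefficients(symmetric_triples(q))

    return bulk, piecewise


q = [[2.0, 1.0],
     [1.0, 3.0]]  # small symmetric example matrix
rows = dense_rows(q)
triples = symmetric_triples(q)
```

With CPLEX installed, `load_both_ways(q)` could be used to write both models to LP files and compare them.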

Here is an example (bear in mind that I am not a numpy expert; there is quite possibly a better way to do this part):

    import numpy as np
    import cplex

    nn = 5  # a small example size here

    XXt = np.random.rand(nn, nn)  # the Gram matrix of the dataset
    yy = np.random.rand(nn)       # the label vector of the dataset
    temp = ((yy*XXt).T)*yy

    # create a symmetric matrix
    tempu = np.triu(temp)      # upper triangle (diagonal included)
    iu1 = np.triu_indices(nn, 1)
    tempu.T[iu1] = tempu[iu1]  # copy upper into lower

    # one [indices, values] pair per row of Q, as plain Python lists
    qmat = []
    for i in range(nn):
        qmat.append([list(range(nn)), tempu[i].tolist()])

    c = cplex.Cplex()
    c.variables.add(lb=[0]*nn)
    c.objective.set_quadratic(qmat)
    c.write("test2.lp")

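Your Q matrix is completely dense, so depending on how much memory you have, this technique may not scale. Where it is possible, though, initializing the Q matrix with objective.set_quadratic should give better performance. Perhaps you will need some hybrid technique in which you use both set_quadratic and set_quadratic_coefficients.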
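As a rough sketch of the batching idea, the generator below produces the upper-triangle triples lazily, in fixed-size batches, so the full list never has to sit in memory at once. `upper_entries`, `triple_batches`, and `batch_size` are hypothetical names, not part of CPLEX. I have not measured whether this is fast enough at 35k x 35k; it bounds memory, not necessarily run time.

```python
from itertools import islice


def upper_entries(matrix):
    """Yield (i, j, c_ij) for every entry on or above the diagonal.

    Works with a list of lists or a NumPy array, since it only uses
    len() and matrix[i][j] style indexing.
    """
    n = len(matrix)
    for i in range(n):
        row = matrix[i]
        for j in range(i, n):
            yield (i, j, float(row[j]))


def triple_batches(matrix, batch_size):
    """Yield lists of at most batch_size triples, built on demand."""
    entries = upper_entries(matrix)
    while True:
        batch = list(islice(entries, batch_size))
        if not batch:
            return
        yield batch


# tiny symmetric stand-in for the real coefficient matrix
demo = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 5.0],
        [3.0, 5.0, 6.0]]
batches = list(triple_batches(demo, 4))
```

Each batch could then be handed to `c.objective.set_quadratic_coefficients(batch)` inside a `for batch in triple_batches(temp, batch_size):` loop, with the batch size chosen to fit the available memory.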
