kNN feature should be passed through as a list

My data looks like this:

import pandas as pd

sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame(data=[sample1, sample2, sample3], columns=['list', 'cat', 'metr', 'target'])

On this data, a scikit-learn kNN regression with a specific distance function should be performed.

The distance function is:

def my_distance(X,Y,**kwargs):
    if len(X)>1:
        x = X
        y = Y
        all_minima = []
        for k in range(len(x)):
            one_minimum = min(x[k],y[k])
            all_minima.append(one_minimum)
            
        sum_all_minima=sum(all_minima)
        distance = (sum(x)+sum(y)-sum_all_minima) * kwargs["Para_list"]
      
    elif  X.dtype=='int64':
        x = X
        y = Y
        if x == y and x != -1:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = kwargs["Para_nominal"] * 1
    else:
        x = X
        y = Y
        if x == y:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = abs(x-y) * kwargs["Para_metrisch"]
    return distance

It should be implemented as a valid distance function via:
DistanceMetric.get_metric('pyfunc',func=my_distance)

I assume the scikit-learn code should look like this:

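from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

train, test = train_test_split(paul, test_size=0.3)

# x_train should contain only the independent variables; the others are dropped: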
x_train = train.drop('target', axis=1)
y_train = train['target']

x_test = test.drop('target', axis = 1)
y_test = test['target']

knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='ball_tree',
                          metric=my_distance,
                          metric_params={"Para_list": 2,
                                         "Para_minus1": 3,
                                         "Para_metrisch": 2,
                                         "Para_nominal": 4})
knn.fit(x_train,y_train)
y_pred=knn.predict(x_test)

I get:

ValueError: setting an array element with a sequence.

I guess scikit-learn cannot handle a single feature item as a list? Is there a way to make that happen?

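No, I am not aware of any way to do that. You need to convert this feature into a 2D matrix and concatenate it with the other 1D features to shape the data appropriately. This is standard sklearn behavior.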

Unless you have some very narrow use case, making a 2D array out of the list feature is totally fine. I assume all the lists have the same length.
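As a rough sketch of that conversion (not a definitive implementation): the list column is expanded into its own columns and stacked next to `cat` and `metr`, and the metric then slices each flat row back apart. The name `flat_distance`, the constant `N_LIST`, and the choice to sum the three per-feature distances are assumptions made for illustration; the question does not say how the parts should be combined, and the `-1` handling is left out for brevity. `algorithm='brute'` is used here because the list term is not zero for identical lists, so it is probably not a true metric, which the tree-based algorithms expect.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame(data=[sample1, sample2, sample3],
                    columns=['list', 'cat', 'metr', 'target'])

# Expand the list column into a 2D block (assumes all lists have equal length).
list_block = np.array(paul['list'].tolist(), dtype=float)
# Keep the remaining scalar features as a second 2D block.
other_block = paul[['cat', 'metr']].to_numpy(dtype=float)
# One flat row per sample: list entries first, then 'cat', then 'metr'.
X = np.hstack([list_block, other_block])
y = paul['target'].to_numpy()

N_LIST = list_block.shape[1]  # hypothetical helper: width of the list part


def flat_distance(a, b, Para_list=1, Para_nominal=1, Para_metrisch=1):
    """Hypothetical combined distance on flat rows (sum of the three parts)."""
    la, lb = a[:N_LIST], b[:N_LIST]
    # Same list formula as in the question: sum(x) + sum(y) - sum(min(x, y)).
    d_list = (la.sum() + lb.sum() - np.minimum(la, lb).sum()) * Para_list
    # Nominal feature: 0 if equal, otherwise a fixed penalty.
    d_cat = 0.0 if a[N_LIST] == b[N_LIST] else Para_nominal
    # Metric feature: scaled absolute difference.
    d_metr = abs(a[N_LIST + 1] - b[N_LIST + 1]) * Para_metrisch
    return d_list + d_cat + d_metr


knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='brute',
                          metric=flat_distance,
                          metric_params={"Para_list": 2,
                                         "Para_nominal": 4,
                                         "Para_metrisch": 2})
knn.fit(X, y)
y_pred = knn.predict(X[:1])
```

The train/test split from the question can then be applied to `X` and `y` instead of the original DataFrame.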