kNN feature should be passed through as a list
My data looks like this:
import pandas as pd

sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame(data=[sample1, sample2, sample3], columns=['list', 'cat', 'metr', 'target'])
On this data, a scikit-learn kNN regression with a specific distance function should be performed.
The distance function is:
def my_distance(X,Y,**kwargs):
    if len(X)>1:
        x = X
        y = Y
        all_minima = []
        for k in range(len(x)):
            one_minimum = min(x[k],y[k])
            all_minima.append(one_minimum)
        sum_all_minima=sum(all_minima)
        distance = (sum(x)+sum(y)-sum_all_minima) * kwargs["Para_list"]
    elif X.dtype=='int64':
        x = X
        y = Y
        if x == y and x != -1:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = kwargs["Para_nominal"] * 1
    else:
        x = X
        y = Y
        if x == y:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = abs(x-y) * kwargs["Para_metrisch"]
    return distance
It should be implemented as a valid distance function via
DistanceMetric.get_metric('pyfunc', func=my_distance)
If I understand correctly, the scikit-learn code should look like this:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

train, test = train_test_split(paul, test_size=0.3)
# x_train should contain only the independent variables; the others are dropped:
x_train = train.drop('target', axis=1)
y_train = train['target']
x_test = test.drop('target', axis=1)
y_test = test['target']
knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='ball_tree',
                          metric=my_distance,
                          metric_params={"Para_list": 2,
                                         "Para_minus1": 3,
                                         "Para_metrisch": 2,
                                         "Para_nominal": 4})
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
I get:
ValueError: setting an array element with a sequence.
I guess scikit-learn cannot handle a single feature item as a list? Is there a way to make that happen?
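No, I am not aware of any way to do this. You need to convert this feature into a 2D matrix and concatenate it with the other 1D features so that the data has the proper shape. This is standard sklearn behavior.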
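As a minimal sketch of that conversion, assuming every list has the same length (6 in the question's data), the list column can be expanded into a numeric block and stacked next to the scalar columns. The names `list_block`, `scalar_block`, `X` and `y` are made up for illustration:

```python
import numpy as np
import pandas as pd

sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame([sample1, sample2, sample3],
                    columns=['list', 'cat', 'metr', 'target'])

# Expand the list column into an (n_samples, 6) numeric block.
# This only works if all lists have the same length.
list_block = np.array(paul['list'].tolist(), dtype=float)

# Take the remaining scalar features as an (n_samples, 2) block.
scalar_block = paul[['cat', 'metr']].to_numpy(dtype=float)

# One flat row per sample: 6 list entries, then 'cat', then 'metr'.
X = np.hstack([list_block, scalar_block])
y = paul['target'].to_numpy()

print(X.shape)
```

Each row of `X` is now a plain numeric vector, which is the shape scikit-learn estimators expect.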
Unless you have a very narrow use case, building a 2D array from the list feature is perfectly fine. I assume all lists have the same length.
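With flat rows, the custom distance can pick out each feature by position instead of by type. The sketch below is one possible adaptation, not the only one: `N_LIST`, `flat_distance` and the choice to add the three partial distances together are assumptions, since the question does not say how the parts should be combined. It also uses `algorithm='brute'` rather than `'ball_tree'`, because the list term equals the sum of element-wise maxima and is therefore nonzero for identical rows, which a tree-based search may not handle well.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

N_LIST = 6  # assumed fixed length of the list feature

def flat_distance(a, b, **kwargs):
    # a and b are flat rows: N_LIST list entries, then 'cat', then 'metr'.
    la, lb = a[:N_LIST], b[:N_LIST]
    d_list = (la.sum() + lb.sum() - np.minimum(la, lb).sum()) * kwargs["Para_list"]

    # Categorical feature: -1 marks a missing value.
    ca, cb = a[N_LIST], b[N_LIST]
    if ca == -1 or cb == -1:
        d_cat = kwargs["Para_minus1"]
    elif ca == cb:
        d_cat = 0
    else:
        d_cat = kwargs["Para_nominal"]

    # Metric feature: -1 marks a missing value.
    ma, mb = a[N_LIST + 1], b[N_LIST + 1]
    if ma == -1 or mb == -1:
        d_metr = kwargs["Para_minus1"]
    else:
        d_metr = abs(ma - mb) * kwargs["Para_metrisch"]

    # Assumption: the partial distances are simply added up.
    return d_list + d_cat + d_metr

params = {"Para_list": 2, "Para_minus1": 3, "Para_metrisch": 2, "Para_nominal": 4}

# Flattened version of the three samples from the question.
X = np.array([[1, 0, 3, 5, 0, 9, 0, 1.5],
              [0, 4, 0, 6, 2, 0, 2, 1.9],
              [9, 7, 6, 0, 0, 0, 0, 1.3]], dtype=float)
y = np.array([0, 1, 1])

knn = KNeighborsRegressor(n_neighbors=2, algorithm='brute',
                          metric=flat_distance, metric_params=params)
knn.fit(X, y)
y_pred = knn.predict(X)
```

Because the array is cast to float, the `dtype` check from the original function no longer distinguishes the categorical column, which is why the sketch relies on column positions.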