Unsupported operand type(s) for -: 'str' and 'str'
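I am new to data analysis and looking for help. I am using Python and writing my own kNN algorithm from scratch, and I think something is wrong with my data (train and test). I think I have to convert it to float, but I am not 100% sure. I know my functions work because I tried them with another dataset.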
from scipy.io import arff
from io import StringIO
import scipy
import pandas as pd
import numpy as np
import math
data_train = scipy.io.arff.loadarff('train.arff')
train = pd.DataFrame(data_train[0])
train.head()
data_test = scipy.io.arff.loadarff('test1.arff')
print(data_test)
test = pd.DataFrame(data_test[0])
test.head()
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train, test, test_size = 0.1, random_state=42)
print(X_train, X_test, y_train, y_test)
def distance(testpoint, trainpoint):
    # distance between testpoint and trainpoint.
    dist = np.sqrt(np.sum(np.power(testpoint-trainpoint, 2)))
    return dist

def getNeighbors(X_train, y_train, X_test, k):
    #For each point in X_test, calculate its distance from itself and each point in X_train
    k_neighbors_with_labels = [] # this will be a list (for each test point) of list (contains the tuple (distance,label) of k nearest neighbors).
    for testpoint in X_test:
        distances_label = [] # this list carries distances between the testpoint and train point
        for (trainpoint,y_train_label) in zip(X_train,y_train):
            # calculate the distance and append it to a distances_label with the associated label.
            distances_label.append((distance(testpoint, trainpoint), y_train_label))
        k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
    return k_neighbors_with_labels
ne = getNeighbors(X_train, y_train, X_test, k = 3)
print(ne)
TypeError                                 Traceback (most recent call last)
<ipython-input-56-3b2868d1fd43> in <module>()
----> 1 ne = getNeighbors(X_train, y_train, X_test, k = 3)
      2 print(ne)

<ipython-input-55-75b4da86d04e> in getNeighbors(X_train, y_train, X_test, k)
      6         for (trainpoint,y_train_label) in zip(X_train,y_train):
      7             # calculate the distance and append it to a distances_label with the associated label.
----> 8             distances_label.append((distance(testpoint, trainpoint), y_train_label))
      9         k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
     10     return k_neighbors_with_labels

<ipython-input-42-03d38977fec4> in distance(testpoint, trainpoint)
      1 def distance(testpoint, trainpoint):
      2     # distance between testpoint and trainpoint.
----> 3     dist = np.sqrt(np.sum(np.power(testpoint-trainpoint, 2)))
      4     return distance

TypeError: unsupported operand type(s) for -: 'str' and 'str'
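As noted in the comments, testpoint and trainpoint appear to be strings.
To confirm this, you can add print(type(testpoint)) and print(type(trainpoint)) to your code and see what types they actually are. If they are indeed strings (as the error suggests), and assuming they are numbers stored as strings, you can convert them to int or float like this: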
dist = np.sqrt(np.sum(np.power(float(testpoint)-float(trainpoint), 2)))
Replace float with int if that suits your data better.
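Note that float() only accepts a single value. If each point is a sequence of features rather than one number, a variant along the following lines may be closer to what you need. This is only a sketch: it assumes every feature is a number or a numeric string, and the sample points are made up for illustration.

```python
import numpy as np

def distance(testpoint, trainpoint):
    # Convert both points to float arrays; numeric strings such as "1.5" are parsed.
    # A non-numeric string (for example a class label) raises ValueError here.
    a = np.asarray(testpoint, dtype=float)
    b = np.asarray(trainpoint, dtype=float)
    # Euclidean distance between the two points.
    return np.sqrt(np.sum((a - b) ** 2))

# Made-up points whose features are stored as strings.
print(distance(["1.0", "2.0"], ["4.0", "6.0"]))  # 5.0
```

If the conversion raises ValueError, the values are not numbers stored as strings, and the problem lies in how the points are selected rather than in their type.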
There are many ways to solve this, but the underlying problem is that you cannot use the - operator on strings, exactly as the error says.
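One possible cause, which the post does not confirm, is that X_train and X_test are still pandas DataFrames when they reach getNeighbors. Iterating over a DataFrame yields its column labels, not its rows, so testpoint and trainpoint would be column names, which are strings. The sketch below shows this with a made-up frame (the column names x1, x2 and label are hypothetical) and passes NumPy arrays instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ARFF data; the column names are made up.
train = pd.DataFrame({"x1": [0.0, 1.0, 5.0],
                      "x2": [0.0, 1.0, 5.0],
                      "label": [0, 0, 1]})

# Iterating over a DataFrame yields column labels, not rows.
column_labels = [item for item in train]
print(column_labels)           # ['x1', 'x2', 'label']
print(type(column_labels[0]))  # <class 'str'>

# Split features from labels and convert to NumPy arrays,
# so that each iteration step is one numeric row.
X = train[["x1", "x2"]].to_numpy(dtype=float)
y = train["label"].to_numpy()
rows = [row for row in X]
print(rows[1] - rows[0])       # [1. 1.]
```

If that is what is happening, separating the feature columns from the label column and converting them with to_numpy() before calling getNeighbors should let the subtraction work on numbers.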