Python: ValueError: could not convert string to float: 'Isolated' when reading input file for applying random forest
I am trying to apply a random forest to the following input file:
gold,Program,Requirement,MethodType,Top,Side,CallersT,CallersN,CallersU,CallersCallersT,CallersCallersN,CallersCallersU,CalleesT,CalleesN,CalleesU,CalleesCalleesT,CalleesCalleesN,CalleesCalleesU
T,chess,1,Inner,T,T,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,-1,Low,
N,chess,2,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,3,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,4,Root,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,5,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,6,Root,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,7,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,8,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,1,Leaf,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,2,Leaf,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,3,Leaf,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,4,Root,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,5,Isolated,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
T,chess,6,Inner,TU,T,Low,-1,-1,Low,-1,-1,Medium,-1,Medium,High,-1,High,
T,chess,7,Isolated,TU,T,Low,-1,-1,Low,-1,-1,Medium,-1,Medium,High,-1,High,
N,chess,8,Inner,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,1,Inner,TNU,N,-1,Low,-1,-1,-1,-1,Low,Low,High,Medium,-1,Medium,
N,chess,2,Inner,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,Low,Low,Medium,
N,chess,3,Inner,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,
T,chess,4,Inner,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,Low,Low,Medium,
N,chess,5,Leaf,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,
This is the code I am using to apply the random forest:
import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
# Feature Scaling
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
X_train={}
X_test={}
y_train={}
y_test={}
dataset = pd.read_csv( 'dataExtended2.txt', sep= ',')
#convert T into 1 and N into 0
dataset['gold'] = dataset['gold'].astype('category').cat.codes
dataset['Program'] = dataset['Program'].astype('category').cat.codes
dataset['MethodType'] = dataset['MethodType'].astype('category').cat.codes
dataset['Top'] = dataset['Top'].astype('category').cat.codes
dataset['Side'] = dataset['Side'].astype('category').cat.codes
dataset['CallersT'] = dataset['CallersT'].astype('category').cat.codes
dataset['CallersN'] = dataset['CallersN'].astype('category').cat.codes
dataset['CallersU'] = dataset['CallersU'].astype('category').cat.codes
dataset['CallersCallersT'] = dataset['CallersCallersT'].astype('category').cat.codes
dataset['CallersCallersN'] = dataset['CallersCallersN'].astype('category').cat.codes
dataset['CallersCallersU'] = dataset['CallersCallersU'].astype('category').cat.codes
dataset['CalleesT'] = dataset['CalleesT'].astype('category').cat.codes
dataset['CalleesN'] = dataset['CalleesN'].astype('category').cat.codes
dataset['CalleesU'] = dataset['CalleesU'].astype('category').cat.codes
dataset['CalleesCalleesT'] = dataset['CalleesCalleesT'].astype('category').cat.codes
dataset['CalleesCalleesN'] = dataset['CalleesCalleesN'].astype('category').cat.codes
dataset['CalleesCalleesU'] = dataset['CalleesCalleesU'].astype('category').cat.codes
pd.set_option('display.max_columns', None)
print(dataset.head())
row_count, column_count = dataset.shape
X = dataset.iloc[:, 1:column_count].values
y = dataset.iloc[:, 0].values
Xcol = dataset.iloc[:, 1:column_count]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
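I get the following error when executing the last line of the code (X_train = sc.fit_transform(X_train)): ValueError: could not convert string to float: 'Isolated'
This happens even though I use the line dataset['MethodType'] = dataset['MethodType'].astype('category').cat.codes to convert MethodType from string to float. How can I fix this?
Here is the traceback of the error: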
Traceback (most recent call last):
File "<ipython-input-38-d7fe5c294c10>", line 1, in <module>
runfile('C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python/RandomForestSimplified.py', wdir='C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python')
File "C:\Users\mouna\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "C:\Users\mouna\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python/RandomForestSimplified.py", line 43, in <module>
X_train = sc.fit_transform(X_train)
File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\base.py", line 517, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py", line 590, in fit
return self.partial_fit(X, y)
File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py", line 612, in partial_fit
warn_on_dtype=True, estimator=self, dtype=FLOAT_DTYPES)
File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'Isolated'
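OK, when you look at the output of print(dataset.head()), you can see that the first column, 'gold', is still a string. This happens because pandas uses the first column as the index. Every data row in the file ends with a trailing comma, so each row has one more field than the header; pandas then treats the first field as the index and shifts all column labels one position to the left. As a result, the column labelled 'Requirement' actually holds the MethodType strings, and since 'Requirement' is never encoded, 'Isolated' reaches the scaler unchanged.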
gold Program Requirement MethodType Top Side CallersT CallersN \
T 0 0 Inner 2 1 1 0 0
N 0 1 Inner 0 0 0 1 0
N 0 2 Inner 0 0 0 1 0
N 0 3 Root 0 0 0 1 0
N 0 4 Inner 0 0 0 1 0
CallersU CallersCallersT CallersCallersN CallersCallersU CalleesT \
T 1 0 0 1 0
N 0 1 0 0 1
N 0 1 0 0 1
N 0 1 0 0 1
N 0 1 0 0 1
CalleesN CalleesU CalleesCalleesT CalleesCalleesN CalleesCalleesU
T 0 0 0 1 -1
N 0 0 0 1 -1
N 0 0 0 1 -1
N 0 0 0 1 -1
N 0 0 0 1 -1
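The shift can be reproduced on a small scale. The following is only a sketch: the three-column CSV text is a made-up, shortened stand-in for dataExtended2.txt, keeping just the trailing comma at the end of each data row.

```python
import io
import pandas as pd

# Hypothetical, shortened sample: three header names, but every data row
# ends with a trailing comma and therefore has four fields.
csv_text = (
    "gold,Program,MethodType\n"
    "T,chess,Inner,\n"
    "N,chess,Isolated,\n"
)

shifted = pd.read_csv(io.StringIO(csv_text))

# The real 'gold' values were consumed as the index.
print(shifted.index.tolist())
# The 'gold' label now sits on the Program values.
print(shifted["gold"].tolist())
# The 'Program' label now sits on the MethodType values.
print(shifted["Program"].tolist())
# The last label sits on the empty field created by the trailing comma.
print(shifted["MethodType"].isna().all())
```

Printing dataset.index and dataset.dtypes right after read_csv is a quick way to spot this kind of misalignment before any encoding is done.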
Solution:
dataset = pd.read_csv( 'dataExtended2.txt', sep= ',', index_col=False)
The output will then be:
gold Program Requirement MethodType Top Side CallersT CallersN \
0 1 0 1 0 2 1 1 0
1 0 0 2 0 0 0 0 1
2 0 0 3 0 0 0 0 1
3 0 0 4 3 0 0 0 1
4 0 0 5 0 0 0 0 1
CallersU CallersCallersT CallersCallersN CallersCallersU CalleesT \
0 0 1 0 0 1
1 0 0 1 0 0
2 0 0 1 0 0
3 0 0 1 0 0
4 0 0 1 0 0
CalleesN CalleesU CalleesCalleesT CalleesCalleesN CalleesCalleesU
0 0 0 0 0 1
1 1 0 0 0 1
2 1 0 0 0 1
3 1 0 0 0 1
4 1 0 0 0 1
More details are in the pandas documentation for read_csv: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
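Putting the pieces together, a minimal end-to-end sketch might look like the following. The inline CSV text is a hypothetical, shortened stand-in for dataExtended2.txt, and the loop that encodes every non-numeric column is a convenience added here, not part of the original code, which lists the columns one by one.

```python
import io
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical, shortened sample with the same trailing comma on each data row.
csv_text = (
    "gold,Program,Requirement,MethodType\n"
    "T,chess,1,Inner,\n"
    "N,chess,2,Isolated,\n"
    "N,chess,3,Leaf,\n"
)

# index_col=False keeps pandas from using the first column as the index,
# so the header labels stay aligned with the data.
dataset = pd.read_csv(io.StringIO(csv_text), index_col=False)

# Encode every non-numeric column as integer category codes.
for col in dataset.columns:
    if not pd.api.types.is_numeric_dtype(dataset[col]):
        dataset[col] = dataset[col].astype("category").cat.codes

X = dataset.iloc[:, 1:].values  # features: everything except 'gold'
y = dataset.iloc[:, 0].values   # target: 'gold'

# All features are numeric now, so scaling no longer raises ValueError.
X_scaled = StandardScaler().fit_transform(X)
print(dataset.dtypes)
print(X_scaled.shape)
```

Depending on the pandas version, read_csv may emit a ParserWarning about the header length not matching the data when index_col=False is used this way; removing the trailing commas from the file avoids the situation entirely.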