Memory error if not using Standard Scaler

Question

I read https://towardsdatascience.com/do-decision-trees-need-feature-scaling-97809eaa60c6 and watched the video https://www.youtube.com/watch?v=nmBqnKSSKfM&ab_channel=KrishNaik, both of which say that you do not need to use Standard Scaler for decision tree machine learning.

However, the opposite happens in my code. Here is the code I am trying to run.

# importing libraries  
import numpy as nm  
import matplotlib.pyplot as mpl
import pandas as pd  
  
#importing datasets  
data_set= pd.read_csv('Social_Network_Ads.csv')  
  
#Extracting Independent and dependent Variable  
x= data_set.iloc[:, [2,3]].values  
y= data_set.iloc[:, 4].values  
  
# Splitting the dataset into training and test set.  
from sklearn.model_selection import train_test_split  
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)  
  
#feature Scaling  

from sklearn.preprocessing import StandardScaler    
st_x= StandardScaler()  
x_train= st_x.fit_transform(x_train)    
x_test= st_x.transform(x_test)  


#Fitting Decision Tree classifier to the training set  
from sklearn.tree import DecisionTreeClassifier  
classifier= DecisionTreeClassifier(criterion='entropy', random_state=0)  
classifier.fit(x_train, y_train)

My question is about the part where I try to visualize the data. Here is the code.

# Visualizing the training set result
from matplotlib.colors import ListedColormap  
x_set,y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step  =0.01),  
nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))  
mpl.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),  
alpha = 0.75, cmap = ListedColormap(('purple','green' )))  
mpl.xlim(x1.min(), x1.max())  
mpl.ylim(x2.min(), x2.max())  
for i, j in enumerate(nm.unique(y_set)):  
    mpl.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], c = ListedColormap(('purple', 'green'))(i), label = j)  
mpl.title('Decision Tree Algorithm (Training set)')  
mpl.xlabel('Age')  
mpl.ylabel('Estimated Salary')  
mpl.legend()  
mpl.show()

If I run it with StandardScaler, it succeeds and the plot displays fine. However, when I comment out (with #) the StandardScaler part, it raises a MemoryError.

MemoryError                               Traceback (most recent call last)
<ipython-input-8-1282bf709e27> in <module>
      3 x_set,y_set = x_train, y_train
      4 x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),
----> 5 nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
      6 mpl.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
      7 alpha = 0.75, cmap = ListedColormap(('purple','green' )))

~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in meshgrid(*xi, **kwargs)
   4209
   4210     if copy_:
-> 4211         output = [x.copy() for x in output]
   4212
   4213     return output

~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in <listcomp>(.0)
   4209
   4210     if copy_:
-> 4211         output = [x.copy() for x in output]
   4212
   4213     return output

MemoryError:

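The error occurs only in the visualization part; elsewhere in the code, prediction works fine without the standard scaler.

Can a decision tree work without Standard Scaler? If so, how do I fix this?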

Answer 1

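Decision trees work both with and without a standard scaler. The important point here is that scaling the data does not affect the performance of a decision tree model.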
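To make that claim concrete, here is a minimal sketch that fits the same tree on raw and on standardized features and compares the results. The data is synthetic: the Age and EstimatedSalary ranges and the labelling rule are assumptions made up for the demo, not taken from Social_Network_Ads.csv.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Age / EstimatedSalary columns (assumed ranges).
rng = np.random.default_rng(0)
age = rng.integers(18, 61, size=400)
salary = rng.integers(15_000, 150_001, size=400)
X = np.column_stack([age, salary]).astype(float)

# Hypothetical labelling rule with 10% label noise, only for this demo.
y = ((age > 42) | (salary > 90_000)).astype(int)
flip = rng.random(400) < 0.1
y[flip] = 1 - y[flip]

X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed, different feature scale.
tree_raw = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_scaled, y)

# Compare which feature each node splits on, and the resulting predictions.
same_structure = np.array_equal(tree_raw.tree_.feature, tree_scaled.tree_.feature)
same_predictions = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print("same split features:", same_structure)
print("same training predictions:", same_predictions)
```

Standardization is a monotone rescaling of each feature, so it preserves the ordering of the values, and a tree split only depends on that ordering. Only the threshold values stored in the nodes change.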

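If you plot the data afterwards, though, I imagine you want to plot the original data rather than the scaled data; hence your problem.

The simplest solution I can think of is to pass sparse=True as an argument to numpy.meshgrid, since that call appears to be what raises the error in the traceback. Some details are covered in an earlier question.

Applied to your problem, this means changing this line: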

nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),  
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01),
)

to

nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),  
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01),
    sparse=True,
)
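A rough sketch of why the unscaled grid is so large, and of one alternative to a fixed step of 0.01. The feature ranges below are assumed for illustration (age 18 to 60, salary 15000 to 150000), and axis_points and make_grid are hypothetical helper names, not part of NumPy. Note also that sparse=True returns arrays of shape (1, N) and (M, 1), so the predict call in the question, which ravels x1 and x2 into two columns of equal length, would need adjusting as well.

```python
import math
import numpy as np

def axis_points(lo, hi, step=0.01, pad=1.0):
    """Approximate number of samples arange(lo - pad, hi + pad, step) produces."""
    return math.ceil(((hi + pad) - (lo - pad)) / step)

# Assumed, illustrative ranges: raw age and salary versus roughly -2..2
# for both features after standard scaling.
cells_raw = axis_points(18, 60) * axis_points(15_000, 150_000)
cells_scaled = axis_points(-2, 2) * axis_points(-2, 2)
print(f"unscaled grid: {cells_raw:.2e} cells, ~{cells_raw * 8 / 1e9:.0f} GB per float64 array")
print(f"scaled grid:   {cells_scaled:.2e} cells, ~{cells_scaled * 8 / 1e6:.0f} MB per float64 array")

# sparse=True returns broadcastable axes instead of full 2-D arrays.
sx, sy = np.meshgrid(np.arange(3), np.arange(4), sparse=True)
print(sx.shape, sy.shape)  # (1, 3) (4, 1)

def make_grid(x_set, n_points=300, pad_frac=0.05):
    """Dense grid with a fixed number of points per axis, whatever the feature scale."""
    axes = []
    for col in range(x_set.shape[1]):
        lo, hi = x_set[:, col].min(), x_set[:, col].max()
        pad = (hi - lo) * pad_frac
        axes.append(np.linspace(lo - pad, hi + pad, n_points))
    return np.meshgrid(*axes)

demo = np.array([[18, 15_000], [60, 150_000]], dtype=float)
x1, x2 = make_grid(demo)
# Same (n_samples, 2) layout the question passes to classifier.predict.
grid_points = np.array([x1.ravel(), x2.ravel()]).T
print(x1.shape, grid_points.shape)
```

With a fixed number of points per axis, the grid stays the same size whether or not the features are scaled, so the rest of the plotting code from the question can stay as it is.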

Answer 2

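I think I have found a way around this problem. I simply leave the Standard Scaler where it was written. Because the data is scaled, the visualization part works (with the result that the data is plotted on the chart at its scaled-down values).

Otherwise, if I want to use unscaled data, I can write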

classifier.fit(x, y)

The reason I use unscaled data is to write code that can predict the result for any input with the model, as below:

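# this will work well if NOT using Standard Scaler
classifier.fit(x, y)

# input() returns strings, so convert them to numbers before predicting
estimated_salary = float(input("Enter your salary:"))
age = float(input("Enter your age:"))

# feature order must match the training columns [2, 3]: age first, then salary
purchase = classifier.predict([[age, estimated_salary]])

print("If your salary is", estimated_salary, "and your age is", age , "this is your purchase result:", purchase)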
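If the scaler is kept, raw inputs can still be predicted by letting a scikit-learn pipeline apply the fitted scaler before the tree. This is only a sketch of that option under stated assumptions: the data is synthetic, the labelling rule is hypothetical, and the input values are hard-coded instead of read with input().

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the [Age, EstimatedSalary] columns (assumed ranges).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 61, size=200),
    rng.integers(15_000, 150_001, size=200),
]).astype(float)
y = (X[:, 1] > 90_000).astype(int)  # hypothetical labelling rule for the demo

# The pipeline fits the scaler and the tree together.
model = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(criterion="entropy", random_state=0),
)
model.fit(X, y)

# Raw, unscaled values go in; the pipeline scales them before predicting.
age, estimated_salary = 35.0, 120_000.0
purchase = model.predict([[age, estimated_salary]])
print("purchase result:", purchase)
```

The same effect can be had without a pipeline by calling st_x.transform on the new input before classifier.predict; the pipeline just keeps the two steps from getting out of sync.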

Thanks to everyone who inspired me and offered ideas. I appreciate it.

python

machine-learning

decision-tree

scikit-learn