sklearn: cannot train decision tree with big dataframes
I am working on my first project with Python and scikit-learn.
In the project I have to make predictions based on the available data.
For that I want to use a DecisionTreeClassifier.
I loaded and cleaned the data and started generating several trees.
During generation, some datasets fail to produce a tree while others work fine.
When I looked more closely, I found that the datasets that can train a tree have fewer than 30 rows with 9 columns each; the tree apparently cannot get beyond 4.
Traceback (most recent call last):
File "/usr/local/bin/decisionTree/readAnPrepareData.py", line 57, in <module>
trainForest()
File "/usr/local/bin/decisionTree/readAnPrepareData.py", line 39, in trainForest
model.fit(X_train, Y)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 524, in fit
X_argsorted=X_argsorted)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 340, in build
recursive_partition(X, X_argsorted, y, sample_mask, 0, -1, False)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 272, in recursive_partition
min_samples_leaf, max_features, criterion, random_state)
File "_tree.pyx", line 533, in sklearn.tree._tree._find_best_split (sklearn/tree/_tree.c:4812)
ValueError: ndarray is not Fortran contiguous
This is how I create the tree:
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, Y)
What could be causing this? Could it be some kind of overflow? That would seem very strange, since this is only a small amount of data...
NumPy version: 1.9.2
scikit-learn version: 0.16.1
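Since the traceback complains about the memory layout of the array passed to fit, here is a minimal, self-contained sketch (using made-up data with the same shape as mine, roughly 30 rows and 9 columns) of passing an explicitly Fortran-contiguous NumPy array instead of the DataFrame; I am not sure whether this addresses the real cause:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up data with the same shape as described above: 30 rows, 9 columns.
X_train = np.random.rand(30, 9)
Y = np.random.randint(0, 2, size=30)

# The error says "ndarray is not Fortran contiguous";
# np.asfortranarray returns a copy in column-major (Fortran) memory layout.
X_fortran = np.asfortranarray(X_train, dtype=np.float32)

model = DecisionTreeClassifier()
model.fit(X_fortran, Y)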