I am getting a ValueError: "ValueError: Number of labels=16512 does not match number of samples=16339"
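I'm experimenting with machine learning and I'm new to it, so I don't know why I'm getting this error:
ValueError: Number of labels=16512 does not match number of samples=16339
I searched around and nothing helped. Can someone help me? I don't know why this happens; I think I did everything right. I'm trying to use this to predict house prices.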
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import numpy as np
from sklearn.model_selection import train_test_split
train = pd.read_csv('housing.csv')
X = train.drop(columns=["median_house_value", "ocean_proximity"])
y = train["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2)
model = DecisionTreeClassifier()
X_train = X_train.dropna()
y_train = y_train.dropna()
model.fit(X_train, y_train)
Here is my error message:
ValueError Traceback (most recent call last)
<ipython-input-43-4691a6b66d80> in <module>
17 y_train = y_train.dropna()
18
---> 19 model.fit(X_train, y_train)
c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
888 """
889
--> 890 super().fit(
891 X, y,
892 sample_weight=sample_weight,
c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
270
271 if len(y) != n_samples:
--> 272 raise ValueError("Number of labels=%d does not match "
273 "number of samples=%d" % (len(y), n_samples))
274 if not 0 <= self.min_weight_fraction_leaf <= 0.5:
ValueError: Number of labels=16512 does not match number of samples=16339
Can you try the approach below? It works for me without any problems:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import numpy as np
from sklearn.model_selection import train_test_split
data = pd.read_csv('housing.csv')
prices = data['median_house_value']
features = data.drop(['median_house_value', 'ocean_proximity'], axis = 1)
prices.shape
(20640,)
features.shape
(20640, 8)
X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=42)
X_train = X_train.dropna()
y_train = y_train.dropna()
y_train.shape
(16512,)
X_train.shape
(16512, 8)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=None, splitter='best')
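One likely explanation for the mismatch in the question: `dropna()` is called on `X_train` and `y_train` separately, so rows with a missing feature value are removed from `X_train` while their labels stay in `y_train` (16512 labels versus 16339 samples). The following is a minimal sketch of that effect and of one possible fix, not a definitive implementation. The synthetic DataFrame is only a stand-in for housing.csv with made-up values, and `DecisionTreeRegressor` is my own substitution because the synthetic target is continuous.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for housing.csv (hypothetical values, not the real data).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "total_rooms": rng.integers(100, 5000, size=200).astype(float),
    "total_bedrooms": rng.integers(50, 1000, size=200).astype(float),
    "median_house_value": rng.integers(50_000, 500_000, size=200).astype(float),
})
# Blank out every 4th feature value; the target column stays complete.
data.loc[data.index[::4], "total_bedrooms"] = np.nan

X = data.drop(columns=["median_house_value"])
y = data["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Separate dropna() calls: rows vanish from the features only,
# so the two lengths no longer agree and fit() would raise ValueError.
X_dropped = X_train.dropna()
y_dropped = y_train.dropna()
print(len(X_dropped), len(y_dropped))

# One possible fix: keep only the labels whose index survived in the features.
X_clean = X_train.dropna()
y_clean = y_train.loc[X_clean.index]

model = DecisionTreeRegressor(random_state=0)
model.fit(X_clean, y_clean)
```

An alternative with the same effect is to call `dropna()` on the full DataFrame before separating features and target, so both are filtered together.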