Getting "Found input variables with inconsistent numbers of samples: [1, 4]" error for RandomForestRegressor
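I am following this Random Forest Algorithm example to predict rejections at different stages.
I am fetching the values of stages and reject_count from a database, using the stages values for x and the reject_count values for y.
My code is: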
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
stages = [102, 103, 104, 106]
reject_count = [1, 3, 1, 2]
li = []
li.append(stages)
l2 = []
l2.append(reject_count)
x = np.array(li)
y = np.array(reject_count)
x.shape
y.shape
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print("===============")
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
print("x train", X_train)
print("y train", y_train)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
print(y_pred)
Please guide me on where I am going wrong.
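Two things are going on here.
First, your x and y do not have the same number of samples. x is built from a list of lists, so np.array(li) has shape (1, 4), which scikit-learn reads as one sample with four features, while y has shape (4,), which is four samples. That mismatch is the [1, 4] in the error message.
Second, assuming you want each stage to be one sample with a single feature, you should reshape x into a column with reshape(-1, 1):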
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
stages = [102, 103, 104, 106]
reject_count = [1, 3, 1, 2]
#li = []
#li.append(stages)
#l2 = []
#l2.append(reject_count)
x = np.array(stages).reshape(-1, 1)
y = np.array(reject_count)
print(x, y)
print(x.shape)  # (4, 1): four samples, one feature each
print(y.shape)  # (4,): four targets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print("===============")
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
print("x train", X_train)
print("y train", y_train)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
print(y_pred)
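To see the shape mismatch in isolation, here is a minimal sketch that uses only NumPy. The helper sample_counts is hypothetical, written for this illustration; it is not scikit-learn's own check, it only compares the first dimension of each array the way the error message reports it.

```python
import numpy as np

stages = [102, 103, 104, 106]
reject_count = [1, 3, 1, 2]


def sample_counts(x, y):
    """Hypothetical helper: number of rows (samples) in x and in y."""
    return [np.asarray(x).shape[0], np.asarray(y).shape[0]]


# Wrapping the list in another list gives one row with four columns.
x_wrong = np.array([stages])
# reshape(-1, 1) gives four rows with one column each.
x_right = np.array(stages).reshape(-1, 1)
y = np.array(reject_count)

print(x_wrong.shape, sample_counts(x_wrong, y))  # (1, 4) [1, 4]
print(x_right.shape, sample_counts(x_right, y))  # (4, 1) [4, 4]
```

The first pair of counts matches the [1, 4] in the error, and the second pair shows both arrays agreeing on four samples after the reshape.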