ValueError: X has 1 features, but SVC is expecting 3 features as input
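I am trying to build a stock price predictor using Keras and sklearn (not to actually invest with, don't worry). It takes any time series from Kaggle and looks at the "Close" column. It then takes a rolling time window of a given length and predicts the direction, up (1) or down (0).
When I try to run the code below, I get the following error: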
File "...", line 71, in test
y_pred = self.model.predict(self.X_test)
ValueError: X has 1 features, but SVC is expecting 3 features as input.
Could someone point me toward the likely cause? Which features is SVC expecting that I might be missing?
Code: Model.py
create_features
Checks whether the market closes lower or higher than the rolling time window average, and builds X and y:
# window_size = the set size of the rolling time window
def create_features(data, window_size):
    X = []
    y = []
    for i in range(0, len(data.index) - window_size):
        temp = [data.iloc[i + j]['Close'] for j in range(0, window_size)]
        avg = sum(temp) / len(temp)
        X.append(temp)
        y.append(0 if data.iloc[i + window_size]['Close'] < avg else 1)
    return X, y
class Model:
    def __init__(self, market: Market, training_percent: float, window_size: int):
        self.model = SVC(C=10, gamma='scale', kernel='rbf')
        X, y = create_features(market.data, window_size)
        self.X_train, self.y_train, self.X_test, self.y_test = train_test_split(X, y, shuffle=False, stratify=None, train_size=training_percent)
        self.X_train = np.array(self.X_train)
        self.y_train = np.array(self.y_test)
        #self.X_test = np.array(self.X_test).reshape(-1, 1)

    def train(self):
        self.model.fit(self.X_train, self.y_train)

    def test(self):
        y_pred = self.model.predict(self.X_test) #THE COMPLAINING LINE
        y_pred = [0 if i < 0.5 else 1 for i in y_pred]
        tn, fp, fn, tp = confusion_matrix(self.y_test, y_pred, labels=[0, 1]).ravel()
        print(tn, fp, fn, tp)
        print("Accuracy:", (tn + fp) / (tn + fp + fn + tp))

    def predict(self, input_array):
        return self.model.predict(input_array)
The above is called as follows:
model_test = Model(markets[m], training_testing[j], window_size[i])
model_test.train()
model_test.test()
Any help with this issue would be greatly appreciated. Thank you in advance.
The problem is in how you unpack the output of train_test_split. As the documentation states, the split datasets are returned in this order:
# Notice the order of the unpacking.
self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(X, y, shuffle=False, stratify=None, train_size=training_percent)
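With your unpacking order, self.X_test actually holds the training labels, which is why its shape does not match what the model was trained on. You don't need the .reshape either.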
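To make the shape mismatch concrete, here is a minimal sketch on toy data. The arrays, the 0.8 split, and the `wrong_*` names are assumptions chosen for illustration, not values from the question. It shows the return order of train_test_split and what ends up in the "X_test" slot when the question's unpacking order is used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 windows of 3 closing prices each, with 0/1 labels.
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.array([0, 1] * 5)

# Documented return order: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=False, train_size=0.8
)
print(X_train.shape, X_test.shape)  # (8, 3) (2, 3)
print(y_train.shape, y_test.shape)  # (8,) (2,)

# The question's order: the third returned value (the training labels)
# lands in the variable that is later passed to predict().
wrong_X_train, wrong_y_train, wrong_X_test, wrong_y_test = train_test_split(
    X, y, shuffle=False, train_size=0.8
)
print(wrong_X_test.shape)                 # (8,)   a 1-D vector of labels
print(wrong_X_test.reshape(-1, 1).shape)  # (8, 1) one "feature" per row
```

Reshaping that label vector with .reshape(-1, 1) yields a single column, which would explain the "X has 1 features" wording when the model was fitted on 3-column windows.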
Also, I'm not sure you intended to do this:
# Assigning y_test to y_train.
self.y_train = np.array(self.y_test)
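Putting both points together, the following is a rough end-to-end sketch rather than a drop-in replacement for the class in the question. `make_windows` is a hypothetical stand-in for create_features that works on a plain sequence, and the random-walk prices are synthetic, used only so the snippet runs on its own.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def make_windows(closes, window_size):
    """Hypothetical stand-in for create_features, operating on a 1-D sequence."""
    X, y = [], []
    for i in range(len(closes) - window_size):
        window = closes[i:i + window_size]
        avg = sum(window) / window_size
        X.append(list(window))
        # Label 0 if the next close is below the window average, else 1.
        y.append(0 if closes[i + window_size] < avg else 1)
    return np.array(X), np.array(y)


# Synthetic random-walk prices (an assumption, not real market data).
rng = np.random.default_rng(0)
closes = 100 + np.cumsum(rng.normal(size=200))

X, y = make_windows(closes, window_size=3)

# Correct unpacking order, and y_train keeps the training labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=False, train_size=0.8
)

model = SVC(C=10, gamma="scale", kernel="rbf")
model.fit(X_train, y_train)

# X_test has the same number of columns as X_train, so predict() accepts it.
y_pred = model.predict(X_test)
print(X_test.shape, y_pred.shape)

# accuracy_score computes the fraction of correct predictions, (tn + tp) / total.
print("Accuracy:", accuracy_score(y_test, y_pred))
```

SVC.predict already returns class labels (0 or 1 here), so no thresholding at 0.5 is needed in this sketch.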