Split dataset containing multiple labels
I have a dataset with multiple labels: for each X I have two y values, and I need to split it into training and test sets.
I tried using the sklearn function train_test_split():
import numpy as np
from sklearn.model_selection import train_test_split
X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)
X_train, X_test, [Y1_train, Y2_train], [Y1_test, Y2_test] = train_test_split(X, [y1, y2], test_size=0.4, random_state=42)
But I get an error message:
ValueError: Found input variables with inconsistent numbers of samples: [10, 2]
The following code should work for you:
import numpy as np
from sklearn.model_selection import train_test_split
X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)
y = [[y1[i],y2[i]] for i in range(len(y1))]
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.4, random_state=42)
It produces the following output:
print(X_train)
[ 0.42534237 1.35471168 0.00640736 1.34057234 0.50608562 -1.73341641]
and
print(Y_train)
[[3, 1], [7, 1], [6, 2], [4, 2], [6, 2], [2, 2]]
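In your code, the label array has shape (2, 10), but the input array has shape (10,), so train_test_split sees 2 label samples against 10 input samples: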
print([y1,y2])
[array([2, 3, 7, 6, 4, 9, 2, 3, 6, 6]), array([2, 2, 1, 2, 2, 2, 2, 1, 1, 2])]
print(np.array([y1,y2]).shape)
(2, 10)
print(X.shape)
(10,)
The label shape you want is (10, 2), with one row per sample:
print(y)
[[2, 2], [3, 2], [7, 1], [6, 2], [4, 2], [9, 2], [2, 2], [3, 1], [6, 1], [6, 2]]
print(np.array(y).shape)
(10, 2)
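The same (10, 2) layout can also be built directly with NumPy instead of a list comprehension. This is a minimal sketch, assuming y1 and y2 are 1-D arrays of equal length as in the question; the seed is only there to make the run reproducible. Note that this gives a NumPy array rather than a list of lists, so the split labels will print as arrays.

```python
import numpy as np

rng = np.random.RandomState(0)  # seed chosen only for reproducibility
y1 = rng.randint(1, 10, 10)
y2 = rng.randint(1, 3, 10)

# Stack the two 1-D label arrays as columns: one row per sample.
y = np.column_stack((y1, y2))

# Transposing the (2, 10) array gives the same result.
y_alt = np.array([y1, y2]).T

print(y.shape)                   # (10, 2)
print(np.array_equal(y, y_alt))  # True
```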
The inputs and the labels must have the same number of samples, that is, the same length along the first axis.
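If you would rather keep y1 and y2 as separate arrays, train_test_split also accepts any number of equal-length arrays and splits them all with the same indices, returning a train/test pair for each one. A sketch using the variable names from the question:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1, 10, 10)
y2 = np.random.randint(1, 3, 10)

# Each input array yields a (train, test) pair, in the order given.
X_train, X_test, Y1_train, Y1_test, Y2_train, Y2_test = train_test_split(
    X, y1, y2, test_size=0.4, random_state=42
)

print(X_train.shape, Y1_train.shape, Y2_train.shape)  # (6,) (6,) (6,)
print(X_test.shape, Y1_test.shape, Y2_test.shape)     # (4,) (4,) (4,)
```

This avoids combining and later separating the labels, at the cost of a longer unpacking line.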