Wrapper for train_test_split to produce train, validation, and test splits for any number of input arrays
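What is the correct way to build a wrapper around the train_test_split function using *args and **kwargs?
For more context, data science work often calls for a train-validation-test split, so I wanted to build a wrapper like this: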
    def train_validate_test_split(*dataframe, **options):
        train, test = train_test_split(dataframe, options)
        train, val = train_test_split(train, options)
        return train, val, test
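This is meant to give the train, validation, and test splits of a dataset from a one-line call. However, running
    train_validate_test_split(dataframe_1, test_size = 0.2)
fails badly. I think I have made a mess of *args and **kwargs, and I still can't get my head around them. Any advice would be greatly appreciated.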
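The function signature is:
    train_test_split(*arrays, **options)
which means it accepts any number of positional arrays and any number of keyword options. Inside your wrapper, dataframe is a tuple and options is a dict, and both were passed on as plain positional arguments; they have to be unpacked again with * and ** when forwarded. To return train, val, test as you want, you can proceed as follows: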
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.DataFrame({"x": np.random.randn(1000), "y": np.random.randn(1000)})

    def train_validate_test_split(dataframe, **options):
        # **options forwards the keyword arguments unchanged to both calls
        train, test = train_test_split(dataframe, **options)
        train, val = train_test_split(train, **options)
        return train, val, test

    a, b, c = train_validate_test_split(df, train_size=.25)
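To see why the original wrapper failed, here is a minimal sketch that swaps train_test_split for a hypothetical stand-in, show_call, which only reports how many positional arguments and which keyword names it received. The names show_call, broken_wrapper, and fixed_wrapper are illustrative assumptions and not part of scikit-learn.

```python
def show_call(*arrays, **options):
    """Hypothetical stand-in for train_test_split: report what was received."""
    return len(arrays), sorted(options)


def broken_wrapper(*dataframe, **options):
    # dataframe is a tuple and options is a dict; passing them as-is means
    # the callee sees two positional arguments and no keyword options.
    return show_call(dataframe, options)


def fixed_wrapper(*dataframe, **options):
    # Unpacking forwards the arguments exactly as the caller supplied them.
    return show_call(*dataframe, **options)


data = [1, 2, 3, 4]
broken = broken_wrapper(data, test_size=0.2)
fixed = fixed_wrapper(data, test_size=0.2)
print(broken)  # (2, [])
print(fixed)   # (1, ['test_size'])
```

With the real function, the broken form would try to split a tuple and a dict as if they were two arrays, which is why the call fails.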
Edit
To accept 1 or 2 inputs, use:
    def train_val_test_split(*arrays, **options):
        if len(arrays) == 1:
            X_train, X_test = train_test_split(*arrays, **options)
            X_train, X_val = train_test_split(X_train, **options)
            print("Unpack to X_train, X_val, X_test")
            return X_train, X_val, X_test
        elif len(arrays) == 2:
            X_train, X_test, y_train, y_test = train_test_split(*arrays, **options)
            X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, **options)
            print("Unpack to X_train, X_val, X_test, y_train, y_val, y_test")
            return X_train, X_val, X_test, y_train, y_val, y_test
        else:
            raise ValueError("Only implemented for 1 or 2 arrays. "
                             f"You provided {len(arrays)} arrays")
Or, for any number of input arrays:
    y = np.random.randn(1000)

    def train_val_test_split(*arrays, **options):
        '''
        inputs:
            arrays - any number of arrays to split
        outputs:
            tuple, grouped per input array:
            arr1_train, arr1_val, arr1_test, arr2_train, arr2_val, arr2_test, ...
        '''
        *out, = train_test_split(*arrays, **options)
        train = out[0::2]  # x1_train, x2_train, ...
        test = out[1::2]   # x1_test, x2_test, ...
        *train_val, = train_test_split(*train, **options)
        train = train_val[0::2]
        val = train_val[1::2]
        print(f"Unpack to {len(arrays)*3} tuples: train,...,val,..., test...")
        # Interleave so each array's train, val, test parts sit together
        return tuple(split for tuple_ in zip(train, val, test) for split in tuple_)

    x = train_val_test_split(y, y, y)
    for item in x:
        print(item.shape, end=", ")
Unpack to 9 tuples: train,...,val,..., test...
(562,), (188,), (250,), (562,), (188,), (250,), (562,), (188,), (250,),
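Because the same options are passed to both calls, the second split takes its fraction from what is left after the first, not from the full dataset. The sketch below works through that arithmetic; effective_fractions and second_split_fraction are hypothetical helper names introduced here for illustration, and the numbers ignore the rounding to whole samples that the real split performs.

```python
def effective_fractions(test_size):
    """Overall (train, val, test) shares when test_size is reused for both splits."""
    test = test_size
    val = (1 - test_size) * test_size   # second split acts on the remainder
    train = (1 - test_size) ** 2
    return train, val, test


def second_split_fraction(val_size, test_size):
    """Fraction to use in the second split so val is val_size of the full data."""
    return val_size / (1 - test_size)


fractions = effective_fractions(0.25)
print(fractions)  # (0.5625, 0.1875, 0.25)
second = second_split_fraction(0.2, 0.2)
```

With the default of 0.25, this gives roughly 562, 188, and 250 rows out of 1000, which matches the shapes printed above. If you want the validation share expressed relative to the full dataset, one option is to pass a different fraction to the second call, for example the value from second_split_fraction.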