Machine learning with vectors in both features and target
How do I train a model that has vectors/arrays as features? I always seem to get an error when I do this...
My feature matrix looks like this:
A B C Profile
0 1 4 4 [1,2,3,4]
1 2 4 5 [2,2,4,1]
while my target vector looks like this:
0 [0,4,5,0]
1 [1,5,6,0]
etc. However, I run into problems with fit(x, y) when using sklearn's linear_regression. Here is the output of print(x) and print(y):
x:
Beams/Beam[0]/Parameters/Energy Beams/Beam[0]/Parameters/BunchPopulation Beams/Beam[0]/BunchShape/Parameters/LongitudinalSigmaLabFrame Simulation/NumberOfParticles initialXHist
0 25.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1 25.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2 25.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3 25.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
4 25.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
... ... ... ... ... ...
995 26.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
996 26.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
997 26.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
998 26.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
999 26.0 1.300000e+11 1.05 5000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1000 rows × 5 columns
y:
0 [8, 4, 6, 13, 5, 5, 10, 11, 15, 9, 19, 18, 16,...
1 [6, 5, 8, 8, 9, 12, 6, 20, 9, 20, 18, 12, 24, ...
2 [6, 6, 7, 8, 13, 10, 12, 7, 14, 14, 18, 24, 16...
3 [2, 5, 10, 3, 6, 8, 13, 12, 7, 18, 12, 20, 22,...
4 [5, 3, 5, 9, 8, 8, 8, 9, 14, 13, 10, 15, 21, 1...
...
995 [2, 9, 4, 5, 10, 5, 10, 15, 16, 13, 12, 13, 21...
996 [2, 3, 5, 5, 11, 15, 18, 15, 14, 13, 16, 17, 1...
997 [4, 5, 6, 8, 5, 7, 7, 26, 13, 16, 17, 16, 17, ...
998 [1, 3, 5, 7, 5, 6, 16, 10, 17, 12, 12, 18, 24,...
999 [3, 4, 8, 9, 8, 4, 14, 17, 11, 16, 7, 20, 14, ...
Name: finalXHist, Length: 1000, dtype: object
Can anyone advise? The error I get is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/tmp/ipykernel_826/1502489859.py in <module>
3
4 # Train the model using the training sets
----> 5 regr.fit(X_train, y_train)
6
7 # Make predictions using the testing set
/cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc8-opt/lib/python3.9/site-packages/sklearn/linear_model/_base.py in fit(self, X, y, sample_weight)
516 accept_sparse = False if self.positive else ['csr', 'csc', 'coo']
517
--> 518 X, y = self._validate_data(X, y, accept_sparse=accept_sparse,
519 y_numeric=True, multi_output=True)
520
/cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc8-opt/lib/python3.9/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
431 y = check_array(y, **check_y_params)
432 else:
--> 433 X, y = check_X_y(X, y, **check_params)
434 out = X, y
435
/cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc8-opt/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
/cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc8-opt/lib/python3.9/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
869 raise ValueError("y cannot be None")
870
--> 871 X = check_array(X, accept_sparse=accept_sparse,
872 accept_large_sparse=accept_large_sparse,
873 dtype=dtype, order=order, copy=copy,
/cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc8-opt/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
/cvmfs/sft.cern.ch/lcg/views/LCG_101swan/x86_64-centos7-gcc8-opt/lib/python3.9/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
671 array = array.astype(dtype, casting="unsafe", copy=False)
672 else:
--> 673 array = np.asarray(array, order=order, dtype=dtype)
674 except ComplexWarning as complex_warning:
675 raise ValueError("Complex data not supported\n"
ValueError: setting an array element with a sequence.
I tried googling it but haven't had any luck so far. I suspect something is wrong with how these two objects are set up.
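The error comes from X (the third-to-last frame of the traceback, where check_X_y calls check_array on X): you cannot have array-valued features. You need to do some feature engineering to produce flat tabular data for training. Whether that means flattening the arrays into individual features, extracting some statistics from those arrays, or something else depends on what the arrays represent (a better question for datascience.SE or stats.SE).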
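A minimal sketch of the two feature-engineering options, assuming every array in the column has the same length. The toy frame, the helpers flatten_array_column and summarize_array_column, and the particular statistics chosen are hypothetical illustrations, not the asker's code. It also shows that np.asarray on the object-dtype frame, which is roughly what sklearn's check_array does in the traceback, fails for the same reason.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mirroring the small example in the question:
# scalar columns A, B, C plus an array-valued "Profile" column (dtype=object).
X = pd.DataFrame({
    "A": [1, 2],
    "B": [4, 4],
    "C": [4, 5],
    "Profile": [[1, 2, 3, 4], [2, 2, 4, 1]],
})

# Converting to a float array fails because one cell holds a sequence
# rather than a scalar (the same failure as in the traceback).
try:
    np.asarray(X, dtype=np.float64)
except (ValueError, TypeError) as err:
    print(type(err).__name__, err)


def flatten_array_column(df, col):
    """Option 1 (hypothetical helper): one numeric column per array element.

    Assumes every array in `col` has the same length.
    """
    values = np.vstack(df[col].tolist())
    expanded = pd.DataFrame(
        values,
        index=df.index,
        columns=[f"{col}_{i}" for i in range(values.shape[1])],
    )
    return pd.concat([df.drop(columns=col), expanded], axis=1)


def summarize_array_column(df, col):
    """Option 2 (hypothetical helper): replace the array with a few statistics.

    Which statistics make sense depends on what the arrays mean.
    """
    stats = pd.DataFrame({
        f"{col}_mean": df[col].apply(np.mean),
        f"{col}_std": df[col].apply(np.std),
        f"{col}_max": df[col].apply(np.max),
    }, index=df.index)
    return pd.concat([df.drop(columns=col), stats], axis=1)


X_flat = flatten_array_column(X, "Profile")
print(X_flat.columns.tolist())
# ['A', 'B', 'C', 'Profile_0', 'Profile_1', 'Profile_2', 'Profile_3']

X_stats = summarize_array_column(X, "Profile")
print(X_stats.shape)  # (2, 6)
```

Either X_flat or X_stats is a flat numeric table that regr.fit can accept. Flattening keeps all the information but adds one column per histogram bin; summary statistics keep the table small at the cost of discarding detail.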
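The arrays in y may have a similar problem, but if treating their elements as separate outputs is what you're after, the task becomes "multioutput" regression or "multilabel" classification, which a subset of sklearn estimators can handle.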
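A minimal sketch of the multioutput route, assuming every target array has the same length. The toy data is hypothetical; SVR is just one example of a single-output estimator, and MultiOutputRegressor is sklearn's generic wrapper that fits one copy of such an estimator per output column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Hypothetical toy data: X already flattened to scalar columns, and y stored as
# a Series of equal-length lists, like finalXHist in the question.
X_flat = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0],
                       "B": [4.0, 4.0, 5.0, 7.0, 6.0]})
y = pd.Series([[0, 4, 5, 0], [1, 5, 6, 0], [2, 6, 7, 0],
               [3, 7, 9, 1], [4, 8, 8, 1]], name="finalXHist")

# Stack the per-row target arrays into a 2D (n_samples, n_outputs) array;
# this is the shape sklearn expects for multioutput regression.
Y = np.vstack(y.tolist()).astype(float)

# LinearRegression handles 2D y natively (one row of coefficients per output).
regr = LinearRegression().fit(X_flat, Y)
print(Y.shape, regr.coef_.shape)  # (5, 4) (4, 2)

# Estimators that only support a single target can be wrapped so that one
# copy is fitted per output column.
svr = MultiOutputRegressor(SVR()).fit(X_flat, Y)
print(svr.predict(X_flat).shape)  # (5, 4)
```

The key step is converting the object-dtype Series into a proper 2D numeric array; passing the Series of lists directly would fail validation in the same way X did.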