Scikit Learn: Randomized Logistic Regression gives ValueError: output array is read-only
I am trying to fit a Randomized Logistic Regression to my data, but I can't get past the error below.
Here is the code:
import numpy as np
from sklearn.linear_model import RandomizedLogisticRegression

X = np.load("X.npy")
y = np.load("y.npy")
randomized_LR = RandomizedLogisticRegression(C=0.1, verbose=True, n_jobs=3)
randomized_LR.fit(X, y)
This gives the following error:
344 if issparse(X):
345 size = len(weights)
346 weight_dia = sparse.dia_matrix((1 - weights, 0), (size, size))
347 X = X * weight_dia
348 else:
--> 349 X *= (1 - weights)
350
351 C = np.atleast_1d(np.asarray(C, dtype=np.float))
352 scores = np.zeros((X.shape[1], len(C)), dtype=np.bool)
353
ValueError: output array is read-only
Can someone point out what I should do to get past this?
Thanks very much,
Hendra
Full traceback, as requested:
Traceback (most recent call last):
File "temp.py", line 88, in <module>
train_randomized_logistic_regression()
File "temp.py", line 82, in train_randomized_logistic_regression
randomized_LR.fit(X, y)
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py", line 110, in fit
sample_fraction=self.sample_fraction, **params)
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/memory.py", line 281, in __call__
return self.func(*args, **kwargs)
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py", line 52, in _resample_model
for _ in range(n_resampling)):
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 660, in __call__
self.retrieve()
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 543, in retrieve
raise exception_type(report)
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/zfs/ilps-plexest/homedirs/hbunyam1/social_graph/temp.py in <module>()
83
84
85
86 if __name__ == '__main__':
87
---> 88 train_randomized_logistic_regression()
89
90
91
92
...........................................................................
/zfs/ilps-plexest/homedirs/hbunyam1/social_graph/temp.py in train_randomized_logistic_regression()
77 X = np.load( 'data/issuemakers/features/new_X.npy')
78 y = np.load( 'data/issuemakers/features/new_y.npy')
79
80 randomized_LR = RandomizedLogisticRegression(C=0.1, n_jobs=32)
81
---> 82 randomized_LR.fit(X, y)
randomized_LR.fit = <bound method RandomizedLogisticRegression.fit o...d=0.25,
tol=0.001, verbose=False)>
X = array([[ 1.01014900e+06, 7.29970000e+04, 2....460000e+04, 3.11428571e+01, 1.88100000e+03]])
y = array([1, 1, 1, ..., 0, 1, 1])
83
84
85
86 if __name__ == '__main__':
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py in fit(self=RandomizedLogisticRegression(C=0.1, fit_intercep...ld=0.25,
tol=0.001, verbose=False), X=array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), y=array([1, 1, 1, ..., 0, 1, 1]))
105 )(
106 estimator_func, X, y,
107 scaling=self.scaling, n_resampling=self.n_resampling,
108 n_jobs=self.n_jobs, verbose=self.verbose,
109 pre_dispatch=self.pre_dispatch, random_state=self.random_state,
--> 110 sample_fraction=self.sample_fraction, **params)
self.sample_fraction = 0.75
params = {'C': 0.1, 'fit_intercept': True, 'tol': 0.001}
111
112 if scores_.ndim == 1:
113 scores_ = scores_[:, np.newaxis]
114 self.all_scores_ = scores_
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/memory.py in __call__(self=NotMemorizedFunc(func=<function _resample_model at 0x7fb5d7d12b18>), *args=(<function _randomized_logistic>, array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), array([1, 1, 1, ..., 0, 1, 1])), **kwargs={'C': 0.1, 'fit_intercept': True, 'n_jobs': 32, 'n_resampling': 200, 'pre_dispatch': '3*n_jobs', 'random_state': None, 'sample_fraction': 0.75, 'scaling': 0.5, 'tol': 0.001, 'verbose': False})
276 # Should be a light as possible (for speed)
277 def __init__(self, func):
278 self.func = func
279
280 def __call__(self, *args, **kwargs):
--> 281 return self.func(*args, **kwargs)
self.func = <function _resample_model>
args = (<function _randomized_logistic>, array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), array([1, 1, 1, ..., 0, 1, 1]))
kwargs = {'C': 0.1, 'fit_intercept': True, 'n_jobs': 32, 'n_resampling': 200, 'pre_dispatch': '3*n_jobs', 'random_state': None, 'sample_fraction': 0.75, 'scaling': 0.5, 'tol': 0.001, 'verbose': False}
282
283 def call_and_shelve(self, *args, **kwargs):
284 return NotMemorizedResult(self.func(*args, **kwargs))
285
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py in _resample_model(estimator_func=<function _randomized_logistic>, X=array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), y=array([1, 1, 1, ..., 0, 1, 1]), scaling=0.5, n_resampling=200, n_jobs=32, verbose=False, pre_dispatch='3*n_jobs', random_state=<mtrand.RandomState object>, sample_fraction=0.75, **params={'C': 0.1, 'fit_intercept': True, 'tol': 0.001})
47 X, y, weights=scaling * random_state.random_integers(
48 0, 1, size=(n_features,)),
49 mask=(random_state.rand(n_samples) < sample_fraction),
50 verbose=max(0, verbose - 1),
51 **params)
---> 52 for _ in range(n_resampling)):
n_resampling = 200
53 scores_ += active_set
54
55 scores_ /= n_resampling
56 return scores_
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=32), iterable=<itertools.islice object>)
655 if pre_dispatch == "all" or n_jobs == 1:
656 # The iterable was consumed all at once by the above for loop.
657 # No need to wait for async callbacks to trigger to
658 # consumption.
659 self._iterating = False
--> 660 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=32)>
661 # Make sure that we get a last message telling us we are done
662 elapsed_time = time.time() - self._start_time
663 self._print('Done %3i out of %3i | elapsed: %s finished',
664 (len(self._output),
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError Fri Jan 2 12:13:54 2015
PID: 126664 Python 2.7.8: /home/hbunyam1/anaconda/bin/python
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.pyc in _randomized_logistic(X=memmap([[ 6.93135506e-04, 8.93676615e-04, -1...234095e-04, -1.19037488e-04, 4.20921021e-04]]), y=array([1, 1, 1, ..., 0, 1, 1]), weights=array([ 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. ,... 0. , 0. , 0.5, 0. , 0. , 0. , 0. , 0.5]), mask=array([ True, True, True, ..., True, True, True], dtype=bool), C=0.1, verbose=0, fit_intercept=True, tol=0.001)
344 if issparse(X):
345 size = len(weights)
346 weight_dia = sparse.dia_matrix((1 - weights, 0), (size, size))
347 X = X * weight_dia
348 else:
--> 349 X *= (1 - weights)
350
351 C = np.atleast_1d(np.asarray(C, dtype=np.float))
352 scores = np.zeros((X.shape[1], len(C)), dtype=np.bool)
353
ValueError: output array is read-only
___________________________________________________________________________
Output of a second run, this time loading the arrays with mmap_mode='r+':
[hbunyam1@zookst20 social_graph]$ python temp.py
Traceback (most recent call last):
File "temp.py", line 88, in <module>
train_randomized_logistic_regression()
File "temp.py", line 82, in train_randomized_logistic_regression
randomized_LR.fit(X, y)
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py", line 110, in fit
sample_fraction=self.sample_fraction, **params)
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/memory.py", line 281, in __call__
return self.func(*args, **kwargs)
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py", line 52, in _resample_model
for _ in range(n_resampling)):
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 660, in __call__
self.retrieve()
File "/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 543, in retrieve
raise exception_type(report)
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/zfs/ilps-plexest/homedirs/hbunyam1/social_graph/temp.py in <module>()
83
84
85
86 if __name__ == '__main__':
87
---> 88 train_randomized_logistic_regression()
89
90
91
92
...........................................................................
/zfs/ilps-plexest/homedirs/hbunyam1/social_graph/temp.py in train_randomized_logistic_regression()
77 X = np.load( 'data/issuemakers/features/new_X.npy', mmap_mode='r+')
78 y = np.load( 'data/issuemakers/features/new_y.npy', mmap_mode='r+')
79
80 randomized_LR = RandomizedLogisticRegression(C=0.1, n_jobs=32)
81
---> 82 randomized_LR.fit(X, y)
randomized_LR.fit = <bound method RandomizedLogisticRegression.fit o...d=0.25,
tol=0.001, verbose=False)>
X = memmap([[ 1.01014900e+06, 7.29970000e+04, 2...460000e+04, 3.11428571e+01, 1.88100000e+03]])
y = memmap([1, 1, 1, ..., 0, 1, 1])
83
84
85
86 if __name__ == '__main__':
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py in fit(self=RandomizedLogisticRegression(C=0.1, fit_intercep...ld=0.25,
tol=0.001, verbose=False), X=array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), y=array([1, 1, 1, ..., 0, 1, 1]))
105 )(
106 estimator_func, X, y,
107 scaling=self.scaling, n_resampling=self.n_resampling,
108 n_jobs=self.n_jobs, verbose=self.verbose,
109 pre_dispatch=self.pre_dispatch, random_state=self.random_state,
--> 110 sample_fraction=self.sample_fraction, **params)
self.sample_fraction = 0.75
params = {'C': 0.1, 'fit_intercept': True, 'tol': 0.001}
111
112 if scores_.ndim == 1:
113 scores_ = scores_[:, np.newaxis]
114 self.all_scores_ = scores_
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/memory.py in __call__(self=NotMemorizedFunc(func=<function _resample_model at 0x7f192c829b18>), *args=(<function _randomized_logistic>, array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), array([1, 1, 1, ..., 0, 1, 1])), **kwargs={'C': 0.1, 'fit_intercept': True, 'n_jobs': 32, 'n_resampling': 200, 'pre_dispatch': '3*n_jobs', 'random_state': None, 'sample_fraction': 0.75, 'scaling': 0.5, 'tol': 0.001, 'verbose': False})
276 # Should be a light as possible (for speed)
277 def __init__(self, func):
278 self.func = func
279
280 def __call__(self, *args, **kwargs):
--> 281 return self.func(*args, **kwargs)
self.func = <function _resample_model>
args = (<function _randomized_logistic>, array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), array([1, 1, 1, ..., 0, 1, 1]))
kwargs = {'C': 0.1, 'fit_intercept': True, 'n_jobs': 32, 'n_resampling': 200, 'pre_dispatch': '3*n_jobs', 'random_state': None, 'sample_fraction': 0.75, 'scaling': 0.5, 'tol': 0.001, 'verbose': False}
282
283 def call_and_shelve(self, *args, **kwargs):
284 return NotMemorizedResult(self.func(*args, **kwargs))
285
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.py in _resample_model(estimator_func=<function _randomized_logistic>, X=array([[ 6.93135506e-04, 8.93676615e-04, -1....234095e-04, -1.19037488e-04, 4.20921021e-04]]), y=array([1, 1, 1, ..., 0, 1, 1]), scaling=0.5, n_resampling=200, n_jobs=32, verbose=False, pre_dispatch='3*n_jobs', random_state=<mtrand.RandomState object>, sample_fraction=0.75, **params={'C': 0.1, 'fit_intercept': True, 'tol': 0.001})
47 X, y, weights=scaling * random_state.random_integers(
48 0, 1, size=(n_features,)),
49 mask=(random_state.rand(n_samples) < sample_fraction),
50 verbose=max(0, verbose - 1),
51 **params)
---> 52 for _ in range(n_resampling)):
n_resampling = 200
53 scores_ += active_set
54
55 scores_ /= n_resampling
56 return scores_
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self=Parallel(n_jobs=32), iterable=<itertools.islice object>)
655 if pre_dispatch == "all" or n_jobs == 1:
656 # The iterable was consumed all at once by the above for loop.
657 # No need to wait for async callbacks to trigger to
658 # consumption.
659 self._iterating = False
--> 660 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=32)>
661 # Make sure that we get a last message telling us we are done
662 elapsed_time = time.time() - self._start_time
663 self._print('Done %3i out of %3i | elapsed: %s finished',
664 (len(self._output),
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError Fri Jan 2 12:57:25 2015
PID: 127177 Python 2.7.8: /home/hbunyam1/anaconda/bin/python
...........................................................................
/home/hbunyam1/anaconda/lib/python2.7/site-packages/sklearn/linear_model/randomized_l1.pyc in _randomized_logistic(X=memmap([[ 6.93135506e-04, 8.93676615e-04, -1...234095e-04, -1.19037488e-04, 4.20921021e-04]]), y=memmap([1, 1, 1, ..., 0, 0, 1]), weights=array([ 0.5, 0.5, 0. , 0.5, 0.5, 0.5, 0.5,... 0. , 0.5, 0. , 0. , 0.5, 0.5, 0.5, 0.5]), mask=array([ True, True, True, ..., False, False, True], dtype=bool), C=0.1, verbose=0, fit_intercept=True, tol=0.001)
344 if issparse(X):
345 size = len(weights)
346 weight_dia = sparse.dia_matrix((1 - weights, 0), (size, size))
347 X = X * weight_dia
348 else:
--> 349 X *= (1 - weights)
350
351 C = np.atleast_1d(np.asarray(C, dtype=np.float))
352 scores = np.zeros((X.shape[1], len(C)), dtype=np.bool)
353
ValueError: output array is read-only
___________________________________________________________________________
According to the numpy.load documentation, you may have to use np.load('X.npy', mmap_mode='r+').
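A minimal sketch of that suggestion, reusing the file names from the question:
import numpy as np

# Open the .npy files as writable memory maps, so that in-place
# modifications such as X *= (1 - weights) are allowed on them.
X = np.load("X.npy", mmap_mode='r+')
y = np.load("y.npy", mmap_mode='r+')
As the second traceback in the question shows, however, this alone did not make the error go away here, since joblib still hands the worker processes its own read-only memory maps of large arrays.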
Try changing the number of jobs; start with 1. I ran into the same error when running RandomizedLogisticRegression with n_jobs=20 (on a powerful machine). However, the code ran without any issue when n_jobs was left at its default value of 1.
I got the same error when running the function on a 32-processor Ubuntu server. The problem persists for any n_jobs value greater than 1, but it disappears when n_jobs is set to its default value of 1 [as benbo mentioned].
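A minimal sketch of that workaround, assuming the same X, y and C value as in the question:
import numpy as np
from sklearn.linear_model import RandomizedLogisticRegression

X = np.load("X.npy")
y = np.load("y.npy")

# With n_jobs=1 the resampling runs serially in the parent process, so joblib
# never passes a read-only memory map of X to worker processes and the
# in-place scaling inside _randomized_logistic succeeds.
randomized_LR = RandomizedLogisticRegression(C=0.1, n_jobs=1)
randomized_LR.fit(X, y)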
This is a bug in RandomizedLogisticRegression: multiple accesses to the same block of memory prevent one another from writing to it.
See the scikit-learn GitHub issue, where the problem and possible fixes are discussed in depth: https://github.com/scikit-learn/scikit-learn/issues/4597
The cause is the max_nbytes parameter of the joblib Parallel call that scikit-learn uses internally when you set n_jobs > 1; it defaults to 1M. The parameter is documented as:
Threshold on the size of arrays passed to the workers that triggers
automated memory mapping in temp_folder.
More details can be found here: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html#
So as soon as an array exceeds 1M in size, joblib throws the error ValueError: assignment destination is read-only. This is easy to reproduce. Consider the following code:
import numpy as np
from sklearn.linear_model import RandomizedLogisticRegression
# Create some random data
samples = 2621
X = np.random.randint(1,100, size=(samples, 50))
y = np.random.randint(100,200, size=(samples))
randomized_LR = RandomizedLogisticRegression(C=0.1, verbose=True, n_jobs=3)
randomized_LR.fit(X, y)
This runs without any problem. If we check the size of X with print(X.nbytes/1024**2), we see that the X array is 0.9998321533203125 megabytes, so not too big.
If we run the same code again, but change the number of samples to 2622:
import numpy as np
from sklearn.linear_model import RandomizedLogisticRegression
samples = 2622
X = np.random.randint(1,100, size=(samples, 50))
print(X.nbytes/1024**2)
y = np.random.randint(100,200, size=(samples))
randomized_LR = RandomizedLogisticRegression(C=0.1, verbose=True, n_jobs=3)
randomized_LR.fit(X, y)
Python crashes with ValueError: output array is read-only, and checking the size of the X array tells us that it is 1.000213623046875 megabytes, so too big.
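The same threshold behaviour can be shown with joblib directly. The sketch below is only an illustration of the mechanism, not scikit-learn's own code: the helper function and array shape are made up, and it assumes a joblib version whose Parallel accepts max_nbytes and memory-maps large arrays read-only by default (mmap_mode='r').
import numpy as np
from joblib import Parallel, delayed

def scale_in_place(arr):
    # In-place write, analogous to X *= (1 - weights) in _randomized_logistic.
    arr *= 2.0
    return arr.sum()

X = np.random.rand(300, 500)   # about 1.14 MB, just above the default 1M threshold
print(X.nbytes / 1024**2)

# With the defaults (max_nbytes='1M', mmap_mode='r'), X is memory-mapped
# read-only before being handed to the workers and the in-place write fails
# with a "read-only" ValueError. Disabling the automatic memory mapping with
# max_nbytes=None sends each worker a pickled, writable copy instead.
results = Parallel(n_jobs=2, max_nbytes=None)(
    delayed(scale_in_place)(X) for _ in range(4)
)
print(results)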