Why does random.seed() not work when generating a dataset?
I am creating a dataset for testing:
import random
from sklearn.datasets import make_regression
random.seed(10)
X, y = make_regression(n_samples = 1000, n_features = 10)
X[0:2]
Can you explain why I get a different dataset after each run? For example, two runs return
array([[-0.28058959, -0.00570283, 0.31728106, 0.52745066, 1.69651572,
-0.37038286, 0.67825801, -0.71782482, -0.29886242, 0.07891646],
[ 0.73872413, -0.27472164, -1.70298606, -0.59211593, 0.04060707,
1.39661574, -1.25656819, -0.79698442, -0.38533316, 0.65484856]])
and
array([[ 0.12493586, 1.01388974, 1.2390685 , -0.13797227, 0.60029193,
-1.39268898, -0.49804303, 1.31267837, 0.11774784, 0.56224193],
[ 0.47067323, 0.3845262 , 1.22959284, -0.02913909, -1.56481745,
-1.56479078, 2.04082295, -0.22561445, -0.37150552, 0.91750366]])
You need to pass the seed as an argument in the make_regression call, through the random_state parameter:
sklearn.datasets.make_regression(n_samples=100, n_features=100, n_informative=10,
                                 n_targets=1, bias=0.0, effective_rank=None,
                                 tail_strength=0.5, noise=0.0, shuffle=True,
                                 coef=False, random_state=None)
See the API documentation:
random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator;
    If RandomState instance, random_state is the random number generator;
    If None, the random number generator is the RandomState instance used by np.random.
So in your case:
X, y = make_regression(n_samples = 1000, n_features = 10, random_state = 10)
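The three forms of random_state quoted from the API can be compared side by side. The following is a minimal sketch, not part of the original answer; the variable names are illustrative, and it assumes the legacy np.random.RandomState class that the quoted documentation refers to.

```python
import numpy as np
from sklearn.datasets import make_regression

# An int seed: every call builds a fresh generator from the same seed,
# so repeated calls return the same dataset.
X_a, y_a = make_regression(n_samples=1000, n_features=10, random_state=10)
X_b, y_b = make_regression(n_samples=1000, n_features=10, random_state=10)
int_seed_repeats = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)

# A RandomState instance: the generator is used as-is, so its internal
# state advances from one call to the next.
rng = np.random.RandomState(10)
X_c, y_c = make_regression(n_samples=1000, n_features=10, random_state=rng)
X_d, y_d = make_regression(n_samples=1000, n_features=10, random_state=rng)
first_matches_int_seed = np.array_equal(X_a, X_c)
second_call_differs = not np.array_equal(X_c, X_d)

print(int_seed_repeats, first_matches_int_seed, second_call_differs)
```

If exact repeatability on every call is the goal, passing an int is the simpler choice; a shared RandomState instance is useful when several calls should draw from one reproducible stream.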
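Although, as already noted, setting the random_state argument in make_regression solves the problem, it is arguably useful to clarify why exactly your own code snippet does not work as expected. The answer is that, as hinted in the docs, make_regression uses NumPy's random number generator (RNG), not the Python random module you seed in your code. The two generators keep separate state, so seeding one has no effect on the other.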
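That Python's random module and NumPy's global RNG are separate can be checked directly, without scikit-learn. This is a small sketch with hypothetical helper names; the NumPy generator is deliberately put into two different states first, so that any influence of random.seed would be visible.

```python
import random
import numpy as np

def numpy_draw_after_python_seed(seed):
    """Seed only Python's random module, then draw from NumPy's global RNG."""
    random.seed(seed)
    return np.random.rand(3)

def numpy_draw_after_numpy_seed(seed):
    """Seed NumPy's global RNG, then draw from it."""
    np.random.seed(seed)
    return np.random.rand(3)

# Put NumPy's global RNG into two different states, then seed only Python's RNG.
np.random.seed(0)
first = numpy_draw_after_python_seed(10)
np.random.seed(1)
second = numpy_draw_after_python_seed(10)
python_seed_controls_numpy = np.array_equal(first, second)

# Seeding NumPy itself does make the draws repeat.
numpy_seed_controls_numpy = np.array_equal(
    numpy_draw_after_numpy_seed(10), numpy_draw_after_numpy_seed(10)
)

print(python_seed_controls_numpy, numpy_seed_controls_numpy)
```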
Hence, changing your code snippet only slightly to
import numpy as np # change 1
from sklearn.datasets import make_regression
np.random.seed(10) # change 2
X, y = make_regression(n_samples = 1000, n_features = 10) # no random_state set here
X[0:2]
always produces the same dataset:
array([[-1.32553507, -1.34894938, -0.82160306, 0.03538905, -0.68611315,
-0.74469132, 1.37391771, 0.98675482, -0.90921643, -1.57943748],
[ 1.13660812, 0.52367005, 0.05090828, -0.47210149, -0.98592548,
-0.69677968, 0.31752274, -0.0771912 , 2.17548753, 0.75189637]])
which is in fact the same as the result produced by setting random_state=10 in make_regression:
X, y = make_regression(n_samples = 1000, n_features = 10, random_state=10)
X[0:2]
# result:
array([[-1.32553507, -1.34894938, -0.82160306, 0.03538905, -0.68611315,
-0.74469132, 1.37391771, 0.98675482, -0.90921643, -1.57943748],
[ 1.13660812, 0.52367005, 0.05090828, -0.47210149, -0.98592548,
-0.69677968, 0.31752274, -0.0771912 , 2.17548753, 0.75189637]])
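Rather than comparing the printed arrays by eye, the equivalence can be checked programmatically. A minimal sketch, assuming that make_regression falls back to NumPy's global RandomState when random_state is None, as the quoted API text states; the two helper functions are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression

def make_with_global_seed(seed):
    """Seed NumPy's global RNG, then call make_regression without random_state."""
    np.random.seed(seed)
    return make_regression(n_samples=1000, n_features=10)

def make_with_random_state(seed):
    """Pass the seed directly through the random_state parameter."""
    return make_regression(n_samples=1000, n_features=10, random_state=seed)

X_global, y_global = make_with_global_seed(10)
X_local, y_local = make_with_random_state(10)
same_dataset = np.array_equal(X_global, X_local) and np.array_equal(y_global, y_local)

# The global generator has advanced, so an unseeded follow-up call differs.
X_next, _ = make_regression(n_samples=1000, n_features=10)
global_state_advanced = not np.array_equal(X_global, X_next)

print(same_dataset, global_state_advanced)
```

One practical difference: np.random.seed changes global state that any other NumPy-based code also draws from, while random_state keeps the seeding local to the single call.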
For more on RNGs, see the related answer (link not preserved).