Why random seed does not make results constant in Python
I am using the code below. I want to get the same results for the same random seed. I use the same random seed (1 in this case) and still get different results each run.
Here is the code:
import pandas as pd
import numpy as np
from random import seed
# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
seed(1) ### <-----
file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
#Encoding
y = le.fit_transform(dataset2[60])
dataset2[60] = y
train, test = train_test_split(dataset2, test_size=0.1)
y = train[60]
y_test = test[60]
clf = RandomForestClassifier(n_jobs=100, random_state=0)
features = train.columns[0:59]
clf.fit(train[features], y)
# Apply the Classifier we trained to the test data
y_pred = clf.predict(test[features])
# Decode
y_test_label = le.inverse_transform(y_test)
y_pred_label = le.inverse_transform(y_pred)
from sklearn.metrics import accuracy_score
print (accuracy_score(y_test_label, y_pred_label))
# Results from two consecutive runs:
# 0.761904761905
# 0.90476190476
Your code:
import numpy as np
from random import seed
seed(1) ### <-----
only sets the seed of Python's built-in random module.
But sklearn is based entirely on numpy's random state, as explained here:
For testing and replicability, it is often important to have the entire execution controlled by a single seed for the pseudo-random number generator used in algorithms that have a randomized component. Scikit-learn does not use its own global random state; whenever a RandomState instance or an integer random seed is not provided as an argument, it relies on the numpy global random state, which can be set using numpy.random.seed. For example, to set an execution’s numpy global random state to 42, one could execute the following in his or her script:
import numpy as np
np.random.seed(42)
So in general you should do:
np.random.seed(1)
But that is only part of the story, because it is usually not needed when all sklearn components are used carefully: just call each of them explicitly with some seed (random_state)!
As mentioned by ShreyasG, this also applies to train_test_split.
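Putting both points together, here is a minimal sketch of the pipeline from the question with the fixes applied: the numpy global seed is set instead of Python's random seed, and random_state is passed explicitly to train_test_split. The feature slice and n_jobs value are simplified relative to the question's code; with these changes the printed accuracy is the same on every run.
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

np.random.seed(1)  # seed numpy's global RNG, which sklearn falls back on

file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')

# Column 60 is the class label ('R'/'M'); encode it as integers
le = preprocessing.LabelEncoder()
dataset2[60] = le.fit_transform(dataset2[60])

# Passing random_state explicitly makes the split deterministic even
# without touching the global numpy seed
train, test = train_test_split(dataset2, test_size=0.1, random_state=1)

features = dataset2.columns[0:60]  # the 60 numeric feature columns
clf = RandomForestClassifier(n_jobs=2, random_state=0)
clf.fit(train[features], train[60])

y_pred = clf.predict(test[features])
print(accuracy_score(test[60], y_pred))  # identical across runs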