Python - Suggestions on using a model in production, testing one sample at a time
I created an artificial neural network with 4 categorical features and a binary outcome, where 1 means suspicious and 0 means not suspicious:
ParentPath ParentExe
0 C:\Program Files (x86)\Wireless AutoSwitch wrlssw.exe
1 C:\Program Files (x86)\Wireless AutoSwitch WrlsAutoSW.exs
2 C:\Program Files (x86)\Wireless AutoSwitch WrlsAutoSW.exs
3 C:\Windows\System32 svchost.exe
4 C:\Program Files (x86)\Wireless AutoSwitch WrlsAutoSW.exs
ChildPath ChildExe Suspicious
C:\Windows\System32 conhost.exe 0
C:\Program Files (x86)\Wireless AutoSwitch wrlssw.exe 0
C:\Program Files (x86)\Wireless AutoSwitch wrlssw.exe 0
C:\Program Files\Common Files OfficeC2RClient.exe 0
C:\Program Files (x86)\Wireless AutoSwitch wrlssw.exe 1
C:\Program Files (x86)\Wireless AutoSwitch wrlssw.exe 0
I used sklearn to label encode the data and then one-hot encode it:
#Import the dataset
X = DBF2.iloc[:, 0:4].values
#X = DBF2[['ParentProcess', 'ChildProcess']]
y = DBF2.iloc[:, 4].values#.ravel()
#Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
#Label Encode Parent Path
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
#Label Encode Parent Exe
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])
#Label Encode Child Path
labelencoder_X_3 = LabelEncoder()
X[:, 2] = labelencoder_X_3.fit_transform(X[:, 2])
#Label Encode Child Exe
labelencoder_X_4 = LabelEncoder()
X[:, 3] = labelencoder_X_4.fit_transform(X[:, 3])
#Create dummy variables
onehotencoder = OneHotEncoder(categorical_features = [0,1,2,3])
X = onehotencoder.fit_transform(X)
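I have split the data into training and test sets and run it on my GPU box with an NVIDIA 1080. I have tuned the hyperparameters and am now ready to use the model in a production environment that tests one sample at a time. Say I want to test just one sample: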
ParentPath ParentExe ChildPath ChildExe
0 C:\Windows\Malicious badscipt.exe C:\Windows\System cmd.exe
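The problem I am running into is that the training set has seen the normal ChildPath "C:\Windows\System" and ChildExe "cmd.exe", but it has never seen ParentPath "C:\Windows\Malicious" or ParentExe "badscipt.exe", so those values were never label encoded or one-hot encoded. My big question is: how do I handle a test sample in which some of the feature values were never seen during training?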
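To make the situation concrete, here is a minimal sketch of one possible way to encode a sample that contains unseen values. It assumes a scikit-learn version in which `OneHotEncoder` accepts string columns directly and supports `handle_unknown="ignore"`; the two training rows are illustrative values, not my real data. With that option, a category that was not present at fit time is encoded as all zeros in its block of columns instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy training rows (ParentPath, ParentExe, ChildPath, ChildExe).
# These values are illustrative only, not the real training set.
X_train = np.array([
    [r"C:\Windows\System32", "svchost.exe", r"C:\Windows\System", "cmd.exe"],
    [r"C:\Program Files (x86)\Wireless AutoSwitch", "wrlssw.exe",
     r"C:\Windows\System32", "conhost.exe"],
], dtype=object)

# Fit once on the training data. Categories that show up later but were
# not seen here are ignored (encoded as zeros) instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

# A single production sample: ParentPath and ParentExe are unseen,
# ChildPath and ChildExe are known from training.
sample = np.array(
    [[r"C:\Windows\Malicious", "badscipt.exe", r"C:\Windows\System", "cmd.exe"]],
    dtype=object,
)

# transform() returns a sparse matrix; convert it to a dense array to inspect.
encoded = encoder.transform(sample).toarray()
print(encoded.shape)  # one row, one column per category seen in training
print(encoded)
```

In this sketch the encoder is fitted once and then reused for every incoming sample, so the number of columns always matches what the network was trained on. The all-zero blocks only tell the model "no known parent"; whether the model produces a useful prediction for such an input would still have to be checked.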
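I have seen examples that use feature hashing, but I am not sure how to apply it or whether it would even solve this problem. Any help or pointers would be greatly appreciated.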
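For reference, this is my rough understanding of the hashing trick, written as a plain-Python sketch rather than a definitive implementation. The function name `hash_features` and the width `N_BUCKETS` are hypothetical names chosen for illustration. Each "feature=value" string is hashed to a column index in a fixed-length vector, so a value that never appeared in training still lands in some column. scikit-learn provides `FeatureHasher` in `sklearn.feature_extraction`, which is built on the same idea.

```python
import hashlib

N_BUCKETS = 1024  # hypothetical vector width; must be fixed before training


def hash_features(row, n_buckets=N_BUCKETS):
    """Map a dict of categorical features to a fixed-length count vector."""
    vec = [0] * n_buckets
    for name, value in row.items():
        token = f"{name}={value}".encode("utf-8")
        # sha256 is used only as a stable hash: Python's built-in hash()
        # for strings is salted per process, so it would not be
        # reproducible between training and production.
        digest = hashlib.sha256(token).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        vec[index] += 1
    return vec


sample = {
    "ParentPath": r"C:\Windows\Malicious",
    "ParentExe": "badscipt.exe",
    "ChildPath": r"C:\Windows\System",
    "ChildExe": "cmd.exe",
}
hashed = hash_features(sample)
print(len(hashed), sum(hashed))
```

The vector length never changes, which is what makes this attractive for single-sample prediction. The trade-off is that two different values can collide in the same column, and an unseen value still maps to a column the network learned little or nothing about.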
import pandas as pd
#Create a data frame with the malicious test sample
testmalicious = {'ParentProcess': [r'C:\Windows\System32\services.exe'], 'ChildProcess': [r'C:\Windows\System32\svch0st.exe'], 'Suspicous': [1]}
testmaliciousdf = pd.DataFrame(data=testmalicious)
testmaliciousdf = testmaliciousdf[['ParentProcess', 'ChildProcess', 'Suspicous']]
#Add the malicious sample to the end of the data frame
#(pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
DBF1 = pd.concat([DBF2, testmaliciousdf])
DBF2 = DBF1.reset_index(drop=True)
#Location of the malicious sample after label and one-hot encoding; pull it out of the training set
mal_array = X[368827:368828]
#Remove the last row of the array from the training set
X = X[:-1]
#Remove the last row of the array from the y data
y = y[:-1]
#At the end, test whether the sample is suspicious or not
new_prediction = classifier.predict(sc.transform(mal_array))
new_prediction = (new_prediction > 0.5)
new_prediction