Machine learning not predicting correct results
I am creating a simple Python machine learning script that predicts whether a loan will be approved, based on the following parameters:
business experience: should be greater than 7
year of founded: should be after 2015
loan: no previous or current loan
The loan is approved only if all of the above conditions are met. The dataset can be downloaded from this link:
https://drive.google.com/file/d/1QtJ3EED7KDqJDrSHxHB6g9kc5YAfTlmF/view?usp=sharing
For this data, I have the following script:
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np
data = pd.read_csv("test2.csv")
data.head()
X = data[["Business Exp", "Year of Founded", "Previous/Current Loan"]]
Y = data["OUTPUT"]
clf = LogisticRegression()
clf.fit(X, Y)
test_x2 = np.array([[9, 2017, 0]])
Y_pred = clf.predict(test_x2)
print(Y_pred)
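I am passing the test data in test_x2. The test case is: business exp is 9, year of founded is 2017, and there is no current/previous loan, so the loan should be granted. The model should therefore predict 1, but it shows 0. Is there anything wrong with the code or the dataset? I am new to machine learning and still learning, so I created this custom dataset for my own understanding.
Any good suggestions would be appreciated. Thanks.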
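You should use StandardScaler() in a pipeline. The features are on very different scales (Year of Founded is in the thousands, while the other two columns are small numbers), and an unscaled LogisticRegression can fit such data poorly.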
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import pandas as pd
import numpy as np
data = pd.read_csv("test2.csv")
data.head()
X = data[["Business Exp", "Year of Founded", "Previous/Current Loan"]]
Y = data["OUTPUT"]
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, Y)
test_x2 = np.array([[9, 2017, 0]])
Y_pred = clf.predict(test_x2)
print("prediction = ", Y_pred.item())
prediction = 1
print("score = ", clf.score(X, Y))
score = 0.95535
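To see what the scaler is doing, here is a minimal sketch. It does not use test2.csv; instead it builds a synthetic stand-in from the three rules in the question, so the row count, value ranges, and random seed are all assumptions and the numbers it prints will not match the output above. It standardizes the columns, shows the test row after scaling, and then fits the same pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for test2.csv (assumption: the real file is not used here).
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "Business Exp": rng.integers(0, 16, size=n),          # 0..15 years
    "Year of Founded": rng.integers(2000, 2021, size=n),  # 2000..2020
    "Previous/Current Loan": rng.integers(0, 2, size=n),  # 0 = no loan, 1 = loan
})
# Label built from the approval rule stated in the question.
Y = (
    (X["Business Exp"] > 7)
    & (X["Year of Founded"] > 2015)
    & (X["Previous/Current Loan"] == 0)
).astype(int)

# Before scaling, the year column is orders of magnitude larger than the others.
print("raw column means:")
print(X.mean())

# StandardScaler subtracts each column's mean and divides by its standard deviation.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Passing a DataFrame with the training column names avoids the
# "X does not have valid feature names" warning at predict time.
test_row = pd.DataFrame([[9, 2017, 0]], columns=X.columns)
test_scaled = scaler.transform(test_row)
print("scaled test row:", test_scaled)

# Same pipeline as in the answer, fitted on the synthetic data.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, Y)
pred = clf.predict(test_row)
proba = clf.predict_proba(test_row)
print("prediction =", pred.item())
print("class probabilities =", proba)
```

Note that the approval rule is an AND of three thresholds, which a linear model can only approximate. That is likely why the training score above is about 0.955 rather than 1.0 even after scaling.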