Encoding dataset index raises a TypeError, why? [Machine Learning]
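Dataset: https://docs.google.com/spreadsheets/d/1OBdyMv8yU7EEdlUNqk_Ox9gT2LMItY2DivEiVX4fYWY/edit?usp=sharing
I am trying to apply machine learning to the statistics in this dataset, but every time I try to encode/pre-process the data I get:
TypeError: Index does not support mutable operations
Isn't the whole point of preprocessing to change the values? Isn't that a prerequisite for applying machine learning? I don't know how to go about encoding/preprocessing. Any advice is appreciated. Thanks!
Code: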
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
dbdata = pd.read_excel("C:/Users/Andrew/sportsref_download.xlsx")
print(dbdata)
print(dbdata.describe())
df = dbdata.columns
print(df)
#define x&y
x = dbdata
y = dbdata.PTS
shapes = x.shape, y.shape
print(shapes)
print(dbdata.index)
print('next')
#apply logreg
logreg = LogisticRegression(solver='lbfgs')
cross_val_score(logreg, x, y, cv=2, scoring='accuracy').mean()
print(cross_val_score)
le = LabelEncoder()
df["date_tf"] = le.fit_transform(dbdata.Date)
df["tm_tf"] = le.fit_transform(df.Tm)
df["opp_tf"] = le.fit_transform(df.Opp)
OneHotEncoder().fit_transform(df[['date_tf']]).toarray()
Cols = ["Date","Tm","Opp"]
integer_encoded = OrdinalEncoder().fit_transform(x[Cols])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)
ec = OneHotEncoder()
X_encoded = dbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded = ec.fit_transform(x.values.reshape(-1,1), y)
print(X_encoded)
X_encoded = ec.fit_transform(x)
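One problem is that the model is fitted before the non-numeric columns are encoded. The TypeError itself most likely comes from the line df = dbdata.columns: that makes df a pandas Index rather than a DataFrame, and an Index is immutable, so df["date_tf"] = ... cannot add a column to it. Assign the encoded columns to the DataFrame (dbdata or a copy of it) instead.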
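To illustrate where the TypeError can come from, here is a minimal sketch. It uses a small made-up DataFrame in place of the spreadsheet (only the column names Date, Tm, Opp and PTS are taken from the question; the rows are hypothetical). It shows that assigning into dbdata.columns fails, while assigning into a copy of the DataFrame works.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the spreadsheet; only the column names
# come from the question, the values are invented for illustration.
dbdata = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-03", "2021-01-05"],
    "Tm": ["BOS", "LAL", "BOS"],
    "Opp": ["NYK", "BOS", "MIA"],
    "PTS": [101, 95, 110],
})

cols = dbdata.columns  # a pandas Index, not a DataFrame

# Item assignment on an Index is rejected because an Index is immutable.
index_error = None
try:
    cols["date_tf"] = [0, 1, 2]
except TypeError as exc:
    index_error = str(exc)
    print("TypeError:", index_error)

# Assigning a new column to a DataFrame works as expected.
df = dbdata.copy()
df["tm_tf"] = LabelEncoder().fit_transform(df["Tm"])
print(df[["Tm", "tm_tf"]])
```

LabelEncoder is used here only because the question uses it. The scikit-learn documentation describes it as intended for target labels, so OrdinalEncoder or OneHotEncoder is usually the better fit for feature columns.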
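You are feeding the model data in date format (and team names as strings), which unfortunately does not work.
Machine learning models of this kind only accept numeric or binary data.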
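As a rough sketch of encoding before fitting, the example below converts the date to a number, one-hot encodes the team columns, and only then runs the cross-validation. The rows are hypothetical, the days_since_start feature is an illustrative choice, and the binary target (100 points or more) is an assumption made so that LogisticRegression has a classification problem to solve. The question itself uses the raw PTS column as the target.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows; only the column names come from the question.
dbdata = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-03", "2021-01-05", "2021-01-08",
             "2021-01-10", "2021-01-12", "2021-01-15", "2021-01-17"],
    "Tm": ["BOS", "LAL", "BOS", "MIA", "LAL", "BOS", "MIA", "LAL"],
    "Opp": ["NYK", "BOS", "MIA", "LAL", "NYK", "LAL", "BOS", "MIA"],
    "PTS": [101, 95, 110, 88, 120, 99, 105, 92],
})

# Features exclude PTS so the target does not leak into the inputs.
X = dbdata[["Date", "Tm", "Opp"]].copy()

# Turn the date into a plain number: days since the earliest game.
dates = pd.to_datetime(X.pop("Date"))
X["days_since_start"] = (dates - dates.min()).dt.days

# Assumed binary target: 1 if the team scored 100 or more, else 0.
y = (dbdata["PTS"] >= 100).astype(int)

# One-hot encode the team columns and scale the numeric column.
preprocess = ColumnTransformer([
    ("teams", OneHotEncoder(handle_unknown="ignore"), ["Tm", "Opp"]),
    ("numeric", StandardScaler(), ["days_since_start"]),
])

# The pipeline encodes inside each fold, then fits the model.
model = make_pipeline(preprocess, LogisticRegression(solver="lbfgs"))

scores = cross_val_score(model, X, y, cv=2, scoring="accuracy")
print(scores.mean())
```

Putting the encoders inside the pipeline means they are refitted on each training fold, so the model never sees raw strings or dates. If PTS is to be predicted as a number rather than a class, a regressor would be the more natural choice than LogisticRegression.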
Sklearn documentation
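Data preprocessing is one of the most important parts of building a model and deploying it to production.
If you need help cleaning the data, leave a comment. I am happy to help.
Otherwise, see: Intro article on data cleaning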