Divide a DataFrame into two sets according to a column
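I have a DataFrame df.
I selected some of its columns, and I want to split them into xtrain and xtest according to a column named Service: rows where Service is 1 or 0 go into xtrain, and rows where it is NaN go into xtest.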
Service
1
0
0
1
NaN
NaN
xtrain = df.loc[df['Service'].notnull(), ['Age', 'Fare', 'GSize', 'Deck', 'Class', 'Profession_title']]
Edited:
ytrain = df['Service'].dropna()
xtest = df.loc[df['Service'].isnull(), ['Age', 'Fare', 'GSize', 'Deck', 'Class', 'Profession_title']]
import pandas as pd
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(xtrain, ytrain)
logistic.predict(xtest)
I get this error from logistic.predict(xtest):
X has 220 features per sample; expecting 307
I think you need isnull:
xtest = df.loc[df['Service'].isnull(), ['Age', 'Fare', 'GSize', 'Deck', 'Class', 'Profession_title']]
Another solution is to invert the boolean mask with ~:
mask = df['Service'].notnull()
xtrain = df.loc[mask, ['Age', 'Fare', 'GSize', 'Deck', 'Class', 'Profession_title']]
xtest = df.loc[~mask, ['Age', 'Fare', 'GSize', 'Deck', 'Class', 'Profession_title']]
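To see that the mask and its inverse split the rows cleanly, here is a minimal sketch on a small made-up frame. The values and the shortened two-column feature list are assumptions for illustration, not the asker's data. It builds the mask once, reuses it for the features and the target, and prints the resulting shapes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: four labelled rows and two rows with a missing label.
df = pd.DataFrame({
    'Service': [1, 0, 0, 1, np.nan, np.nan],
    'Age': [22, 38, 26, 35, 54, 2],
    'Fare': [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
})
features = ['Age', 'Fare']  # shortened feature list, for illustration only

# True where Service is known, False where it is NaN.
mask = df['Service'].notnull()

xtrain = df.loc[mask, features]    # rows with a known label
ytrain = df.loc[mask, 'Service']   # the labels for those same rows
xtest = df.loc[~mask, features]    # rows whose label is missing

print(xtrain.shape, xtest.shape)   # (4, 2) (2, 2)
```

Selecting ytrain with the same mask, rather than a separate dropna() call, keeps the features and labels aligned on one index by construction.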
Edit:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Service': [1, 0, np.nan, np.nan],
                   'Age': [4, 5, 6, 5],
                   'Fare': [7, 8, 9, 5],
                   'GSize': [1, 3, 5, 7],
                   'Deck': [5, 3, 6, 2],
                   'Class': [7, 4, 3, 0],
                   'Profession_title': [6, 7, 4, 6]})
print(df)
   Age  Class  Deck  Fare  GSize  Profession_title  Service
0    4      7     5     7      1                 6      1.0
1    5      4     3     8      3                 7      0.0
2    6      3     6     9      5                 4      NaN
3    5      0     2     5      7                 6      NaN
ytrain = df['Service'].dropna()
xtrain = df.loc[df['Service'].notnull(), ['Age','Fare', 'GSize','Deck','Class', 'Profession_title' ]]
xtest=df.loc[df['Service'].isnull(),['Age','Fare','GSize','Deck','Class','Profession_title']]
import pandas as pd
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(xtrain, ytrain)
print (logistic.predict(xtest))
[ 0. 0.]
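The error in the question says that the matrix passed to predict has a different number of columns than the one passed to fit. A minimal sketch of a guard against that follows. It assumes scikit-learn is installed, uses made-up values, and compares the column lists of both frames before fitting. The names features, model and predicted are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

features = ['Age', 'Fare', 'GSize', 'Deck', 'Class', 'Profession_title']

# Hypothetical data with the same column names as in the question.
df = pd.DataFrame({
    'Service': [1, 0, 0, 1, np.nan, np.nan],
    'Age': [4, 5, 6, 5, 7, 3],
    'Fare': [7, 8, 9, 5, 6, 4],
    'GSize': [1, 3, 5, 7, 2, 4],
    'Deck': [5, 3, 6, 2, 1, 4],
    'Class': [7, 4, 3, 0, 2, 5],
    'Profession_title': [6, 7, 4, 6, 5, 3],
})

mask = df['Service'].notnull()
xtrain = df.loc[mask, features]
ytrain = df.loc[mask, 'Service']
xtest = df.loc[~mask, features]

# Both frames must have identical columns, in the same order,
# otherwise predict() raises a feature-count error like the one above.
if list(xtrain.columns) != list(xtest.columns):
    raise ValueError('xtrain and xtest have different columns')

model = LogisticRegression()
model.fit(xtrain, ytrain)
predicted = model.predict(xtest)  # one predicted label per row of xtest
print(predicted)
```

Because both frames are sliced from the same df with the same feature list, the check passes here. It is most useful when the two sets are preprocessed separately, which can leave them with different columns.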