TypeError during resampling
I am trying to apply resampling to an imbalanced dataset.
Here is what I did:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

y = df.Label
vectorizer = CountVectorizer()
# fit_transform returns a scipy sparse matrix, not a DataFrame
X = vectorizer.fit_transform(df['Text'].replace(np.nan, ""))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, stratify=y)

# concatenate our training data back together
X = pd.concat([X_train, y_train], axis=1)  # this line raises the TypeError

# separate minority and majority classes
not_df = X[X.Label == 0]
df = X[X.Label == 1]

# upsample minority
df_upsampled = resample(df,
                        replace=True,
                        n_samples=len(not_df),
                        random_state=27)

# combine majority and upsampled minority
upsampled = pd.concat([not_df, df_upsampled])
Unfortunately, I run into a problem at this step: X = pd.concat([X_train, y_train], axis=1), namely:
/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279 verify_integrity=verify_integrity,
280 copy=copy,
--> 281 sort=sort,
282 )
283
/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
355 "only Series and DataFrame objs are valid".format(typ=type(obj))
356 )
--> 357 raise TypeError(msg)
358
359 # consolidate
TypeError: cannot concatenate object of type '<class 'scipy.sparse.csr.csr_matrix'>'; only Series and DataFrame objs are valid
You can think of the Text column as something like:
Text
Have a non-programming question?
More helpful links
I am trying to apply...
I hope you can help me with this.
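You have to convert X_train to a DataFrame before you can use concat. X_train is a scipy sparse matrix, so densify it with .toarray(), and reuse y_train's index so that the rows line up with their labels (after train_test_split, y_train keeps the shuffled original index, while a fresh DataFrame would get 0..n-1):
X = pd.concat([pd.DataFrame(X_train.toarray(), index=y_train.index), y_train], axis=1)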
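To show the whole flow working end to end, here is a minimal sketch under some assumptions: the toy `df` below is hypothetical data standing in for the asker's frame (only the `Text` and `Label` column names come from the question), and the names `train`, `majority`, and `minority` are illustrative. It densifies the sparse matrix, aligns it with `y_train` by index, and then upsamples the minority class.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical toy data; only the column names are taken from the question.
df = pd.DataFrame({
    "Text": ["spam offer now", "hello friend", "meeting at noon", "cheap offer",
             "lunch today", "see you soon", "project update", "call me later",
             "win cash now", None],
    "Label": [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
})

y = df["Label"]
vectorizer = CountVectorizer()
# fit_transform returns a scipy sparse matrix; missing text becomes "".
X = vectorizer.fit_transform(df["Text"].fillna(""))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=27
)

# Densify and reuse y_train's index so features and labels stay row-aligned.
features = pd.DataFrame(X_train.toarray(), index=y_train.index)
train = pd.concat([features, y_train], axis=1)

# Separate majority (Label == 0) and minority (Label == 1) rows.
majority = train[train["Label"] == 0]
minority = train[train["Label"] == 1]

# Upsample the minority class with replacement to match the majority size.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=27
)

# Combine majority and upsampled minority.
upsampled = pd.concat([majority, minority_upsampled])
print(upsampled["Label"].value_counts())
```

Note that `.toarray()` builds a dense array, which can be memory-hungry for a large vocabulary. If that is a concern, `pd.DataFrame.sparse.from_spmatrix(X_train, index=y_train.index)` may be worth trying as a sparse-backed alternative.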