'Series' object has no attribute 'lower' tfidf
I am trying to use tfidf to prepare my data, but I keep running into the same error.
X = df['Description'], df['Type']
y =df['Description'], df['Type']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.33, random_state=42)
df['Description']=[" ".join(Description) for Description in df['Description'].values]
tfidf = TfidfVectorizer(stop_words='english')
t_x_train = tfidf.fit_transform(X_train)
t_x_test = tfidf.transform(y_test)
When I run it, I get:
AttributeError: 'Series' object has no attribute 'lower'
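Sklearn tries to apply str.lower() to each element it is given, here the elements of y_test. However, the data type is incompatible: the elements are not strings. In the question, X and y are each a tuple of two Series (df['Description'], df['Type']), so every element that train_test_split hands back is a whole Series, which has no lower() method.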
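To see where the non-string elements come from, here is a minimal sketch that reproduces the question's setup on a small made-up DataFrame (the data and the variable name error are illustrative assumptions, not taken from the original post). Because X is a tuple of two Series, train_test_split treats it as a two-item sequence, so each "document" the vectorizer receives is a Series rather than a string.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Made-up data for illustration only.
df = pd.DataFrame({
    "Description": ["red apple pie", "green apple tart", "blue cheese pizza", "spicy tomato pizza"],
    "Type": [4, 3, 2, 1],
})

# As in the question: the trailing comma syntax builds a tuple of two Series.
X = df["Description"], df["Type"]
y = df["Description"], df["Type"]

# The tuple has length 2, so the split hands back lists of whole Series.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

try:
    TfidfVectorizer(stop_words="english").fit_transform(X_train)
    error = None
except AttributeError as exc:
    # The vectorizer lowercases each document, and a Series has no .lower().
    error = str(exc)

print(type(X_train[0]).__name__, "|", error)
```

Passing a single text column, X = df["Description"], avoids the problem because each element is then a plain string.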
Please check:
- the data type, using y_test.dtypes, or convert it to string as shown below
- whether y_test should be replaced with X_test when it is passed to tfidf
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
corpus = [
('This is the first document.',4),
('This document is the second document.',3),
('And this is the third one.',2),
('Is this the first document?',1)
]
df= pd.DataFrame(corpus, columns = ['Description', 'Type'])
X = df['Description']
# make sure your target is also a series of strings if not already
y = df['Type'].astype('str')
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.33, random_state=42)
# df['Description']=[" ".join(Description) for Description in df['Description'].values]
tfidf = TfidfVectorizer(stop_words='english')
t_x_train = tfidf.fit_transform(X_train)
t_x_test = tfidf.transform(X_test)  # transform the held-out descriptions, not the labels
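The checks suggested above can also be made explicit before vectorizing. The following is a sketch under assumptions: the data is made up, and all_strings is a hypothetical helper written for this example, not part of pandas or scikit-learn. It confirms that every element of the text column is a str, then fits the vectorizer on X_train and transforms X_test.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Made-up data for illustration only.
df = pd.DataFrame({
    "Description": ["red apple pie", "green apple tart", "blue cheese pizza", "spicy tomato pizza"],
    "Type": [4, 3, 2, 1],
})


def all_strings(series):
    """Hypothetical helper: True if every element of the Series is a str."""
    return bool(series.map(lambda value: isinstance(value, str)).all())


X = df["Description"]          # a single text column, not a tuple
y = df["Type"].astype("str")   # labels as strings

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Fail early with a clear message instead of an AttributeError inside sklearn.
assert all_strings(X_train) and all_strings(X_test), "text column contains non-string values"

tfidf = TfidfVectorizer(stop_words="english")
t_x_train = tfidf.fit_transform(X_train)  # learn the vocabulary on the training text
t_x_test = tfidf.transform(X_test)        # reuse that vocabulary for the test text

print(t_x_train.shape, t_x_test.shape)
```

Both matrices share the same number of columns because the test text is projected onto the vocabulary learned from the training text.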