Python SKlearn TfidfVectorizer arguments error

I have been using the sklearn TfidfVectorizer, but it suddenly started raising an error:

TypeError: __init__() takes 1 positional argument but 2 positional arguments 
(and 4 keyword-only arguments) were given

The arguments I passed are:

tfidf_vectorizer = TfidfVectorizer(X_train, ngram_range=(1,2), max_df=0.9, min_df=5, token_pattern=r'(\S+)' )

where X_train is a list of strings, for example:

 'done earlier siesta',
 'sunday mass us family greatful opportunity',
 'wet wet wet frustrated outside',
 'tired headache headache',
 'friends creative talented inspired friendship love creatives',
 'grateful lucky beaches sunshine hubby family pets awesome sunday',
 'latest artwork',
 'two headache sick tired sore'

I am confused about why it says I passed two positional arguments when I only passed one, the X_train list. Even if I reduce the statement to:

TfidfVectorizer(X_train)

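it still gives the same error, saying that I passed two positional arguments. I am using sklearn 1.0.1; I tried reverting to 1.0.0, but the error is the same. Could the problem be in the list I am passing?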

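The library and its implementation have indeed changed: the constructor parameters are now keyword-only. The error counts the implicit self as the first positional argument, which is why it reports two even though you passed only X_train. In version 0.23.1, the same call only produces a warning that the value should be passed as a keyword argument.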

tfidvect=TfidfVectorizer(X_train)
FutureWarning: Pass input=['done earlier siesta', 'sunday mass us family greatful opportunity', 'wet wet wet frustrated outside', 'tired headache headache', 'friends creative talented inspired friendship love creatives', 'grateful lucky beaches sunshine hubby family pets awesome sunday', 'latest artwork', 'two headache sick tired sore'] as keyword args. From version 0.25 passing these as positional arguments will result in an error
  warnings.warn("Pass {} as keyword args. From version 0.25 "

Fast forward to 1.0.1, and the same call has to be written like this:

tfidvect1_01 = TfidfVectorizer(input=X_train)  # input passed as a keyword argument

Added by @Ambrayers:

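Another approach, and the one that matches the documented meaning of input (one of 'filename', 'file' or 'content'), is to create the object first and then pass the documents to fit_transform, as in the example in the official documentation:
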
vectorizer = TfidfVectorizer()  
X_train = vectorizer.fit_transform(X_train)
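
Putting this together with the options from the question, a sketch might look like the following. It assumes the eight sample strings shown in the question are the whole training set, so min_df is lowered from 5 to 1 here (no term appears in five of these eight documents); with a larger corpus the original value may be fine. X_train_tfidf is an illustrative name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

X_train = [
    'done earlier siesta',
    'sunday mass us family greatful opportunity',
    'wet wet wet frustrated outside',
    'tired headache headache',
    'friends creative talented inspired friendship love creatives',
    'grateful lucky beaches sunshine hubby family pets awesome sunday',
    'latest artwork',
    'two headache sick tired sore',
]

# The constructor receives configuration only, all as keyword arguments.
tfidf_vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),
    max_df=0.9,
    min_df=1,  # assumption: lowered from 5 because this sample is tiny
    token_pattern=r'(\S+)',
)

# The documents are passed to fit_transform, not to the constructor.
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
print(X_train_tfidf.shape)
```

Storing the result under a separate name keeps the original list of strings available, unlike reassigning X_train.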