TypeError: unsupported operand type(s) for -: 'str' and 'int' in PyCaret regression


I have read several existing questions on this topic, but I still cannot figure out my problem.

I am trying to fit a regression using PyCaret:

from pycaret.regression import *
fooPy = setup(data = foo, target = 'pts', session_id = 123)

I get this error:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

I am not sure where the problem is, because I do not see any strings in my data structure:

pts_500                   float64
pts_500_p                 float64
OBP_avg                   float64
SLG_avg                   float64
SB_avg                    float64
RBI_avg                   float64
R_avg                     float64
home                      int64
first_time_pitcher        int32
park_ratio_OBP            float64
park_ratio_SLG            float64
order                     float64
SO_avg_p                  float64
pts_500_parkadj_p         float64
pts_500_parkadj           float64
SLG_avg_parkadj           float64
OPS_avg_parkadj           float64
SLG_avg_parkadj_p         float64
OPS_avg_parkadj_p         float64
pts_BxP                   float64
SLG_BxP                   float64
OPS_BxP                   float64
whip_SO_BxP               float64
whip_SO_B                 float64
whip_SO_B_parkadj         float64
order                     float64
ops x pts_500 order15     float64
ops x pts_500 parkadj     float64
ops23 x pts_500           float64
ops x pts_500 orderadj    float64
whip_p                    float64
whip_SO_p                 float64
whip_SO_parkadj_p         float64
whip_parkadj_p            float64
pts                       float64
dtype: object

home and first_time_pitcher are integers.

The full error looks like this:

Thanks for any hints!

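I found the answer myself, and it is quite trivial and embarrassing.

The order variable was included in the dataset twice. I checked the correlations and found a correlation of 1.0 between the two identical variables.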

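# Check correlation between features
import numpy as np

cor = df[features].corr()
# Keep only the strictly lower triangle so each pair appears once
cor.loc[:, :] = np.tril(cor, k=-1)
cor = cor.stack()
# Show pairs with a strong positive or negative correlation
cor[(cor > 0.7) | (cor < -0.7)]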

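Just adding to @Anakin Sykwalker's answer: this error (with its confusing error message) is caused by duplicate column names.

It can be fixed simply by renaming (e.g. df.rename) or dropping (e.g. df.drop) one of the columns that share a name.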
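As a minimal sketch of that fix, the snippet below finds repeated column labels and then either keeps only the first occurrence or makes the labels unique. The toy frame and its values are made up for illustration; only the column names order, home and pts come from the question. One thing to watch: df.rename(columns=...) and df.drop(columns=...) select by label, so they act on every column carrying that label. That is why this sketch works with a positional mask and a direct assignment to df.columns instead.

```python
import pandas as pd

# Toy frame with a repeated label, standing in for the real data (values are hypothetical)
df = pd.DataFrame(
    [[1.0, 2.0, 1.0, 10.0], [3.0, 4.0, 3.0, 20.0]],
    columns=["order", "home", "order", "pts"],
)

# True for the second and later occurrences of a label
dup_mask = df.columns.duplicated()
duplicated_names = df.columns[dup_mask].unique().tolist()
print(duplicated_names)

# Option 1: keep only the first occurrence of each label
deduped = df.loc[:, ~dup_mask]

# Option 2: keep every column, but append a counter to repeated labels
counts = {}
new_names = []
for name in df.columns:
    seen = counts.get(name, 0)
    new_names.append(name if seen == 0 else f"{name}_{seen}")
    counts[name] = seen + 1
renamed = df.copy()
renamed.columns = new_names
```

Option 1 suits the case in the question, where the repeated column holds the same data. Option 2 is meant for columns that differ in content; the suffix scheme is an arbitrary choice and does not guard against a clash with a label that already exists.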

An example that reproduces the error is included below (using pycaret 2.3.6):

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# artificially create 2 columns with same name, Number of times pregnant
diabetes.columns = ['Number of times pregnant',
       'Number of times pregnant',
       'Diastolic blood pressure (mm Hg)', 'Triceps skin fold thickness (mm)',
       '2-Hour serum insulin (mu U/ml)',
       'Body mass index (weight in kg/(height in m)^2)',
       'Diabetes pedigree function', 'Age (years)', 'Class variable']

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')

This ends with the following error message:

TypeError: unsupported operand type(s) for +: 'int' and 'str'