Robust regression with empty pandas DataFrame columns

I have a pandas DataFrame. For example:

column1  column2  column3
34       nan      3
45       nan      1
45       nan      3
45       nan      3
46       nan      3
45       nan      nan
45       nan      3
47       nan      5
45       nan      3
50       nan      3
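For anyone who wants to reproduce this, the example data can be built roughly as follows. This is only a sketch: the default integer index (0 to 9) is an assumption, since the real index is not shown above.

```python
import numpy as np
import pandas as pd

# Example data from the question; column2 holds only NaN values.
# Assumption: a default RangeIndex stands in for the real index.
df = pd.DataFrame({
    "column1": [34, 45, 45, 45, 46, 45, 45, 47, 45, 50],
    "column2": [np.nan] * 10,
    "column3": [3, 1, 3, 3, 3, np.nan, 3, 5, 3, 3],
})
print(df)
```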

I want to run a Theil-Sen regression on each column against the index. I wrote the following script:

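import numpy as np
from scipy import stats

def LR(df):
    line = {}
    slope = {}
    for k, v in df.items():  # iteritems() was removed in pandas 2.0
        if v.empty:
            pass  # This is to check if a column is empty
        else:
            # Mask NaN entries, then keep only the valid observations
            mask = np.isnan(df[k].to_numpy(dtype=float))
            xm = np.ma.masked_array(df.index.values, mask=mask).compressed()
            ym = np.ma.masked_array(df[k].to_numpy(dtype=float), mask=mask).compressed()
            res = stats.theilslopes(ym, xm, 0.90)
            line[k] = res[1] + res[0] * xm
            slope[k] = res[0]
    return line, slope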

The problem is that I get this error:

ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,).

When I step through it in debug mode, the error seems to occur when a particular column is empty.

What is the actual problem?
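One likely explanation, assuming the failing column contains only NaN values like column2 in the example: `Series.empty` is True only when the Series has zero length, so an all-NaN column passes the `v.empty` check. After masking and compressing, zero-length arrays then reach the regression call. The small check below illustrates the difference (the Series `col` is a made-up stand-in for such a column):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a column that holds only NaN values
col = pd.Series([np.nan] * 10, name="column2")

print(col.empty)           # False: the Series still has 10 entries
print(col.isna().all())    # True: every entry is NaN
print(col.dropna().empty)  # True: nothing is left after dropping NaN
```

So a test such as `v.isna().all()` or `v.dropna().empty` is probably closer to what the `v.empty` check was meant to do.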

I managed to work around it using sklearn, as follows:

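import numpy as np
from sklearn.linear_model import TheilSenRegressor

intercept = {}
slope = {}
for k, v in df.items():  # iteritems() was removed in pandas 2.0
    mask = np.isnan(v.to_numpy(dtype=float))
    xm = np.ma.masked_array(df.index.values, mask=mask).compressed()
    ym = np.ma.masked_array(v.to_numpy(dtype=float), mask=mask).compressed()

    if len(xm) > 0:  # skip columns with no valid observations
        model = TheilSenRegressor()
        # sklearn expects X with shape (n_samples, 1) and a 1-D y
        model.fit(xm.reshape(-1, 1), ym)
        # Store per-column results instead of returning on the first column
        intercept[k] = model.intercept_
        slope[k] = model.coef_[0]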
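The same guard should also work with the original scipy approach. Below is a minimal sketch, not a definitive fix: the function name `theil_sen_per_column`, the `alpha` parameter name, and the "at least two valid points" threshold are my own assumptions, and the DataFrame is the example data with a default integer index.

```python
import numpy as np
import pandas as pd
from scipy import stats

def theil_sen_per_column(df, alpha=0.90):
    """Fit a Theil-Sen line per column, skipping columns with too few points."""
    line, slope = {}, {}
    for name, col in df.items():
        valid = col.dropna()  # drop NaN rows; the index stays aligned
        if len(valid) < 2:    # assumption: need two points to define a slope
            continue
        x = valid.index.to_numpy(dtype=float)
        y = valid.to_numpy(dtype=float)
        res = stats.theilslopes(y, x, alpha)  # res[0] = slope, res[1] = intercept
        slope[name] = res[0]
        line[name] = res[1] + res[0] * x
    return line, slope

# Example data from the question (default integer index assumed)
df = pd.DataFrame({
    "column1": [34, 45, 45, 45, 46, 45, 45, 47, 45, 50],
    "column2": [np.nan] * 10,
    "column3": [3, 1, 3, 3, 3, np.nan, 3, 5, 3, 3],
})

line, slope = theil_sen_per_column(df)
print(sorted(slope))  # ['column1', 'column3']
```

Using `dropna()` on each column keeps x and y the same length without masked arrays, and the all-NaN column2 is skipped instead of raising.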