IndexError: index 1967 is out of bounds for axis 0 with size 1967
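I am reducing the number of features in a large sparse file by computing p-values, but I get this error. I have seen similar posts, but this code works for non-sparse input. Can you help? (I can upload the input file if needed.)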
import statsmodels.formula.api as sm

def backwardElimination(x, Y, sl, columns):
    numVars = len(x[0])
    pvalue_removal_counter = 0
    for i in range(0, numVars):
        print(i, 'of', numVars)
        regressor_OLS = sm.OLS(Y, x).fit()
        maxVar = max(regressor_OLS.pvalues).astype(float)
        if maxVar > sl:
            for j in range(0, numVars - i):
                if (regressor_OLS.pvalues[j].astype(float) == maxVar):
                    x = np.delete(x, j, 1)
                    pvalue_removal_counter += 1
                    columns = np.delete(columns, j)
    regressor_OLS.summary()
    return x, columns

Output:
0 of 1970
1 of 1970
2 of 1970
Traceback (most recent call last):
  File "main.py", line 142, in <module>
    selected_columns)
  File "main.py", line 101, in backwardElimination
    if (regressor_OLS.pvalues[j].astype(float) == maxVar):
IndexError: index 1967 is out of bounds for axis 0 with size 1967
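
Here is a fixed version.
I made several changes:
- Import the correct OLS from statsmodels.api.
- Generate columns inside the function.
- Use np.argmax to find the position of the largest p-value.
- Use boolean indexing to select columns. In pseudocode, x[:, [True, False, True]] keeps columns 0 and 2.
- Stop when there is nothing left to drop.
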
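The boolean-indexing and np.argmax steps from the list above can be tried on their own, without fitting a model. The following is only a minimal sketch: drop_largest is a hypothetical helper, and the scores array is made-up data standing in for p-values.

```python
import numpy as np

def drop_largest(x, columns, scores):
    """Drop the single column of x whose score is largest (hypothetical helper)."""
    retain = np.ones(x.shape[1], dtype=bool)  # start by keeping every column
    retain[np.argmax(scores)] = False         # argmax returns the first maximum only
    return x[:, retain], columns[retain]

x = np.arange(12.0).reshape(4, 3)        # 4 rows, 3 columns
columns = np.arange(x.shape[1])          # original column indices: [0, 1, 2]
scores = np.array([0.02, 0.90, 0.90])    # made-up stand-ins for p-values, tied at the maximum

x_kept, columns_kept = drop_largest(x, columns, scores)
print(x_kept.shape)    # (4, 2)
print(columns_kept)    # [0 2]
```

Because np.argmax returns a single position even when several entries tie for the maximum, exactly one column is removed per call, so the mask and the data always stay the same length.
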
import numpy as np
# The original import was wrong: the formula interface is not used, so import statsmodels.api
import statsmodels.api as sm

def backwardElimination(x, Y, sl):
    numVars = x.shape[1]  # variables are in columns
    columns = np.arange(numVars)
    for i in range(0, numVars):
        print(i, 'of', numVars)
        regressor_OLS = sm.OLS(Y, x).fit()
        # Largest p-value in the current fit
        maxVar = np.max(regressor_OLS.pvalues)
        if maxVar > sl:
            # Use boolean selection
            retain = np.ones(x.shape[1], bool)
            drop = np.argmax(regressor_OLS.pvalues)
            # Drop the column with the highest p-value (the first one on ties)
            retain[drop] = False
            # Keep the columns of x we wish to retain
            x = x[:, retain]
            # Also keep their column indices
            columns = columns[retain]
        else:
            # Exit early once no p-value exceeds sl
            break
    # Show the summary of the last fitted model
    print(regressor_OLS.summary())
    return x, columns

You can test it with:

x = np.random.standard_normal((1000, 100))
y = np.random.standard_normal(1000)
backwardElimination(x, y, 0.1)
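
As a side note, one plausible reading of the traceback (not confirmed by the question) is that several p-values tied at the maximum, so the original inner loop deleted more than one column in a single pass while still assuming numVars - i columns remained. The sketch below reproduces that effect with made-up p-values and no model fitting; num_vars and the arrays are illustrative only.

```python
import numpy as np

num_vars = 5
x = np.zeros((10, num_vars))

# Pass i = 0: made-up p-values with a two-way tie at the maximum.
pvalues = np.array([0.01, 0.80, 0.80, 0.02, 0.03])
max_p = pvalues.max()
for j in range(num_vars - 0):
    if pvalues[j] == max_p:
        x = np.delete(x, j, 1)  # runs twice, so two columns disappear in one pass

# Pass i = 1: a refit would give one p-value per remaining column (3 here),
# but the inner loop still iterates over range(num_vars - 1) == range(4).
pvalues = np.array([0.01, 0.02, 0.03])
error_message = None
try:
    for j in range(num_vars - 1):
        pvalues[j]
except IndexError as exc:
    error_message = str(exc)

print(x.shape[1])       # 3 columns left after a single pass
print(error_message)    # NumPy's out-of-bounds message
```

The same pattern would fit the numbers in the traceback: at i = 2 the loop runs over range(1968) while only 1967 p-values exist, which is what three deletions in the first two passes would produce. Dropping exactly one column per pass, as the fixed version does, avoids the mismatch.
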