Remove rows from dataset in python

Question

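I'm trying to take some rows that were classified as outliers and remove them from the original dataset, but I can't get it to work. Do you know what is going wrong? I tried running the following code and got this error: "ValueError: Index data must be 1-dimensional"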

#identify outliers
pred = iforest.fit_predict(x)
outlier_index = np.where(pred==-1)
outlier_values = x.iloc[outlier_index]
#remove from dataset (dataset = x)
x_new = x.drop([outlier_values])


Answer 1

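The outlier_values you are passing is a DataFrame, not a flat list of index labels, so drop raises the ValueError.

What you need to do is extract the list of index labels from the outlier_values DataFrame, using: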

index_list = outlier_values.index.values.tolist()

This gives you a plain list of index labels, which you can then drop from x with x.drop(index_list).

As in this answer.
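The steps above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual data: the DataFrame contents and its index labels are made up, and the hard-coded pred array is a stand-in for the result of iforest.fit_predict(x), so that the example runs without a fitted model.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the non-default index shows that labels, not positions, get dropped.
x = pd.DataFrame(
    {"a": [1.0, 1.1, 0.9, 25.0, 1.2], "b": [2.0, 2.1, 1.9, -40.0, 2.2]},
    index=[10, 11, 12, 13, 14],
)

# Stand-in for iforest.fit_predict(x): -1 marks an outlier, 1 an inlier.
pred = np.array([1, 1, 1, -1, 1])

# np.where returns a tuple of arrays; [0] picks the array of row positions.
outlier_index = np.where(pred == -1)[0]

# Positional lookup gives a DataFrame holding the outlier rows.
outlier_values = x.iloc[outlier_index]

# Extract the index labels of those rows as a flat Python list.
index_list = outlier_values.index.values.tolist()

# drop expects labels, so the flat list works where the DataFrame did not.
x_new = x.drop(index_list)
print(x_new)
```

Here index_list holds the label 13 rather than the position 3, which is why the drop still removes the right row even though the index does not start at 0.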

Answer 2

Try this:

#identify outliers
pred = iforest.fit_predict(x)

# np.where returns a tuple of ndarray we access the first dimension
outlier_index = np.where(pred==-1)[0] 

outlier_values = x.iloc[outlier_index]

#remove from dataset (dataset = x); drop needs index labels, not a DataFrame
x_new = x.drop(outlier_values.index)

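In your case you can pass outlier_index directly. Note that drop matches index labels while np.where returns row positions, so this relies on x still having its default 0..n-1 index:

#identify outliers
pred = iforest.fit_predict(x)
outlier_index = np.where(pred==-1)[0]
# if x has a non-default index, use x.drop(x.index[outlier_index]) instead
x_new = x.drop(outlier_index)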
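As an alternative that avoids the label-versus-position question entirely, the predictions can be used as a boolean mask. The sketch below is only an illustration: the data and string index are invented, and pred is again a hard-coded stand-in for iforest.fit_predict(x).

```python
import numpy as np
import pandas as pd

# Hypothetical data with string labels, where x.drop(outlier_index) would not work.
x = pd.DataFrame(
    {"a": [1.0, 30.0, 0.9, 1.1, -50.0]},
    index=["r0", "r1", "r2", "r3", "r4"],
)

# Stand-in for iforest.fit_predict(x): -1 marks an outlier, 1 an inlier.
pred = np.array([1, -1, 1, 1, -1])

# Keep every row whose prediction is not -1.
x_new = x[pred != -1]

# Equivalent drop-based form: map positions to labels first.
outlier_index = np.where(pred == -1)[0]
x_alt = x.drop(x.index[outlier_index])

print(x_new)
```

Both forms keep the same rows; the mask version is shorter, while the drop version is closer to the code in the question.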

python

isolation-forest