Trying to remove items from a list of pd.Series using pop, but it is not working properly
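I'm writing a class that finds outliers in a dataset using statistical methods such as the interquantile range and the z-score. I'd like to know why, with the same empty check, some empty entries are removed from my list of pd.Series while others are not.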
import numpy as np
import pandas as pd
from scipy import stats

class OutliersDetector:
    def __init__(self, X):
        self.outliers = []
        self.X = X

    def detect_range(self):
        self.reset_outliers()
        # to implement
        self.remove_empty_items()

    def detect_zscore(self):
        self.reset_outliers()
        zscore = np.abs(stats.zscore(self.X))
        threshold_std = 3
        for index, col_name in enumerate(self.X.columns):  # X and zscore always have the same shape
            col = zscore[:, index]
            self.outliers.append(pd.Series(col[col >= threshold_std], name=col_name))
        self.remove_empty_items()

    # none of the if statements i tried worked
    def remove_empty_items(self):
        for index, item in enumerate(self.outliers):
            #if item.size == 0:
            #if len(item.index) == 0:
            if item.empty:
                print("[no outliers] {}".format(item.name))
                self.outliers.pop(index)

    def reset_outliers(self):
        self.outliers = []

    def show_outliers(self):
        for item in self.outliers:
            print("[name]: {}\n[outliers]: {}\n".format(item.name, item.size))

outliers_detector = OutliersDetector(X_train_transformed)
outliers_detector.detect_zscore()
print("\noutliers found: ")
outliers_detector.show_outliers()
Output: Rainfall, Month, Location and WindDir9am should not be printed under "outliers found" because their size is 0, but...
[no outliers] RainToday
[no outliers] Year
[no outliers] Day
[no outliers] WindGustDir
[no outliers] WindDir3pm
[no outliers] Sunshine
[no outliers] Humidity3pm
[no outliers] Cloud9am
outliers found:
[name]: Rainfall
[outliers]: 0
[name]: Evaporation
[outliers]: 289
[name]: Month
[outliers]: 0
[name]: Location
[outliers]: 0
[name]: WindDir9am
[outliers]: 0
How can I fix this?
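In remove_empty_items you are modifying the self.outliers list while iterating over it. Each pop shifts the remaining items one position to the left, so the loop skips the item that directly follows every removed one, and empty Series that sit right after another empty Series are never checked. Your code should create a new list instead of modifying the current one: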
def remove_empty_items(self):
    non_empty_outliers = []
    for item in self.outliers:
        if item.empty:
            print("[no outliers] {}".format(item.name))
        else:
            non_empty_outliers.append(item)
    self.outliers = non_empty_outliers
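To see why popping during iteration leaves some empty entries behind, here is a minimal sketch that runs without pandas. It is only an illustration under stated assumptions: plain (name, values) tuples stand in for the pd.Series objects, `len(values) == 0` plays the role of `item.empty`, and the helper names `pop_while_iterating` and `keep_non_empty` are hypothetical, not part of the original class.

```python
def pop_while_iterating(items):
    """Reproduce the problematic pattern: pop from the list being enumerated."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for index, (name, values) in enumerate(items):
        if len(values) == 0:
            # Removing index shifts every later item one slot to the left,
            # so the item that was next is never visited by the loop.
            items.pop(index)
    return items


def keep_non_empty(items):
    """Build a new list that holds only the non-empty entries."""
    return [(name, values) for name, values in items if len(values) > 0]


# Hypothetical sample data: two adjacent empty entries, then a non-empty one.
sample = [
    ("RainToday", []),
    ("Rainfall", []),
    ("Evaporation", [3.2, 4.1]),
]

buggy_names = [name for name, _ in pop_while_iterating(sample)]
fixed_names = [name for name, _ in keep_non_empty(sample)]

print(buggy_names)  # "Rainfall" survives although it is empty
print(fixed_names)  # only "Evaporation" remains
```

With real pd.Series objects the same filter could probably be written as a list comprehension over `self.outliers` that keeps items where `not item.empty`. If other code holds a reference to the same list, assigning to the slice `self.outliers[:]` would update it in place rather than rebinding the attribute.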