How can I change NaN or string values to the mean of the column they belong to?
df = pd.read_csv(self.table_name)
for j in df.values:
    for k in j[0:-1]:
        try:
            k = float(k)
        except ValueError:
            df.replace(to_replace=k, value=np.nan, inplace=True)
df.replace(to_replace=np.nan, value=df.mean(), inplace=True)
# df.fillna(df.mean(), inplace=True)
df.to_csv(self.table_name, index=False)
print(df)
If the data contains string values when it goes into training, the
training may not run. To prevent this, I wrote a function that is
triggered by a button. However, on the first run the string values are
only deleted, and I get the result I want only on the second run. I
built the button and its function with PyQt5: when the user clicks the
button, the click is connected to this function and, in turn, to the
functions it calls. I could not work out where the problem is. Can
anyone help?
You can apply a custom function to all columns except the first, selected with DataFrame.iloc. The function converts the values to numeric with to_numeric and errors='coerce', so values that cannot be parsed become missing values (NaN). Finally, it replaces them with the column mean using Series.fillna:
def f(x):
    s = pd.to_numeric(x, errors='coerce')
    return s.fillna(s.mean())

df.iloc[:, 1:] = df.iloc[:, 1:].apply(f)
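To make the approach above easier to try without the CSV file, here is a minimal, self-contained sketch. The sample data, the column names (`id`, `x`, `y`) and the helper name `to_numeric_fill_mean` are assumptions made up for illustration; they are not from the original post. It also assigns by column labels rather than through `iloc`, which is a small variation on the answer's last line.

```python
import pandas as pd


def to_numeric_fill_mean(col: pd.Series) -> pd.Series:
    """Coerce a column to numbers, then fill unparseable or missing entries with the column mean."""
    s = pd.to_numeric(col, errors="coerce")  # strings such as "oops" become NaN
    return s.fillna(s.mean())                # mean() ignores NaN by default


# Hypothetical sample data: "id" is left untouched, the other columns are cleaned.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "x": ["1.0", "oops", "3.0", None],
    "y": [10, 20, "n/a", 40],
})

# Every column except the first, mirroring iloc[:, 1:] in the answer.
feature_cols = df.columns[1:]
df[feature_cols] = df[feature_cols].apply(to_numeric_fill_mean)
print(df)
```

Here the mean is computed on the already converted column, so a single pass is enough. Assigning by column labels is a deliberate choice in this sketch: as far as I know, recent pandas versions write into the existing columns when assigning through `df.iloc[:, 1:]`, which may leave the dtype as `object` even though the values are floats, whereas label-based assignment replaces the columns.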