How to apply a function over multiple columns (Pandas/Python)?
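Consider the IBM HR Analytics Attrition dataset from Kaggle (https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset). How can I quickly find the variable with the highest Shapiro p-value?
In other words, I can apply shapiro() to a single column as shapiro(df['column']). I want to compute it for every numeric column.
I tried this: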
import pandas as pd
from scipy.stats import shapiro

df = pd.read_csv('path')
# Here I was expecting the output to be a sequence of prints with the name of each column and its p-value from shapiro()
for col in df:
    print(col, " : ", shapiro(df[col])[0])
Can anyone help?
Thanks in advance.
Hope this helps! I'm sure there are plenty of better ways, but it was fun to try :)
import pandas as pd
from scipy.stats import shapiro

df = pd.read_csv('path.csv')

# Make a new dataframe newdf with only the columns containing numeric data
newdf = df.select_dtypes(include='number')

# Check that the remaining columns are all numeric
print(newdf.head())

# New dataframe with rows "W" (test statistic) and "P" (p-value),
# one column per numeric variable
shapiro_wilks = newdf.apply(lambda x: pd.Series(shapiro(x), index=['W', 'P']))
print(shapiro_wilks)
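To get from there to the actual question (which variable has the highest p-value), something like the sketch below might work. It is only an illustration: the helper name shapiro_p_values and the small demo DataFrame are made up here so the snippet runs on its own, and you would swap in the DataFrame loaded from your CSV.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def shapiro_p_values(df: pd.DataFrame) -> pd.Series:
    """Return the Shapiro-Wilk p-value of every numeric column, highest first."""
    # Hypothetical helper, not part of pandas or SciPy
    numeric = df.select_dtypes(include="number")
    # shapiro() returns (statistic, p-value); index 1 is the p-value
    p_values = numeric.apply(lambda col: shapiro(col.dropna())[1])
    return p_values.sort_values(ascending=False)


# Made-up stand-in data; replace with df = pd.read_csv('path.csv')
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "normal_like": rng.normal(size=200),
    "skewed": rng.exponential(size=200),
    "label": ["a", "b"] * 100,  # non-numeric, so it is skipped
})

p_values = shapiro_p_values(demo)
print(p_values)
print("Highest p-value:", p_values.idxmax())
```

Note that the loop in the question indexes the result with [0], which is the W statistic; the p-value is at index 1. Also, depending on the SciPy version, constant columns may trigger a warning and give an unreliable result, so it may be worth dropping them before ranking.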