Python Correlation Matrix - Only want columns with an absolute value greater than .5

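I have 41 variables, most of which are not correlated at all. I only want to include the few columns that show a stronger positive or negative correlation. Try as I might, and even after reading many articles and questions, I can't seem to get it to work. Thanks.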

df.columns

Index(['ResponseId', 'Consent', 'AgeQualifier', 'Team', 'TeamOther', 'FanStrength', 'WinImportance', 'Emotion', 'Happiness', 'Satisfaction', 'Passion', 'ViewershipHomeGame', 'ViewershipRoadGame', 'ViewershipTVCable', 'ViewershipStreaming', 'ViewershipRestaurantBar', 'NameChangeViewershipHomeGame', 'NameChangeViewershipRoadGame', 'NameChangeViewershipTVCable', 'NameChangeViewershipStreaming', 'NameChangeViewershipRestaurantBar', 'Purchased', 'Purchased_Jersey_1', 'Purchased_Clothing_2', 'Purchased_Memorabilia_3', 'Purchased_Office_4', 'Purchased_Equipment_5', 'PurchaseIntentionNameChangeJersey', 'PurchaseIntentionNameChangeClothing', 'PurchaseIntentionNameChangeMemorabilia', 'PurchaseIntentionNameChangeHomeOffice', 'PurchaseIntentionNameChangeEquipment', 'Support_SeasonTickets', 'Support_Donations ', 'Support_Volunteer ', 'SupportNameChangeSeasonTickets', 'SupportNameChangeDonateMoney', 'SupportNameChangeVolunteer', 'State', 'Gender', 'Age', 'Ethnicity', 'EthnicityOther', 'Income', 'Drawing', 'Email'], dtype='object')

import matplotlib.pyplot as plt
import seaborn as sns

correlation_matrix = df.corr().round(2)

fig, ax = plt.subplots(figsize=(50, 50))
sns.heatmap(data=correlation_matrix, cmap='rainbow', annot=True, ax=ax)

Any ideas?

After cleaning, the correlation matrix is:

               Consent  AgeQualifier  Team  FanStrength  WinImportance
Consent            NaN           NaN   NaN          NaN            NaN
AgeQualifier       NaN           1.0   NaN          NaN            NaN
Team               NaN           NaN  1.00         0.02           0.02
FanStrength        NaN           NaN  0.02         1.00           0.69
WinImportance      NaN           NaN  0.02         0.69           1.00

To solve this, you need to select every value in the matrix that is off-diagonal and has an absolute value > 0.5:

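# df here is the correlation matrix shown above, not the raw data.
# Compare on the absolute value so that strong negative correlations are kept too.
temp = df[(df.abs() > 0.5) & (df != 1)].abs().max()
print(temp[~temp.isna()])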

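This gives the names of the columns that have at least one off-diagonal correlation with an absolute value > 0.5 in the correlation matrix.

It produces: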

FanStrength      0.69
WinImportance    0.69
dtype: float64
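
To get back to the original goal of a smaller heatmap, the selected column names can be used to cut the correlation matrix down to just those rows and columns. The following is only a minimal sketch: `strong_columns` is a hypothetical helper name, not a pandas function, and the example matrix is rebuilt by hand from the five columns shown above. It masks the diagonal by position instead of testing `!= 1`, on the assumption that a genuine off-diagonal correlation of exactly 1 should still count.

```python
import numpy as np
import pandas as pd


def strong_columns(corr, threshold=0.5):
    """Return columns with at least one off-diagonal |correlation| > threshold.

    Hypothetical helper for illustration; `corr` is a square correlation matrix.
    """
    # Mask the diagonal by position, so an off-diagonal value of exactly 1.0
    # is not thrown away along with the self-correlations.
    off_diagonal = ~np.eye(len(corr), dtype=bool)
    # NaN entries compare as False, so all-NaN columns drop out automatically.
    strong = (corr.abs() > threshold) & off_diagonal
    return corr.columns[strong.any(axis=0).to_numpy()].tolist()


# The cleaned-up matrix from above, rebuilt by hand.
names = ["Consent", "AgeQualifier", "Team", "FanStrength", "WinImportance"]
nan = np.nan
correlation_matrix = pd.DataFrame(
    [
        [nan, nan, nan, nan, nan],
        [nan, 1.0, nan, nan, nan],
        [nan, nan, 1.00, 0.02, 0.02],
        [nan, nan, 0.02, 1.00, 0.69],
        [nan, nan, 0.02, 0.69, 1.00],
    ],
    index=names,
    columns=names,
)

keep = strong_columns(correlation_matrix)
# Keep only the strongly correlated rows and columns.
reduced = correlation_matrix.loc[keep, keep]
print(keep)  # ['FanStrength', 'WinImportance']
print(reduced)
```

Passing `reduced` instead of the full matrix to `sns.heatmap(..., annot=True)` should then plot only the strongly correlated variables.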