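Python Correlation Matrix - Only want columns with an absolute correlation above .5
I have 41 variables, most of which are not correlated at all. I only want to include the few columns that show a strong positive or strong negative correlation. Try as I might, and despite reading many articles and questions, I can't seem to get this to work. Thanks.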
df.columns
Index(['ResponseId', 'Consent', 'AgeQualifier', 'Team', 'TeamOther',
'FanStrength', 'WinImportance', 'Emotion', 'Happiness', 'Satisfaction',
'Passion', 'ViewershipHomeGame', 'ViewershipRoadGame',
'ViewershipTVCable', 'ViewershipStreaming', 'ViewershipRestaurantBar',
'NameChangeViewershipHomeGame', 'NameChangeViewershipRoadGame',
'NameChangeViewershipTVCable', 'NameChangeViewershipStreaming',
'NameChangeViewershipRestaurantBar', 'Purchased', 'Purchased_Jersey_1',
'Purchased_Clothing_2', 'Purchased_Memorabilia_3', 'Purchased_Office_4',
'Purchased_Equipment_5', 'PurchaseIntentionNameChangeJersey',
'PurchaseIntentionNameChangeClothing',
'PurchaseIntentionNameChangeMemorabilia',
'PurchaseIntentionNameChangeHomeOffice',
'PurchaseIntentionNameChangeEquipment', 'Support_SeasonTickets',
'Support_Donations ', 'Support_Volunteer ',
'SupportNameChangeSeasonTickets', 'SupportNameChangeDonateMoney',
'SupportNameChangeVolunteer', 'State', 'Gender', 'Age', 'Ethnicity',
'EthnicityOther', 'Income', 'Drawing', 'Email'],
dtype='object')
import matplotlib.pyplot as plt
import seaborn as sns

correlation_matrix = df.corr().round(2)
fig, ax = plt.subplots(figsize=(50, 50))
sns.heatmap(data=correlation_matrix, cmap='rainbow', annot=True, ax=ax)
Any ideas?
The cleaned-up matrix is:
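| | Consent | AgeQualifier | Team | FanStrength | WinImportance |
|---|---|---|---|---|---|
| Consent | NaN | NaN | NaN | NaN | NaN |
| AgeQualifier | NaN | 1.0 | NaN | NaN | NaN |
| Team | NaN | NaN | 1.00 | 0.02 | 0.02 |
| FanStrength | NaN | NaN | 0.02 | 1.00 | 0.69 |
| WinImportance | NaN | NaN | 0.02 | 0.69 | 1.00 |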
To solve this, select every off-diagonal value of the matrix whose absolute value is greater than 0.5:
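# df here is the correlation matrix (correlation_matrix in the question), not the raw data.
# Compare absolute values so that strong negative correlations are kept as well.
temp = df[(df.abs() > 0.5) & (df != 1)].abs().max()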
print(temp[~temp.isna()])
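This yields the names of the columns that have at least one correlation above 0.5 in absolute value in the correlation matrix.
For the matrix above, it prints: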
FanStrength 0.69
WinImportance 0.69
dtype: float64
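To go from that list of column names back to a smaller matrix for the heatmap, something like the sketch below might work. It is only an illustration under stated assumptions: the helper name `strongly_correlated` and the toy data are made up for this example, and it swaps the `!= 1` test for a mask over the diagonal, so that a genuine off-diagonal correlation of exactly 1.0 would not be discarded.

```python
import numpy as np
import pandas as pd


def strongly_correlated(corr: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Return the sub-matrix of columns with at least one off-diagonal |r| > threshold.

    Hypothetical helper for illustration; `corr` is a square correlation matrix.
    """
    # Blank out the diagonal so each column's self-correlation of 1.0 is ignored.
    off_diag = corr.mask(np.eye(len(corr), dtype=bool))
    # max() skips NaN, and an all-NaN column compares as False, so it is dropped.
    keep = off_diag.abs().max() > threshold
    names = keep[keep].index
    return corr.loc[names, names]


# Toy data (invented): b tracks a, c moves against a, d is nearly unrelated.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 4, 3, 6, 5],
    "c": [6, 5, 4, 3, 2, 1],
    "d": [1, -1, -1, 1, 1, -1],
})
corr = toy.corr().round(2)
strong = strongly_correlated(corr)
print(strong.columns.tolist())
```

The reduced matrix could then be passed to `sns.heatmap` in place of the full `correlation_matrix`, which should also allow a much smaller `figsize`.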