Python Correlation index
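Given a dataframe "df", I need to get the correlation index between the average price and the total volume for Region = "California".
Given dataframe:
Correlation index between California average price and total volume: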
cali_mean = df.groupby('Region').get_group('California')['AveragePrice'].mean()
max_volume = (df.groupby('Region')['TotalVolume'].sum()).max() #Output: 1028981653.17
# Correlation index between California mean price and total volume
df[cali_mean].corr(df['max_volume'])
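When I try to determine the correlation index between the California average price and the total volume, I get the following error message. Is there a way to fix this?
Error message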
KeyError Traceback (most recent call last)
~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
~/opt/miniconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/opt/miniconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1.3939644970414187
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/var/folders/wv/42dn23fd1cb0czpvqdnb6zw00000gn/T/ipykernel_18660/3247367876.py in <module>
1 # Correlation index between California mean price and total volume
----> 2 df[cali_mean].corr(df['max_volume'])
~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
KeyError: 1.3939644970414187
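Note that correlation is a measure between two vectors, not between two scalars. Here cali_mean is a single number, so df[cali_mean] looks for a column labelled 1.3939644970414187 and raises the KeyError. So you can use: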
import pandas as pd
df = pd.read_csv('avocado.csv')
# Keep only the California rows, then correlate the two columns
temp = df[df['Region'] == 'California']
temp['AveragePrice'].corr(temp['TotalVolume'])
Output:
-0.7913852550045145
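If you don't have avocado.csv at hand, here is a minimal sketch of the same idea on a small made-up frame. The data values and the names cali, cali_corr, corr_by_region and scalar_lookup_failed are my own assumptions for illustration and are not from the question. It first shows that indexing the frame with a scalar fails, then filters the rows and correlates the two columns, and finally does the same for every region in one pass.

```python
import pandas as pd

# Hypothetical stand-in for avocado.csv; the values are made up for illustration.
df = pd.DataFrame({
    "Region": ["California"] * 4 + ["Texas"] * 4,
    "AveragePrice": [1.0, 1.2, 1.4, 1.6, 0.9, 1.0, 1.1, 1.3],
    "TotalVolume": [400.0, 350.0, 300.0, 250.0, 100.0, 120.0, 90.0, 95.0],
})

# A scalar is not a column label, so indexing the frame with it raises KeyError.
cali_mean = df.loc[df["Region"] == "California", "AveragePrice"].mean()
try:
    df[cali_mean]
    scalar_lookup_failed = False
except KeyError:
    scalar_lookup_failed = True

# Filter the rows first, then correlate the two columns (Pearson by default).
cali = df[df["Region"] == "California"]
cali_corr = cali["AveragePrice"].corr(cali["TotalVolume"])

# Same computation for every region at once.
corr_by_region = {
    region: group["AveragePrice"].corr(group["TotalVolume"])
    for region, group in df.groupby("Region")
}

print(scalar_lookup_failed)
print(cali_corr)
print(corr_by_region)
```

The point is that both arguments to corr have to be Series of the same length, so the aggregation (mean, sum, max) has to be dropped and the filtering done on rows instead.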