Create a pivot table from a dataset with duplicates
I'm trying to create a pivot table and a heatmap from this dataset (gender pay).
My code is:
df = df.pivot('Seniority', 'TotalPay', 'Gender')
ax = sns.heatmap(df)
But I get an error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
6 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Gender'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'Gender'
Can anyone help me solve this? I tried removing the duplicates with drop_duplicates(), but it still doesn't work. Thanks a lot.
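My guess is that you want a heatmap of total pay (base pay plus bonus) broken down by seniority and gender. I'm posting this because the asker gave me the chance to respond. The raw data has many rows for each Seniority/Gender combination, and pivot() cannot reshape data in which an index/column pair appears more than once. Dropping duplicate rows does not help, because those rows still differ in their other columns. So the code below first sums TotalPay within each Seniority/Gender pair and then pivots the result, with Seniority as the index, Gender as the columns and TotalPay as the values. Note that the question's call has these swapped (TotalPay as the columns, Gender as the values); a heatmap needs the numeric column as the values.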
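To see the duplicate problem concretely, here is a minimal sketch that uses only the five rows printed by df.head() in this answer (an illustrative subset, not the full CSV; the variable names are hypothetical). Rows 1 and 4 share Seniority 5 and Gender Male, so pivot() refuses to reshape, while drop_duplicates() without a subset compares whole rows and keeps both. This reproduces pivot's duplicate-entry ValueError; it does not try to reproduce the question's KeyError.

```python
import pandas as pd

# The five rows shown by df.head() in this answer (TotalPay = BasePay + Bonus).
rows = pd.DataFrame({
    "JobTitle": ["Graphic Designer", "Software Engineer", "Warehouse Associate",
                 "Software Engineer", "Graphic Designer"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
    "Seniority": [2, 5, 5, 4, 5],
    "TotalPay": [52301, 119604, 99476, 118234, 108783],
})

# Flag every row whose (Seniority, Gender) pair occurs more than once.
dup_pairs = rows.duplicated(subset=["Seniority", "Gender"], keep=False)
print(rows[dup_pairs])

# drop_duplicates() with no subset compares entire rows, so nothing is removed.
print(len(rows.drop_duplicates()))  # → 5

# pivot() needs each (index, columns) pair to be unique, so this fails.
try:
    rows.pivot(index="Seniority", columns="Gender", values="TotalPay")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)
```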
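import pandas as pd
import seaborn as sns
# Load the data (sep=';' as in the answerer's copy of the CSV)
df = pd.read_csv('/content/Glassdoor Gender Pay Gap.csv', sep=';')
# Total pay per employee = base pay + bonus
df['TotalPay'] = df['BasePay'] + df['Bonus']
df.head()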
              JobTitle  Gender  Age  PerfEval  Education            Dept  Seniority  BasePay  Bonus  TotalPay
0     Graphic Designer  Female   18         5    College      Operations          2    42363   9938     52301
1    Software Engineer    Male   21         5    College      Management          5   108476  11128    119604
2  Warehouse Associate  Female   19         4        PhD  Administration          5    90208   9268     99476
3    Software Engineer    Male   20         5    Masters           Sales          4   108080  10154    118234
4     Graphic Designer    Male   26         5    Masters     Engineering          5    99464   9319    108783
# pivot needs one row per (Seniority, Gender) pair, so sum TotalPay within each pair first
df = df.groupby(['Seniority', 'Gender'])['TotalPay'].sum().to_frame('TotalPay').reset_index()
df.head()
   Seniority  Gender  TotalPay
0          1  Female   6267525
1          1    Male   9892202
2          2  Female   8738918
3          2    Male  10224439
4          3  Female  10359925
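Applied to the same five head() rows (again an illustrative subset with hypothetical variable names), the aggregation step collapses the two Seniority 5 / Male rows into one, after which pivot() succeeds. Combinations that do not occur in this small subset come out as NaN; on the full data every pair is present, so the answer's table has no gaps.

```python
import pandas as pd

# The five head() rows, reduced to the columns the pivot needs.
rows = pd.DataFrame({
    "Seniority": [2, 5, 5, 4, 5],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
    "TotalPay": [52301, 119604, 99476, 118234, 108783],
})

# One row per (Seniority, Gender) pair; the two (5, "Male") rows are summed.
agg = rows.groupby(["Seniority", "Gender"])["TotalPay"].sum().reset_index()
print(agg)

# With unique pairs, pivot() works; pairs missing from this subset become NaN.
wide = agg.pivot(index="Seniority", columns="Gender", values="TotalPay")
print(wide)
```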
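# Seniority levels as rows, Gender as columns, summed TotalPay as cell values
# (keyword arguments: pandas 2.0+ no longer accepts positional arguments to pivot)
df = df.pivot(index='Seniority', columns='Gender', values='TotalPay')
df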
Gender       Female      Male
Seniority
1           6267525   9892202
2           8738918  10224439
3          10359925  11775112
4           8483675  11932083
5          11273034  11992901
# annot=True writes each value in its cell; fmt=',' adds thousands separators
sns.heatmap(df, annot=True, fmt=',')
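As an alternative the answer does not use, pivot_table() with aggfunc="sum" performs the aggregation and the reshape in one call, and groupby(...).sum().unstack() is another route to the same wide table. A minimal sketch on the same five head() rows (illustrative subset, hypothetical variable names) comparing all three:

```python
import pandas as pd

# The five head() rows, reduced to the columns the pivot needs.
rows = pd.DataFrame({
    "Seniority": [2, 5, 5, 4, 5],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
    "TotalPay": [52301, 119604, 99476, 118234, 108783],
})

# As in the answer: aggregate first, then pivot.
step_by_step = (
    rows.groupby(["Seniority", "Gender"])["TotalPay"].sum()
    .reset_index()
    .pivot(index="Seniority", columns="Gender", values="TotalPay")
)

# pivot_table() sums duplicate (Seniority, Gender) pairs itself.
one_step = rows.pivot_table(index="Seniority", columns="Gender",
                            values="TotalPay", aggfunc="sum")

# groupby + unstack moves the Gender level of the index into the columns.
unstacked = rows.groupby(["Seniority", "Gender"])["TotalPay"].sum().unstack("Gender")

print(one_step)
```

pivot_table() is convenient when duplicates are expected, since it never raises on them; the explicit groupby keeps the aggregation step visible, which is what the answer walks through.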