Calculate the median of a pd.DataFrame() index based on values in the columns
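Let's assume I have a pd.DataFrame() object that stores the number of people who have had a stroke, broken down by age and gender. To make it more concrete: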
positive_by_gender.tail()
gives us:
| age | Female | Male |
|---|---|---|
| 78 | 9.0 | 12.0 |
| 79 | 13.0 | 4.0 |
| 80 | 10.0 | 7.0 |
| 81 | 8.0 | 6.0 |
| 82 | 4.0 | 5.0 |
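So 9 women aged 78 had a stroke, 12 men aged 78 had a stroke, and so on.
What I want is to calculate, for each gender, the median age at which they had a stroke. In this example it is 79.5 for females, but I want the code to compute it rather than me :-) I thought I could build an array for females that looks like [78 repeated 9 times, 79 repeated 13 times, 80 repeated 10 times, and so on] and then find the median that way, but I don't even know how to do that. Any help is much appreciated.
Following your idea of creating the array and getting the median that way: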
In [235]: df
Out[235]:
Female Male
age
78 9.0 12.0
79 13.0 4.0
80 10.0 7.0
81 8.0 6.0
82 4.0 5.0
In [236]: df = df.astype(int)
In [237]: df
Out[237]:
Female Male
age
78 9 12
79 13 4
80 10 7
81 8 6
82 4 5
In [238]: df = df.reset_index('age')
In [240]: df = df.melt(id_vars='age', var_name='gender', value_name='count')
In [241]: df
Out[241]:
age gender count
0 78 Female 9
1 79 Female 13
2 80 Female 10
3 81 Female 8
4 82 Female 4
5 78 Male 12
6 79 Male 4
7 80 Male 7
8 81 Male 6
9 82 Male 5
In [242]: df['age'] = df.apply(lambda s: [s['age']] * s['count'], axis=1)
In [243]: df
Out[243]:
age gender count
0 [78, 78, 78, 78, 78, 78, 78, 78, 78] Female 9
1 [79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 7... Female 13
2 [80, 80, 80, 80, 80, 80, 80, 80, 80, 80] Female 10
3 [81, 81, 81, 81, 81, 81, 81, 81] Female 8
4 [82, 82, 82, 82] Female 4
5 [78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78] Male 12
6 [79, 79, 79, 79] Male 4
7 [80, 80, 80, 80, 80, 80, 80] Male 7
8 [81, 81, 81, 81, 81, 81] Male 6
9 [82, 82, 82, 82, 82] Male 5
In [245]: df = df.explode('age')
In [249]: df['age'] = df['age'].astype(int)
In [251]: df
Out[251]:
age gender count
0 78 Female 9
0 78 Female 9
0 78 Female 9
0 78 Female 9
0 78 Female 9
.. ... ... ...
9 82 Male 5
9 82 Male 5
9 82 Male 5
9 82 Male 5
9 82 Male 5
[78 rows x 3 columns]
In [250]: df.groupby('gender')['age'].median()
Out[250]:
gender
Female 79.5
Male 80.0
Name: age, dtype: float64
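The same "repeat each age by its count" idea can probably be written more compactly with `np.repeat`, without the `melt`/`explode` round trip. This is only a sketch under assumptions: the frame name `counts` and the helper `median_age` are hypothetical, the data is rebuilt by hand from the five rows shown above, and missing counts are treated as zero.

```python
import numpy as np
import pandas as pd

# Counts of stroke cases by age (index) and gender (columns),
# rebuilt by hand from the five rows shown in the question.
counts = pd.DataFrame(
    {"Female": [9.0, 13.0, 10.0, 8.0, 4.0], "Male": [12.0, 4.0, 7.0, 6.0, 5.0]},
    index=pd.Index([78, 79, 80, 81, 82], name="age"),
)
counts.columns.name = "gender"


def median_age(col: pd.Series) -> float:
    """Median of the index values, each repeated by its count in `col`."""
    # Assumption: a missing count means "no cases", so NaN becomes 0.
    repeats = col.fillna(0).astype(int).to_numpy()
    ages = np.repeat(col.index.to_numpy(), repeats)
    return float(np.median(ages))


# apply() passes each column as a Series whose index is the age.
medians = counts.apply(median_age)
print(medians)
```

For the five rows above this gives 79.5 for Female and 80.0 for Male, matching the result of the step-by-step session. The expanded array still grows with the total number of cases, so for very large counts a cumulative-sum approach may be preferable.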