Calculate median of column with multiple values per cell (ranges)
I have this code:
import pandas as pd
df = pd.DataFrame({'R': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'}, 'a': {0: 1.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 3.0, 5: 2.0, 6: 3.0}, 'nv1': {0: [-1.0], 1: [-1.0], 2: [], 3: [], 4: [-2.0], 5: [-2.0, -1.0, -3.0, -1.0], 6: [-2.0, -1.0, -2.0, -1.0]}})
which produces the following DataFrame:
   R    a                       nv1
0  1  1.0                    [-1.0]
1  2  1.0                    [-1.0]
2  3  2.0                        []
3  4  3.0                        []
4  5  3.0                    [-2.0]
5  6  2.0  [-2.0, -1.0, -3.0, -1.0]
6  7  3.0  [-2.0, -1.0, -2.0, -1.0]
I need to calculate the median of the list in each row of df['nv1'], i.e. something like:
df['med'] = median of df['nv1']
The desired output is:
R    a                       nv1   med
1  1.0                    [-1.0]    -1
2  1.0                    [-1.0]    -1
3  2.0                        []
4  3.0                        []
5  3.0                    [-2.0]    -2
6  2.0  [-2.0, -1.0, -3.0, -1.0]  -1.5
7  3.0  [-2.0, -1.0, -2.0, -1.0]  -1.5
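For reference, the -1.5 in the desired output is the ordinary median of an even-length list: sort the values and average the two middle ones. A minimal check, using Python's standard-library `statistics.median` purely for illustration (it is not part of the question or answers):

```python
from statistics import median

row5 = [-2.0, -1.0, -3.0, -1.0]
row6 = [-2.0, -1.0, -2.0, -1.0]

# sorted(row5) == [-3.0, -2.0, -1.0, -1.0]; middle values -2.0 and -1.0
print(median(row5))    # -1.5
# sorted(row6) == [-2.0, -2.0, -1.0, -1.0]; middle values -2.0 and -1.0
print(median(row6))    # -1.5
# a single value is its own median
print(median([-2.0]))  # -2.0
```

Note that `statistics.median([])` raises `StatisticsError`, so empty rows always need separate handling, which is why the desired output leaves them blank.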
I tried each of the following two lines separately, but both raised an error when I ran them:
df['nv1'] = pd.to_numeric(df['nv1'],errors = 'coerce')
df['med'] = df['nv1'].median()
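Besides the list values not being numeric, `Series.median()` is a column-wide reduction: it returns one scalar for the whole column, not one value per row. A sketch (assuming pandas >= 0.25, where `Series.explode` exists) that flattens every list first shows what a single column-wide median looks like:

```python
import pandas as pd

nv1 = pd.Series([[-1.0], [-1.0], [], [], [-2.0],
                 [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]])

# explode() puts each list element on its own row; empty lists become NaN.
# astype(float) converts the resulting object dtype to float64.
flat = nv1.explode().astype(float)

# median() skips NaN by default and collapses the column to ONE number
overall = flat.median()
print(overall)  # -1.0, a single value rather than one median per row
```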
Use np.median:
import numpy as np
df['med'] = df['nv1'].apply(np.median)
Output:
>>> df
   R    a                       nv1   med
0  1  1.0                    [-1.0]  -1.0
1  2  1.0                    [-1.0]  -1.0
2  3  2.0                        []   NaN
3  4  3.0                        []   NaN
4  5  3.0                    [-2.0]  -2.0
5  6  2.0  [-2.0, -1.0, -3.0, -1.0]  -1.5
6  7  3.0  [-2.0, -1.0, -2.0, -1.0]  -1.5
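`np.median` of an empty list returns `nan`, which is where the NaN in rows 2 and 3 comes from, but NumPy also emits a `RuntimeWarning` ("Mean of empty slice") for each empty list. If that warning is unwanted, one possible variation (a sketch, not taken from the answer above) skips the call for empty lists:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'R': ['1', '2', '3', '4', '5', '6', '7'],
                   'a': [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 3.0],
                   'nv1': [[-1.0], [-1.0], [], [], [-2.0],
                           [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]]})

# Call np.median only on non-empty lists; empty lists map straight to NaN,
# so NumPy never sees an empty slice and no RuntimeWarning is raised.
df['med'] = df['nv1'].apply(lambda v: np.median(v) if len(v) > 0 else np.nan)
print(df)
```

The result matches the output above; only the warning goes away.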
Or:
# explode() returns object dtype, so cast to float before taking the median
df['med'] = df['nv1'].explode().dropna().astype(float).groupby(level=0).median()
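The `explode` approach works because each list element becomes its own row while keeping the original index label, so `groupby(level=0)` regroups the elements row by row. The intermediate steps, as a sketch assuming pandas >= 0.25 (the `.astype(float)` cast is an added safeguard, since `explode` returns object dtype):

```python
import pandas as pd

nv1 = pd.Series([[-1.0], [-1.0], [], [], [-2.0],
                 [-2.0, -1.0, -3.0, -1.0], [-2.0, -1.0, -2.0, -1.0]])

# 1. One row per list element; index labels repeat (e.g. 5, 5, 5, 5),
#    and the empty lists in rows 2 and 3 become NaN.
exploded = nv1.explode()

# 2. Drop the NaN placeholders and convert object dtype to float64.
values = exploded.dropna().astype(float)

# 3. Group by the original row label (index level 0), one median per group.
med = values.groupby(level=0).median()
print(med)  # only rows 0, 1, 4, 5, 6; rows 2 and 3 had no values

# 4. Assignment aligns on the index, so rows 2 and 3 get NaN.
df = pd.DataFrame({'nv1': nv1})
df['med'] = med
print(df)
```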