Assign median values by separating data into bins

Question

I have a DataFrame that I want to split into bins, assigning each bin the median of the values that fall into it.

   POA   Egrid           
   200   1.17
   205   0.63
   275   1.08
   325   1.22
   350   0.57

The result should look like this:

   POA       Egrid           
 (200,300)   Median of (1.17,0.63,1.08)
 (300,400)   Median of (1.22,0.57)

I tried writing two loops, but I can't work out the middle part. Any help would be appreciated.

Answer 1

Use pd.cut together with .groupby and .transform:

import pandas as pd
df['POA'] = df['POA'].astype(int)
df['POA'] = pd.cut(df['POA'], [0,99,199, 299, 399], include_lowest=True)
df['Egrid'] = df.groupby('POA')['Egrid'].transform('median')
df = df.drop_duplicates()
df

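Edit:

pd.cut has a flag, right=False, which makes each bin closed on the left and open on the right. With it the categories are cleaner: instead of ending a bin at 99, you can end it at 100.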

import pandas as pd
df['POA'] = df['POA'].astype(int)
df['POA'] = pd.cut(df['POA'], [0,100,200, 300,400], include_lowest=True, right=False)
df['Egrid'] = df.groupby('POA')['Egrid'].transform('median')
df = df.drop_duplicates()
df

Output (after drop_duplicates, one row per bin remains):

          POA  Egrid
0  [200, 300)  1.080
3  [300, 400)  0.895
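To see why right=False matters for the question's data, here is a small sketch (the variable names are mine, not from the answer) that bins the same POA values both ways and looks at where the boundary value 200 ends up:

```python
import pandas as pd

poa = pd.Series([200, 205, 275, 325, 350], name="POA")
edges = [0, 100, 200, 300, 400]

# Default (right=True): bins are closed on the right, so 200 falls in (100, 200].
right_closed = pd.cut(poa, edges)

# right=False: bins are closed on the left, so 200 falls in [200, 300).
left_closed = pd.cut(poa, edges, right=False)

print(right_closed.iloc[0])
print(left_closed.iloc[0])
```

With the default, 200 would be grouped separately from 205 and 275, which is not what the question asks for.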

Answer 2

This is certainly not the most efficient approach, but it works!

First, let's recreate a similar setup:

import numpy as np
import pandas as pd

# make a DataFrame like yours
df = pd.DataFrame([[200, 1.17], [205, 0.63], [275, 1.08], [325, 1.22], [350, 0.57]], columns=["POA", "Egrid"])

Then, let's add the medians:

# first, define a list of possible ranges from which you want the medians
list_of_ranges = [(200, 300), (300, 400)]

# initialize a float column named "Median" (NaN marks rows outside every range)
df["Median"] = np.nan

# apply median to the desired ranges
for a, b in list_of_ranges:
    
    # calculate the median from the desired subset of the dataframe
    median = df[(df['POA'] >= a) & (df['POA'] < b)]["Egrid"].median()
    
    # apply the value where the condition is respected
    df.loc[(df['POA'] >= a) & (df['POA'] < b), 'Median'] = median

Let me know if anything is unclear.
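The loop above builds the same boolean condition twice per range. As a possible tidy-up (a sketch, not part of the original answer), the mask can be computed once with Series.between, assuming a pandas version (1.3 or later) where its inclusive argument accepts "left":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[200, 1.17], [205, 0.63], [275, 1.08], [325, 1.22], [350, 0.57]],
    columns=["POA", "Egrid"],
)
list_of_ranges = [(200, 300), (300, 400)]

# Float column; rows that match no range keep NaN.
df["Median"] = np.nan

for a, b in list_of_ranges:
    # inclusive="left" means a <= POA < b, matching the original condition.
    mask = df["POA"].between(a, b, inclusive="left")
    df.loc[mask, "Median"] = df.loc[mask, "Egrid"].median()

print(df)
```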

Answer 3

Do:

s=df.groupby(pd.cut(df.POA,[100,200,300])).Egrid.median().reset_index()
          POA  Egrid
0  (100, 200]  1.170
1  (200, 300]  0.855
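Note that with the edges [100, 200, 300] and the default right-closed bins, 200 lands in (100, 200] and the rows with POA 325 and 350 fall outside every bin. A sketch of the same one-liner adjusted to the bins the question asks for, assuming left-closed bins [200, 300) and [300, 400) are what is wanted:

```python
import pandas as pd

df = pd.DataFrame(
    {"POA": [200, 205, 275, 325, 350], "Egrid": [1.17, 0.63, 1.08, 1.22, 0.57]}
)

# Left-closed bins: [200, 300) and [300, 400).
poa_bins = pd.cut(df["POA"], [200, 300, 400], right=False)

# observed=True keeps only the bins that actually contain rows.
s = df.groupby(poa_bins, observed=True)["Egrid"].median().reset_index()
print(s)
```

Here s has one row per bin, with the bin in the POA column and its median in Egrid.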

Answer 4

import pandas as pd

# Create the dataframe
d = {'POA':[200,205,275,325,350], 'Egrid':[1.17,0.63,1.08,1.22,0.57]}
df = pd.DataFrame(data=d)

# Lower edges of the 100-wide bins to group by
bins = [100,200,300,400,500,600,700,800,900,1000]

# Start with an empty object column to hold the bin labels
df['Bin'] = None

# For loop to assign each POA to a bin
# (the loop variable is lower_bin so the built-in bin() is not shadowed)
for lower_bin in bins:
    upper_bin = lower_bin + 100
    df.loc[(df['POA'] >= lower_bin) & (df['POA'] < upper_bin), 'Bin'] = f'{lower_bin} to {upper_bin}'

# Create a pandas pivot_table to summarize the results
# Displays each bin and its median value
# (the string 'median' is the spelling recent pandas versions recommend over np.median)
df2 = pd.pivot_table(df, index=['Bin'], values=['Egrid'], aggfunc='median')
print(df2)
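When every bin has the same width, the loop can be avoided entirely. A minimal sketch, assuming fixed 100-wide bins that start at multiples of 100 (the names width and lower are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"POA": [200, 205, 275, 325, 350], "Egrid": [1.17, 0.63, 1.08, 1.22, 0.57]}
)

width = 100

# Floor division gives the lower edge of each row's bin, e.g. 275 -> 200.
lower = (df["POA"] // width) * width

# Same label format as the loop version: "200 to 300".
df["Bin"] = lower.astype(str) + " to " + (lower + width).astype(str)

# Median of Egrid per bin label.
df2 = df.groupby("Bin")["Egrid"].median()
print(df2)
```

This relies on the bins being regular; for uneven edges, pd.cut or the explicit loop is the safer choice.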
