How to remove upper and lower bounds with Python

I have a dataframe with 2 important columns. One of them is the "Price" column and the other is the "Quantity" column.

My dataframe:

Price Quantity Total Quantity
5 500 4000
6 100 4000
7 400 4000
8 200 4000
9 200 4000
10 800 4000
10 200 4000
10 300 4000
10 300 4000
11 300 4000
12 300 4000
12 100 4000
13 200 4000
14 100 4000
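
For reproducibility, the table above can be rebuilt with a snippet along these lines. This is only a sketch: it assumes pandas is available and reuses the `data_state` name from the code in this post; the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above, already sorted by Price
data_state = pd.DataFrame({
    "Price":    [5, 6, 7, 8, 9, 10, 10, 10, 10, 11, 12, 12, 13, 14],
    "Quantity": [500, 100, 400, 200, 200, 800, 200, 300, 300, 300, 300, 100, 200, 100],
})

# Every row carries the same grand total of the Quantity column
data_state["Total Quantity"] = data_state["Quantity"].sum()

print(data_state)
```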

My code:

# Cast both columns to float (the result must be assigned back), then divide the Price column by 100
data_state['Price'] = data_state['Price'].astype(float)
data_state['Quantity'] = data_state['Quantity'].astype(float)
data_state['Price'] = data_state['Price'] / 100

# Sort by price, then by quantity, smallest to largest
data_state = data_state.sort_values(['Price', 'Quantity'], ascending=(True, True))

#Getting the sum of the quantity column
data_state['Total Quantity'] = data_state['Quantity'].sum()

# Multiply the total quantity by 0.15 to find the amount to subtract from each end
data_state['Total Quantity Bounds'] = data_state['Total Quantity'] * 0.15

# At this stage, I need to subtract that amount from the sorted Quantity column, working inward from the top and from the bottom. In other words, for the rows at the upper and lower bounds, only the part of Quantity that falls within the central 70% is included in the calculation.

# The top and bottom 15% of "Total Quantity" are treated as outliers and removed from "Quantity"

Lower bound for this dataframe:

Lower bound: 4000 * 0.15 = 600 units of quantity

Upper bound for this dataframe:

Upper bound: 4000 * 0.15 = 600 units of quantity

My expected output:

Price Quantity New Total Quantity
5 0 2800
6 0 2800
7 400 2800
8 200 2800
9 200 2800
10 800 2800
10 200 2800
10 300 2800
10 300 2800
11 300 2800
12 100 2800
12 0 2800
13 0 2800
14 0 2800

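As shown above, the quantities removed from the top of the Quantity column add up to 600 (4000 * 0.15), and so do the quantities removed from the bottom. In particular, I reduced the quantity of 300 that previously corresponded to price 12 to 100.

Thanks,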

You can use cumsum combined with clip, computing the cumulative sum on the array in forward and in reverse order:

# get first value in Total Quantity column and multiply by desired factor
qty = df['Total Quantity'].iat[0]*0.15

# update Total Quantity column
df['Total Quantity'] -= 2*qty

## trim top

# compute the cumulated quantity and identify the value strictly lower than qty
cs = df['Quantity'].cumsum()
m = cs.lt(qty)

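# select those rows and the one after (shift); fill_value=True avoids NaN in
# the mask and also covers a first row that already exceeds qty on its own
# remove the qty from the cumulated sum, clipping negative values to zero,
# and update the dataframe
sel = m | m.shift(fill_value=True)
df.loc[sel, 'Quantity'] = cs.loc[sel].sub(qty).clip(lower=0)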


## trim bottom
# identical to above but on the reversed [::-1] array

cs = df['Quantity'][::-1].cumsum()
m = cs.lt(qty)
sel = m | m.shift(fill_value=True)
df.loc[sel, 'Quantity'] = cs.loc[sel].sub(qty).clip(lower=0)

Output:

    Price  Quantity  Total Quantity
0       5         0          2800.0
1       6         0          2800.0
2       7       400          2800.0
3       8       200          2800.0
4       9       200          2800.0
5      10       800          2800.0
6      10       200          2800.0
7      10       300          2800.0
8      10       300          2800.0
9      11       300          2800.0
10     12       100          2800.0
11     12         0          2800.0
12     13         0          2800.0
13     14         0          2800.0
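
As a cross-check, the same result can be reached in a single pass by treating each row as an interval of the running total and keeping only its overlap with the central band. This is a sketch of an equivalent formulation, not the code above: `trim_bounds`, `frac`, `lo_cut` and `hi_cut` are illustrative names, and it assumes the frame is already sorted by Price and that Quantity is non-negative.

```python
import numpy as np
import pandas as pd


def trim_bounds(df: pd.DataFrame, frac: float = 0.15) -> pd.DataFrame:
    """Keep only the part of each row's Quantity inside the central band.

    Hypothetical helper: assumes df is sorted by Price and Quantity >= 0.
    """
    out = df.copy()
    total = out["Quantity"].sum()
    lo_cut = total * frac      # running total where the kept band starts
    hi_cut = total - lo_cut    # running total where the kept band ends

    end = out["Quantity"].cumsum()   # running total after each row
    start = end - out["Quantity"]    # running total before each row

    # Overlap of [start, end] with [lo_cut, hi_cut]; a negative overlap
    # means the row lies entirely outside the band, so clip it to zero.
    kept = (np.minimum(end, hi_cut) - np.maximum(start, lo_cut)).clip(lower=0)

    out["Quantity"] = kept
    out["Total Quantity"] = kept.sum()
    return out


sample = pd.DataFrame({
    "Price":    [5, 6, 7, 8, 9, 10, 10, 10, 10, 11, 12, 12, 13, 14],
    "Quantity": [500, 100, 400, 200, 200, 800, 200, 300, 300, 300, 300, 100, 200, 100],
})
result = trim_bounds(sample)
print(result)
```

With the sample data, the row at price 12 that held 300 keeps 100 and the new total is 2800, matching the output above. Note that Quantity comes back as float here, since the cut points are floats.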