How to remove upper and lower bounds with Python

Question

I have a dataframe with 2 important columns. One of these important columns is the "Price" column and the other is the "Quantity" column.

My dataframe:

Price	Quantity	Total Quantity
5	500	4000
6	100	4000
7	400	4000
8	200	4000
9	200	4000
10	800	4000
10	200	4000
10	300	4000
10	300	4000
11	300	4000
12	300	4000
12	100	4000
13	200	4000
14	100	4000

My code:

#Both columns are converted to float (the result must be assigned back, apply alone does not modify the column) and the Price column is divided by 100
data_state['Price'] = data_state['Price'].astype(float)
data_state['Quantity'] = data_state['Quantity'].astype(float)
data_state['Price'] = data_state['Price'] / 100

#price and quantity sorting smallest to largest
data_state = data_state.sort_values(['Price', 'Quantity'], ascending=(True, True))

#Getting the sum of the quantity column
data_state['Total Quantity'] = data_state['Quantity'].sum()

#The total quantity column is multiplied by the value of "0.15" and the part to be subtracted from the total is found.
data_state['Total Quantity Bounds'] = data_state['Total Quantity'] * 0.15

#At this stage, I need to subtract the amount found above from the sorted Quantity column twice: once starting from the top (smallest price) and once starting from the bottom (largest price). In other words, for the rows that sit on the upper and lower bounds, only the part of Quantity that falls within the central 70% is included in the calculation.

#The top and bottom 15% of "Total Quantity" are treated as outliers and removed from "Quantity"

Lower bound for this dataframe:

Lower bound: 4000 * 0.15 = 600 quantity

Upper bound for this dataframe:

Upper bound: 4000 * 0.15 = 600 quantity
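
The arithmetic for the two bounds can be checked with a minimal sketch. It assumes the Quantity values from the table above are held in a Series named `quantity` (a hypothetical name used only for illustration) and shows how the 600 bound relates to the running totals from each end:

```python
import pandas as pd

# Quantity column from the table above, already sorted by Price
quantity = pd.Series([500, 100, 400, 200, 200, 800, 200,
                      300, 300, 300, 300, 100, 200, 100])

total = quantity.sum()   # 4000
bound = total * 0.15     # amount to remove from each end

# Running totals from the top and from the bottom of the sorted column
from_top = quantity.cumsum()
from_bottom = quantity[::-1].cumsum()

print(bound)
print(from_top.head(3).tolist())     # [500, 600, 1000]
print(from_bottom.head(4).tolist())  # [100, 300, 400, 700]
```

From the top, the running total reaches the bound exactly at the second row, so the first two rows drop to 0. From the bottom, the running total passes the bound in the fourth row (700), so only 100 of that row's 300 remains.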

My expected output:

Price	Quantity	New Total Quantity
5	0	2800
6	0	2800
7	400	2800
8	200	2800
9	200	2800
10	800	2800
10	200	2800
10	300	2800
10	300	2800
11	300	2800
12	100	2800
12	0	2800
13	0	2800
14	0	2800

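As shown above, quantities are removed from the top and from the bottom of the Quantity column until the amount removed at each end reaches 600 (4000 * 0.15). In particular, the quantity of 300 at price 12 is reduced to 100.

Thanks,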

Answer 1

You can use cumsum combined with clip, computing the cumulative sum on both the forward and the reversed array:

# get first value in Total Quantity column and multiply by desired factor
qty = df['Total Quantity'].iat[0]*0.15

# update Total Quantity column
df['Total Quantity'] -= 2*qty

## trim top

# compute the cumulated quantity and identify the value strictly lower than qty
cs = df['Quantity'].cumsum()
m = cs.lt(qty)

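# select those rows and the one after (shift); fill_value=True avoids NaN in
# the mask and always includes the first row, so a first row that alone
# exceeds qty is trimmed as well
# remove the qty from the cumulated sum clipping negative values to zero
# and update the dataframe
mask = m | m.shift(fill_value=True)
df.loc[mask, 'Quantity'] = cs.loc[mask].sub(qty).clip(lower=0)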


## trim bottom
# identical to above but on the reversed [::-1] array

cs = df['Quantity'][::-1].cumsum()
m = cs.lt(qty)
mask = m | m.shift(fill_value=True)
df.loc[mask, 'Quantity'] = cs.loc[mask].sub(qty).clip(lower=0)

Output:

    Price  Quantity  Total Quantity
0       5         0          2800.0
1       6         0          2800.0
2       7       400          2800.0
3       8       200          2800.0
4       9       200          2800.0
5      10       800          2800.0
6      10       200          2800.0
7      10       300          2800.0
8      10       300          2800.0
9      11       300          2800.0
10     12       100          2800.0
11     12         0          2800.0
12     13         0          2800.0
13     14         0          2800.0
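
The same idea can be wrapped in a reusable function. This is only a sketch: the name `trim_tails` and its parameters are hypothetical, and it assumes the dataframe is already sorted by price and that quantities are non-negative. Instead of a mask, it takes the minimum of each original value and the clipped "cumulative sum minus bound", which leaves rows past the bound unchanged:

```python
import numpy as np
import pandas as pd


def trim_tails(df, frac=0.15, col="Quantity"):
    """Return a copy of df with `frac` of the total of `col` removed from each end."""
    out = df.copy()
    qty = out[col].sum() * frac  # amount to remove from each end

    def trimmed(values):
        # What is left of a row is its cumulative sum minus qty,
        # bounded below by 0 and above by the row's own value.
        remaining = (values.cumsum() - qty).clip(lower=0)
        return np.minimum(values, remaining)

    out[col] = trimmed(out[col])        # trim from the top
    out[col] = trimmed(out[col][::-1])  # trim from the bottom (assignment aligns on index)
    out["Total " + col] = out[col].sum()
    return out


df = pd.DataFrame({
    "Price":    [5, 6, 7, 8, 9, 10, 10, 10, 10, 11, 12, 12, 13, 14],
    "Quantity": [500, 100, 400, 200, 200, 800, 200, 300, 300, 300, 300, 100, 200, 100],
})
result = trim_tails(df)
print(result)
```

On the sample data this gives the same Quantity column as the output above, with a Total Quantity of 2800.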

