How to remove upper and lower bounds with Python
I have a DataFrame with two important columns: one is the "Price" column and the other is the "Quantity" column.
My DataFrame:
Price | Quantity | Total Quantity |
---|---|---|
5 | 500 | 4000 |
6 | 100 | 4000 |
7 | 400 | 4000 |
8 | 200 | 4000 |
9 | 200 | 4000 |
10 | 800 | 4000 |
10 | 200 | 4000 |
10 | 300 | 4000 |
10 | 300 | 4000 |
11 | 300 | 4000 |
12 | 300 | 4000 |
12 | 100 | 4000 |
13 | 200 | 4000 |
14 | 100 | 4000 |
My code:
# Cast both columns to float and divide the Price column by 100
data_state['Price'] = data_state['Price'].astype(float)
data_state['Quantity'] = data_state['Quantity'].astype(float)
data_state['Price'] = data_state['Price'] / 100
# Sort by price, then by quantity, smallest to largest
data_state = data_state.sort_values(['Price', 'Quantity'], ascending=[True, True])
# Sum of the Quantity column
data_state['Total Quantity'] = data_state['Quantity'].sum()
# 15% of the total quantity is the amount to subtract at each end
data_state['Total Quantity Bounds'] = data_state['Total Quantity'] * 0.15
# At this stage I need to subtract that amount from the sorted Quantity column, working inward from the top and from the bottom. In other words, for the rows at the lower and upper bounds, only the part of Quantity that falls in the central 70% should be included in the calculation.
# The top and bottom 15% of "Total Quantity" are treated as outliers and removed from "Quantity".
Lower bound for this DataFrame:
Lower bound: 4000 * 0.15 = 600 units
Upper bound for this DataFrame:
Upper bound: 4000 * 0.15 = 600 units
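As a quick check of the arithmetic above, here is a minimal sketch. The `quantity` Series is just the Quantity column typed in by hand, and the variable names are illustrative only, not part of my actual code:

```python
import pandas as pd

# Hand-typed copy of the Quantity column (assumed to be sorted by price already)
quantity = pd.Series([500, 100, 400, 200, 200, 800, 200,
                      300, 300, 300, 300, 100, 200, 100])

total = quantity.sum()           # sum of all quantities
bound = total * 0.15             # amount to drop at each end (lower and upper)
new_total = total - 2 * bound    # what should remain in the central 70%

print(total, bound, new_total)
```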
My expected output:
Price | Quantity | New Total Quantity |
---|---|---|
5 | 0 | 2800 |
6 | 0 | 2800 |
7 | 400 | 2800 |
8 | 200 | 2800 |
9 | 200 | 2800 |
10 | 800 | 2800 |
10 | 200 | 2800 |
10 | 300 | 2800 |
10 | 300 | 2800 |
11 | 300 | 2800 |
12 | 100 | 2800 |
12 | 0 | 2800 |
13 | 0 | 2800 |
14 | 0 | 2800 |
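As shown above, the quantities removed from the top and from the bottom of the Quantity column each add up to 600 (4000 * 0.15). In particular, the 300 that previously corresponded to price 12 is reduced to 100.
Thanks.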
You can use cumsum combined with clip, computing the cumulative sum on both the forward and the reversed array:
# take the first value of the Total Quantity column and multiply by the desired factor
qty = df['Total Quantity'].iat[0] * 0.15
# update the Total Quantity column (qty is removed at each end)
df['Total Quantity'] -= 2 * qty
## trim top
# cumulative quantity; flag rows whose running total is strictly below qty
cs = df['Quantity'].cumsum()
m = cs.lt(qty)
# select those rows plus the one after (shift); fill_value=True keeps the mask
# boolean and still selects the first row when it alone exceeds qty
sel = m | m.shift(fill_value=True)
# subtract qty from the cumulative sum, clip negatives to zero, write back
df.loc[sel, 'Quantity'] = cs.loc[sel].sub(qty).clip(lower=0)
## trim bottom
# same as above, on the reversed [::-1] column
cs = df['Quantity'][::-1].cumsum()
m = cs.lt(qty)
sel = m | m.shift(fill_value=True)
df.loc[sel, 'Quantity'] = cs.loc[sel].sub(qty).clip(lower=0)
Output:
    Price  Quantity  Total Quantity
0       5         0          2800.0
1       6         0          2800.0
2       7       400          2800.0
3       8       200          2800.0
4       9       200          2800.0
5      10       800          2800.0
6      10       200          2800.0
7      10       300          2800.0
8      10       300          2800.0
9      11       300          2800.0
10     12       100          2800.0
11     12         0          2800.0
12     13         0          2800.0
13     14         0          2800.0
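To run this end to end, here is a minimal self-contained sketch of the same cumsum + clip idea. The helper name `trim_bounds`, its `frac` parameter, and the hand-typed sample Series are assumptions for illustration. The sketch also assumes the data is already sorted by price and that `frac` is below 0.5, so the two trimmed ends do not overlap.

```python
import pandas as pd


def trim_bounds(quantity: pd.Series, frac: float = 0.15) -> pd.Series:
    """Hypothetical helper: remove `frac` of the total from each end of `quantity`."""
    qty = quantity.sum() * frac  # amount to remove at each end

    def trim_from_start(s: pd.Series) -> pd.Series:
        cs = s.cumsum()
        # rows whose running total is still below qty, plus the row right after
        below = cs.lt(qty)
        sel = below | below.shift(fill_value=True)
        s = s.copy()
        # what is left of each selected row once qty has been consumed
        s[sel] = cs[sel].sub(qty).clip(lower=0)
        return s

    out = trim_from_start(quantity.astype(float))  # trim the low-price end
    out = trim_from_start(out[::-1])[::-1]         # trim the high-price end
    return out


# Quantity column from the question, typed in by hand
quantity = pd.Series([500, 100, 400, 200, 200, 800, 200,
                      300, 300, 300, 300, 100, 200, 100])
trimmed = trim_bounds(quantity, frac=0.15)
print(trimmed.tolist())
```

Reversing the Series, trimming from its start, and reversing it back lets one small function handle both ends. The `fill_value=True` in the shift means the first row is always considered, which matters when that row alone is larger than the amount to remove.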