Either round or truncate pandas column values to 2 decimals, based on floor/ceil conditions

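I have a dataframe whose values need to be converted to two-decimal-place resolution based on the following logic (as implemented in the code below):

  • if the value is greater than its floor plus 0.5, round it to 2 decimals;
  • if the value is less than its ceil minus 0.5, truncate it to 2 decimals.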

The main problem I'm running into is actually getting these new rounded/truncated values to replace the original values in the dataframe.

Sample dataframe:

import math
import pandas as pd  

test_df = pd.DataFrame({'weights': ['25.2524%', '25.7578%', '35.5012%', '13.5000%', 
    "50.8782%", "10.2830%", "5.5050%", "30.5555%", "20.7550%"]})

# .. which creates:

   | weights |
|0 | 25.2524%|
|1 | 25.7578%|
|2 | 35.5012%|
|3 | 13.5000%|
|4 | 50.8782%|
|5 | 10.2830%|
|6 |  5.5050%|
|7 | 30.5555%|
|8 | 20.7550%|

Define the truncation function, and a function that applies the decimal resolution:

def truncate_decimals(target_allocation, two_decimal_places) -> float:
    decimal_exponent = 10.0 ** two_decimal_places
    return math.trunc(decimal_exponent * target_allocation) / decimal_exponent

def decimals(df):
    df["weights"] = df["weights"].str.rstrip("%").astype("float")
    decimal_precision = 2
    for x in df["weights"]:
        if x > math.floor(x) + 0.5:
            x = round(x, decimal_precision)
            print("This value is being rounded", x)
            df.loc[(df.weights == x), ('weights')] = x
        elif x < math.ceil(x) - 0.5:
            y = truncate_decimals(x, decimal_precision)
            print("This value is being truncated", y)
            df.loc[(df.weights == x), ('weights')] = y
        else:
            pass
            print("This value does not meet one of the above conditions", round(x, decimal_precision))

    return df


decimals(test_df)

Expected output:

This value is being truncated 25.25
This value is being rounded 25.76
This value is being rounded 35.5
This value does not meet one of the above conditions 13.5
This value is being rounded 50.88
This value is being truncated 10.28
This value is being rounded 5.5
This value is being rounded 30.56
This value is being rounded 20.75

   | weights|
|0 | 25.25  |
|1 | 25.76  |
|2 | 35.5   |
|3 | 13.5   |
|4 | 50.88  |
|5 | 10.28  |
|6 |  5.5   |
|7 | 30.56  |
|8 | 20.75  |

Current output:

The current value is being truncated 25.25

   | weights |
|0 | 25.2524%|
|1 | 25.7578%|
|2 | 35.5012%|
|3 | 13.5000%|
|4 | 50.8782%|
|5 | 10.2830%|
|6 |  5.5050%|
|7 | 30.5555%|
|8 | 20.7550%|

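Another approach is to define a function that applies the rules above to a single number, and then apply that function to every weight in the column.

Like this: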

import math
import pandas as pd  

test_df = pd.DataFrame({'weights': ['25.2524%', '25.7578%', '35.5012%', '13.5000%', 
    "50.8782%", "10.2830%", "5.5050%", "30.5555%", "20.7550%"]})

def truncate_decimals(target_allocation, two_decimal_places) -> float:
    decimal_exponent = 10.0 ** two_decimal_places
    return math.trunc(decimal_exponent * target_allocation) / decimal_exponent

def rule(number, decimal_precision=2):
    number = float(number.rstrip("%"))
    
    if number > math.floor(number) + 0.5:
        number = round(number, decimal_precision)
        print("This value is being rounded", number)
    
    elif number < math.ceil(number) - 0.5:
        number = truncate_decimals(number, decimal_precision)
        print("This value is being truncated", number)
       
    else:
        print("This value does not meet one of the above conditions", round(number, decimal_precision))
    
    return number

test_df['rounded'] = test_df.weights.apply(rule)
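
For larger columns, the same rule could also be written without a Python-level call per row. The following is only a sketch and not part of the answer above: it uses a shortened copy of the sample data, drops the print calls, and the names values, rounded and truncated are illustrative. Note that pandas' .round() may differ from Python's built-in round() in rare tie cases at the last digit.

```python
import numpy as np
import pandas as pd

# Shortened copy of the sample data (assumption: same format as test_df above).
test_df = pd.DataFrame({"weights": ["25.2524%", "25.7578%", "35.5012%", "13.5000%"]})

# Strip the percent sign and convert the whole column to float once.
values = test_df["weights"].str.rstrip("%").astype(float)

# Compute both candidate results for every row.
rounded = values.round(2)
truncated = np.trunc(values * 100) / 100

# Same conditions as in rule(): above floor + 0.5 -> round,
# below ceil - 0.5 -> truncate, otherwise keep the value unchanged.
conditions = [
    values > np.floor(values) + 0.5,
    values < np.ceil(values) - 0.5,
]
test_df["rounded"] = np.select(conditions, [rounded, truncated], default=values)

print(test_df)
```

np.select picks, for each row, the choice belonging to the first condition that is true, so the order of the two conditions mirrors the if/elif order in rule().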

The pandas .round() function already does all of this in one line. Don't reinvent the wheel.

>>> tdf['weights'].round(2)

0    25.25
1    25.76
2    35.50
3    13.50
4    50.88
5    10.28
6     5.50
7    30.56
8    20.76
  • If you want to get rid of the trailing '0', as in '13.50', that is just string formatting; see .format()
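
As a small illustration of the string-formatting point, here is a sketch using a hypothetical series named rounded (not a variable from the answer). It assumes the values have already been rounded to 2 decimals; the "g" format keeps only 6 significant digits by default, so it is only suitable for reasonably small numbers.

```python
import pandas as pd

# Hypothetical, already-rounded values.
rounded = pd.Series([25.25, 35.50, 13.50])

# "{:g}" drops trailing zeros; "{:.2f}" always keeps two decimals.
without_zeros = rounded.map("{:g}".format)
fixed_width = rounded.map("{:.2f}".format)

print(without_zeros.tolist())  # ['25.25', '35.5', '13.5']
print(fixed_width.tolist())    # ['25.25', '35.50', '13.50']
```

Both results are series of strings, so this step belongs at display time, after all numeric work is done.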

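You don't even need to write your own function to get the fractional and integer parts of a float.

  • (It already exists as numpy.modf / math.modf. Use the numpy version because it is vectorized: you can call it once on the whole series instead of making many individual, slow C calls, as you would with math.modf, math.ceil and math.floor.)

So, for example, if you want a series of (fractional part, integer part) tuples: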

import numpy as np
pd.Series(zip(*np.modf(tdf['weights'])))

0    (0.2524000000000015, 25.0)
1    (0.7577999999999996, 25.0)
2    (0.5011999999999972, 35.0)
3                   (0.5, 13.0)
4    (0.8781999999999996, 50.0)
5    (0.2829999999999995, 10.0)
6     (0.5049999999999999, 5.0)
7    (0.5554999999999986, 30.0)
8     (0.754999999999999, 20.0)

Note: you first have to convert the percentage strings to floats:

tdf["weights"] = tdf["weights"].str.rstrip("%").astype("float")
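
Putting the pieces of this answer in order, a minimal end-to-end sketch might look like the following. It assumes tdf has the same shape as the question's test_df (shortened here), and the names rounded, frac_part and int_part are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: tdf mirrors the question's dataframe (shortened sample).
tdf = pd.DataFrame({"weights": ["25.2524%", "13.5000%", "50.8782%"]})

# Step 1: convert the percentage strings to floats.
tdf["weights"] = tdf["weights"].str.rstrip("%").astype("float")

# Step 2: round the whole column in one call.
rounded = tdf["weights"].round(2)

# Step 3 (optional): split every value into fractional and integer parts
# with a single vectorized call on the underlying array.
frac_part, int_part = np.modf(tdf["weights"].to_numpy())

print(rounded.tolist())
print(int_part.tolist())
```

The conversion has to come first because neither .round() nor numpy.modf can operate on the original '%' strings.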