How to round a number to a specified upper or lower bound?
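I am working on a dataset in which certain values need to be rounded to a lower/upper bound.
For example, if I want the upper bound to be 9 and the lower bound to be 3, and we have numbers like these: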
[ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
We want the list rounded to either 3 or 9, like this:
[ 9,
9,
9,
3,
3 ]
I know we can do this the good old way: iterate over the array, compute the differences, and take the closest bound.
My approach code:
for i in range(len(the_list)):
    three = abs(3 - the_list[i])
    nine = abs(9 - the_list[i])
    if three < nine:
        the_list[i] = 3
    else:
        the_list[i] = 9
I was wondering whether there is a quick and dirty way built into Python, something like:
hey_bound = round_the_num(number, bound_1, bound_2)
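I know we can use my approach code above, but I am pretty sure this has already been implemented in a better way. I tried to find it but had no luck, so here we are.
Any guesses or direct links toward a solution would be amazing.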
You can generalize this by finding the midpoint and checking which side of it each number in the list falls on:
def round_the_list(list, bound_1, bound_2):
    mid = (bound_1+bound_2)/2
    for i in range(len(list)):
        if list[i] > mid: # or >= depending on your rounding decision
            list[i] = bound_2
        else:
            list[i] = bound_1
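Here is a minimal sketch of the same midpoint idea that returns a new list instead of modifying the input in place. The name `round_to_bounds` and the `ties_up` parameter are hypothetical, introduced only for illustration; they are not part of Python or of any library.

```python
def round_to_bounds(values, bound_1, bound_2, ties_up=False):
    """Return a new list with each value snapped to the nearer bound."""
    lo, hi = sorted((bound_1, bound_2))  # accept the bounds in either order
    mid = (lo + hi) / 2
    if ties_up:
        # A value exactly at the midpoint goes to the upper bound.
        return [hi if v >= mid else lo for v in values]
    # A value exactly at the midpoint goes to the lower bound.
    return [hi if v > mid else lo for v in values]


numbers = [7.453511737983394, 8.10917072790058, 6.2377799380575,
           5.225853201122676, 4.067932296134156]
print(round_to_bounds(numbers, 3, 9))  # [9, 9, 9, 3, 3]
```

Sorting the bounds first means the caller does not have to remember which argument is the upper one.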
Maybe you can write a function and use it in a list comprehension.
def return_bound(x, l, h):
    low = abs(x - l)
    high = abs(x - h)
    if low < high:
        return l
    else:
        return h
Test:
>>> mylist = [7.453511737983394, 8.10917072790058, 6.2377799380575, 5.225853201122676, 4.067932296134156]
>>> [return_bound(x, 3, 9) for x in mylist]
[9, 9, 9, 3, 3]
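Edit:
The best approach so far, I think, is to use numpy (avoiding "manual" loops) and simply compute the arrays of differences between the_list and the two bounds (so there is no expensive multiplication here), then conditionally add one or the other, depending on which is smaller: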
import numpy as np
the_list = np.array([ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ])
dhi = 9 - the_list
dlo = 3 - the_list
idx = dhi + dlo < 0
the_rounded = the_list + np.where(idx, dhi, dlo)
# array([9., 9., 9., 3., 3.])
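For comparison, the same selection can also be sketched by letting `np.where` choose the bound directly from a midpoint test. This variant is an illustration only; it was not part of the timing comparison in this thread, so no claim is made about its speed.

```python
import numpy as np

the_list = np.array([7.453511737983394, 8.10917072790058, 6.2377799380575,
                     5.225853201122676, 4.067932296134156])
hi = 9
lo = 3
mid = (hi + lo) / 2

# Pick hi where the value lies above the midpoint, lo everywhere else.
the_rounded = np.where(the_list > mid, hi, lo)
print(the_rounded.tolist())  # [9, 9, 9, 3, 3]
```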
I would apply a rounding function to the offset-free, normalized list, then scale it back and add the offset:
import numpy as np
the_list = np.array([ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ])
hi = 9
lo = 3
dlt = hi - lo
the_rounded = np.round((the_list - lo)/dlt) * dlt + lo
# [9. 9. 9. 3. 3.]
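Two edge cases of the rounding variant are worth a quick check: `np.round` rounds halves to the nearest even number, so a value exactly at the midpoint goes to lo, and values outside the range [lo, hi] land on other multiples of dlt. The sketch below assumes you want such values forced back to lo or hi and uses `np.clip` for that; the sample values are made up for illustration.

```python
import numpy as np

hi = 9
lo = 3
dlt = hi - lo

# Below lo, exactly at the midpoint, above hi.
samples = np.array([1.0, 6.0, 13.0])

# Plain rounding: 13 is pushed to the next multiple of dlt (15), not to hi.
plain = np.round((samples - lo) / dlt) * dlt + lo
print(plain.tolist())  # [3.0, 3.0, 15.0]

# Clipping the normalized result to [0, 1] keeps every output at lo or hi.
clipped = np.clip(np.round((samples - lo) / dlt), 0, 1) * dlt + lo
print(clipped.tolist())  # [3.0, 3.0, 9.0]
```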
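A one-line list comprehension using the built-in min function, with the key argument changed to look for the smallest absolute difference: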
upper_lower_bound_list=[3,9]
myNumberlist=[ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
List comprehension:
[min(upper_lower_bound_list, key=lambda x:abs(x-myNumber)) for myNumber in myNumberlist]
Output:
[9, 9, 9, 3, 3]
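One property of the min-with-key approach is that it is not limited to two bounds. A minimal sketch, assuming you want to snap each value to the nearest entry of an arbitrary candidate list; the helper name `snap_to_nearest` and the sample values are hypothetical.

```python
def snap_to_nearest(values, candidates):
    """Map each value to the closest entry in candidates."""
    # On a tie, min returns the first minimal candidate in iteration order.
    return [min(candidates, key=lambda c: abs(c - v)) for v in values]


result = snap_to_nearest([1.2, 4.4, 6.0, 7.9, 20.0], [3, 6, 9])
print(result)  # [3, 3, 6, 9, 9]
```

Because ties go to whichever candidate comes first, the order of the candidate list decides how an exact midpoint is rounded.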
Another option using a list comprehension and a lambda function:
round_the_num = lambda list, upper, lower: [upper if x > (upper + lower) / 2 else lower for x in list]
round_the_num(l, 9, 3)  # l is the list of numbers
You can write a custom function that performs a list comprehension, for example:
lst = [ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
def return_the_num(l, lst, h):
    return [l if abs(l-x) < abs(h-x) else h for x in lst]
print(return_the_num(3, lst, 9))
# [9, 9, 9, 3, 3]
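I really like @AbhishekPatel's idea of comparing against the midpoint. But I would put it into an LC, using the result of the comparison as an index into a tuple of the bounds: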
the_list = [ 7.453511737983394,
8.10917072790058,
6.2377799380575,
5.225853201122676,
4.067932296134156 ]
hi = 9
lo = 3
mid = (hi + lo) / 2
[(lo, hi)[mid < v] for v in the_list]
# [9, 9, 9, 3, 3]
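The indexing trick works because bool is a subclass of int in Python, so False and True act as the indices 0 and 1. A short sketch with made-up sample values, including one below lo and one above hi:

```python
hi = 9
lo = 3
mid = (hi + lo) / 2
bounds = (lo, hi)

print(isinstance(True, int))        # True
print(bounds[False], bounds[True])  # 3 9

# 1.0 is below lo, 6.0 sits exactly on the midpoint, 13.0 is above hi.
result = [bounds[mid < v] for v in [1.0, 6.0, 13.0]]
print(result)  # [3, 3, 9]
```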
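...but this is more than 15 times slower than the numpy approach.
However, numbers greater than hi or less than lo can be handled here.
...but again, this only holds for lists of 100000 entries. For the original list posted by the OP, the two variants are very close together...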
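Timing comparison of the available answers
My interpretation:
From a performance point of view, for smaller lists you should go with the answer by Abhishek Patel or Carles Mitjans.
For lists containing a few dozen values or more, converting to a numpy array and then conditionally adding the difference with the smaller absolute value seems to be the fastest solution.
Code used for the timing comparison: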
import timeit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
rep = 5
timings = dict()
for n in range(7):
    print(f'N = 10^{n}')
    N = 10**n
    # Shared setup: a random array in [3, 9) plus the helpers each answer needs
    setup = f'''import numpy as np\nthe_list = np.random.random({N})*6+3\nhi = 9\nlo = 3\ndlt = hi - lo\nmid = (hi + lo) / 2\ndef return_the_num(l, lst, h):\n    return [l if abs(l-x) < abs(h-x) else h for x in lst]'''
    fct = 'np.round((the_list - lo)/dlt) * dlt + lo'
    t = timeit.Timer(fct, setup=setup)
    timings['SpghttCd_np'] = timings.get('SpghttCd_np', []) + [np.min(t.repeat(repeat=rep, number=1))]
    fct = 'return_the_num(3, the_list, 9)'
    t = timeit.Timer(fct, setup=setup)
    timings['Austin'] = timings.get('Austin', []) + [np.min(t.repeat(repeat=rep, number=1))]
    fct = '[(lo, hi)[mid < v] for v in the_list]'
    t = timeit.Timer(fct, setup=setup)
    timings['SpghttCd_lc'] = timings.get('SpghttCd_lc', []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\nround_the_num = lambda list, upper, lower: [upper if x > (upper + lower) / 2 else lower for x in list]'
    fct = 'round_the_num(the_list, 9, 3)'
    t = timeit.Timer(fct, setup=setup)
    timings['Carles Mitjans'] = timings.get('Carles Mitjans', []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\nupper_lower_bound_list=[3,9]'
    fct = '[min(upper_lower_bound_list, key=lambda x:abs(x-myNumber)) for myNumber in the_list]'
    t = timeit.Timer(fct, setup=setup)
    timings['mad_'] = timings.get('mad_', []) + [np.min(t.repeat(repeat=rep, number=1))]
    # The nested indentation inside this string is required for the setup code to compile
    setup += '\ndef return_bound(x, l, h):\n    low = abs(x - l)\n    high = abs(x - h)\n    if low < high:\n        return l\n    else:\n        return h'
    fct = '[return_bound(x, 3, 9) for x in the_list]'
    t = timeit.Timer(fct, setup=setup)
    timings["Scratch'N'Purr"] = timings.get("Scratch'N'Purr", []) + [np.min(t.repeat(repeat=rep, number=1))]
    setup += '\ndef round_the_list(list, bound_1, bound_2):\n\tmid = (bound_1+bound_2)/2\n\tfor i in range(len(list)):\n\t\tif list[i] > mid:\n\t\t\tlist[i] = bound_2\n\t\telse:\n\t\t\tlist[i] = bound_1'
    fct = 'round_the_list(the_list, 3, 9)'
    t = timeit.Timer(fct, setup=setup)
    timings["Abhishek Patel"] = timings.get("Abhishek Patel", []) + [np.min(t.repeat(repeat=rep, number=1))]
    fct = 'dhi = 9 - the_list\ndlo = 3 - the_list\nidx = dhi + dlo < 0\nthe_list + np.where(idx, dhi, dlo)'
    t = timeit.Timer(fct, setup=setup)
    timings["SpghttCd_where"] = timings.get("SpghttCd_where", []) + [np.min(t.repeat(repeat=rep, number=1))]
print('done')
df = pd.DataFrame(timings, 10**np.arange(n+1))
ax = df.plot(logx=True, logy=True)
ax.set_xlabel('length of the list')
ax.set_ylabel('seconds to run')
ax.get_lines()[-1].set_c('g')
plt.legend()
print(df)