Finding if there are n data points in a row that are less than a certain number

Question

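I'm working with a spectrum in Python and have fitted a line to it. I want code that detects whether the spectrum has 10 data points in a row that are below the fitted line. Does anyone know a simple and quick way to do this?

I currently have something like this: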

count = 0
for i in range(lowerbound, upperbound):
    if spectrum[i] < fittedline[i]:
        count += 1
    if count > 15:
        pass  # do whatever

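If I changed the first if statement to:

if spectrum[i] < fittedline[i] and spectrum[i+1] < fittedline[i+1] and ...:  # and so on

I'm sure the algorithm would work, but what if I want the user to enter a number for how many data points in a row must be below the fitted line?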

Answer 1

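Your attempt is very close to working! For consecutive points, all you need to do is reset the count whenever a point does not satisfy your condition.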

num_points = int(input("How many points must be less than the fitted line? "))

count = 0
for i in range(lowerbound, upperbound):
    if spectrum[i] < fittedline[i]:
        count += 1
    else: # If the current point is NOT below the threshold, reset the count
        count = 0

    if count >= num_points:
        print(f"{count} consecutive points found at location {i-count+1}-{i}!")

Let's test it:

lowerbound = 0
upperbound = 10

num_points = 5

spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

Running the code with these values gives:

5 consecutive points found at location 2-6!
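
If you only need a yes/no answer, or the position of the first qualifying run, the same reset-the-counter idea can be wrapped in a function that stops as soon as a run is found. This is a minimal sketch rather than part of the original answer; the function name `first_run_below` and its default bounds are assumptions made for illustration.

```python
def first_run_below(spectrum, fittedline, num_points, lowerbound=0, upperbound=None):
    """Return (start, end) of the first run of `num_points` consecutive
    points below the fitted line, or None if no such run exists.

    Hypothetical helper: the name and default bounds are illustrative only.
    """
    if upperbound is None:
        upperbound = len(spectrum)

    count = 0
    for i in range(lowerbound, upperbound):
        if spectrum[i] < fittedline[i]:
            count += 1
            if count >= num_points:
                # Stop at the first run that is long enough
                return i - count + 1, i
        else:
            # The run is broken, so start counting again
            count = 0
    return None


spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

print(first_run_below(spectrum, fittedline, 5))  # (2, 6)
print(first_run_below(spectrum, fittedline, 6))  # None
```

Returning early avoids printing a message for every further point in a long run, which the loop above would do once `count` exceeds `num_points`.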

Answer 2

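My suggestion is to research and use existing libraries before developing ad hoc functionality.

In this case, some super smart people developed the numerical Python library numpy. This library is widely used in scientific projects and offers a large number of useful functions that are already implemented, tested, and optimized.

Counting the points that lie below the fitted line takes a single line (note that this counts all such points, not only consecutive ones):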

number_of_points = (np.array(spectrum) < np.array(fittedline)).sum()

But let's go step by step:

spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

# Import numerical python module
import numpy as np

# Convert your lists to numpy arrays
spectrum_array = np.array(spectrum)
fittedline_array = np.array(fittedline)

# Subtract the fitted line from the spectrum
difference = spectrum_array - fittedline_array
#>>> array([ 0,  0, -7, -6, -5, -4, -3,  0,  0,  0])

# Identify points where condition is met
condition_check_array = difference < 0.0
# >>> array([False, False,  True,  True,  True,  True,  True, False, False, False])

# Get the number of points where condition is met
number_of_points = condition_check_array.sum()
# >>> 5

# Get index of points where condition is met
index_of_points = np.where(difference < 0)
# >>> (array([2, 3, 4, 5, 6], dtype=int64),)

print(f"{number_of_points} points found at location {index_of_points[0][0]}-{index_of_points[0][-1]}!")

# Now same functionality in a simple function
def get_point_count(spectrum, fittedline):  
    return (np.array(spectrum) < np.array(fittedline)).sum()

get_point_count(spectrum, fittedline)
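
The count above includes every point below the fitted line, whether or not the points are adjacent. Since the question asks about points in a row, here is a hedged sketch of one way to find the longest consecutive run with numpy. It is not part of the original answer; the function name `longest_run_below` is an assumption, and the approach (padding the boolean mask and using `np.diff` to locate run edges) is one of several possible techniques.

```python
import numpy as np


def longest_run_below(spectrum, fittedline):
    """Return (length, start_index) of the longest run of consecutive
    points below the fitted line, or (0, None) if no point is below it.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    below = np.asarray(spectrum) < np.asarray(fittedline)

    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], below, [False]))
    edges = np.diff(padded.astype(np.int8))

    starts = np.flatnonzero(edges == 1)   # index where each run begins
    ends = np.flatnonzero(edges == -1)    # index just past where each run ends
    if starts.size == 0:
        return 0, None

    lengths = ends - starts
    best = int(np.argmax(lengths))
    return int(lengths[best]), int(starts[best])


spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

length, start = longest_run_below(spectrum, fittedline)
print(length, start)  # 5 2
```

Checking whether a run of at least n points exists then reduces to `length >= n`.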

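Now suppose your spectrum has not 10 points but millions of them (the benchmark here uses 1,000,000 samples). Code efficiency becomes a key factor to consider, and numpy can help with that: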

import time

number_of_samples = 1000000
spectrum = [1] * number_of_samples
# >>> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
fittedline = [0] * number_of_samples
fittedline[2:7] =[2] * 5
# >>> [0, 0, 2, 2, 2, 2, 2, 0, 0, 0, ...]

# With numpy
start_time = time.time()
number_of_points = (np.array(spectrum) < np.array(fittedline)).sum()
numpy_time = time.time() - start_time
print("--- %s seconds ---" % (numpy_time))


# With ad hoc loop and ifs
start_time = time.time()
count=0
for i in range(0, len(spectrum)):
    if spectrum[i] < fittedline[i]:
        count += 1
    else: # If the current point is NOT below the threshold, reset the count
        count = 0
adhoc_time = time.time() - start_time
print("--- %s seconds ---" % (adhoc_time))

print("Ad hoc is {:3.1f}% slower".format(100 * (adhoc_time / numpy_time - 1)))

>>>--- 0.20999646186828613 seconds ---
>>>--- 0.28800177574157715 seconds ---
>>>Ad hoc is 37.1% slower
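
A single `time.time()` measurement can be noisy, and the numpy timing above also includes converting the lists to arrays. As a rough sketch of a fairer comparison, the version below converts the data once and uses the standard-library `timeit` module, with both functions counting the same thing (all points below the line). The function names are assumptions for illustration, and the timings will vary by machine, so no output is shown.

```python
import timeit

import numpy as np

number_of_samples = 1_000_000
spectrum = [1] * number_of_samples
fittedline = [0] * number_of_samples
fittedline[2:7] = [2] * 5

# Convert once, outside the timed code
spectrum_array = np.array(spectrum)
fittedline_array = np.array(fittedline)


def count_numpy():
    """Count the points below the fitted line with a vectorized comparison."""
    return int((spectrum_array < fittedline_array).sum())


def count_loop():
    """Count the points below the fitted line with a plain Python loop."""
    count = 0
    for s, f in zip(spectrum, fittedline):
        if s < f:
            count += 1
    return count


# Take the best of several runs to reduce the effect of background noise
numpy_time = min(timeit.repeat(count_numpy, number=1, repeat=3))
loop_time = min(timeit.repeat(count_loop, number=1, repeat=3))
print(f"numpy: {numpy_time:.4f} s, loop: {loop_time:.4f} s")
```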
