Python: compute the histogram of a set of data
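The Python function below is supposed to compute the histogram of a data set using equal-sized bins. The correct result should be
[1, 6, 4, 6]
but when I run the code I get
[7, 12, 17, 17]
which is wrong. Does anyone know how to fix it?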
# Computes the histogram of a set of data
def histogram(data, num_bins):
    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span / num_bins
    # Calculate the thresholds for each bin.
    thresholds = [0] * num_bins
    for i in range(num_bins):
        thresholds[i] += bin_size * (i+1)
    # Compute the histogram
    counts = [0] * num_bins
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
    return counts

# Some random data
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))
If you just need a histogram, use numpy:
import numpy as np
np.histogram([-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9],4)
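You have just two logic errors:
(1) the threshold calculation, which has to start from min(data) rather than from 0
(2) the missing break once the matching bin has been found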
def histogram(data, num_bins):
    span = max(data) - min(data)
    bin_size = float(span) / num_bins
    thresholds = [0] * num_bins
    for i in range(num_bins):
        # Changed: thresholds are measured from min(data)
        thresholds[i] = min(data) + bin_size * (i+1)
    counts = [0] * num_bins
    for datum in data:
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
                # Added: stop at the first bin that matches
                break
    return counts

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))
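To see where [7, 12, 17, 17] comes from, it may help to look at the two errors separately. The sketch below is only an illustration and not part of the answer above; the names from_zero, from_min and cumulative are made up for it. It builds the thresholds both ways and then counts without a break, which reproduces the question's output as running totals.

```python
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
num_bins = 4

low, high = min(data), max(data)
bin_size = (high - low) / num_bins

# Error 1: thresholds measured from 0 instead of from min(data).
from_zero = [bin_size * (i + 1) for i in range(num_bins)]
from_min = [low + bin_size * (i + 1) for i in range(num_bins)]
print([round(t, 2) for t in from_zero])
print([round(t, 2) for t in from_min])

# Error 2: without a break, a datum is counted in every bin whose
# threshold is at or above it, so each count is a running total.
cumulative = [sum(1 for d in data if d <= t) for t in from_zero]
print(cumulative)  # [7, 12, 17, 17]
```

Shifting the thresholds fixes which values each bin covers, and the break is what turns the running totals into per-bin counts.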
Check the threshold definition and the if statement.
This works:
def histogram(data, num_bins):
    # Find what range the data spans, and use it to calculate the bin size.
    low = min(data)
    span = max(data) - low
    bin_size = span / float(num_bins)
    # Calculate the num_bins + 1 bin edges, starting at the minimum of the data.
    thresholds = [low + bin_size * i for i in range(num_bins + 1)]
    # Pin the last edge so rounding cannot leave the maximum outside every bin.
    thresholds[-1] = max(data)
    print(thresholds)
    # Compute the histogram
    counts = [0 for i in range(num_bins)]
    for datum in data:
        # Increment the count of the one bin that the datum falls in
        for bin_index in range(num_bins):
            if thresholds[bin_index] <= datum <= thresholds[bin_index + 1]:
                counts[bin_index] += 1
                break
    return counts
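First, if you just want a histogram of your data, numpy provides that. But you asked how to do it yourself. Your code suggests you lost track of what you were trying to do, so break the function up into smaller functions. For example, to compute the thresholds, write a function thresholds(xmin, xmax, nbins), or better, use numpy.linspace. Doing so would draw your attention to the fact that you are assuming increments relative to 0 (instead of min(data)) and, if you are lucky, might remind you not to count on floating-point accumulation being exact. So you might end up with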
def thresholds(xmin, xmax, nbins):
    span = (xmax - xmin) / float(nbins)
    thresholds = [xmin + (i+1)*span for i in range(nbins)]
    thresholds[-1] = xmax
    return thresholds
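Next, you need to get the bin counts. Again, you could just use numpy.digitize. Compared with your code, the important thing is not to increment more than one bin. You might end up with something like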
def counts(data, bounds):
    counts = [0] * len(bounds)
    for datum in data:
        bin = min(i for i,bound in enumerate(bounds) if bound >= datum)
        counts[bin] += 1
    return counts
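If numpy is not available, the standard library's bisect module can do the same lookup. This is a swapped-in alternative, not what the answer above uses: bisect_left returns the index of the first bound that is greater than or equal to the datum, which is the same bin the min(...) expression picks. The name counts_bisect and the hard-coded bounds are assumptions for the sake of the example, and the bounds must be sorted with the last one at or above max(data).

```python
from bisect import bisect_left


def counts_bisect(data, bounds):
    """Count the data points per bin, given sorted upper bounds.

    Assumes bounds[-1] >= max(data); otherwise the index would run
    past the end of the list.
    """
    result = [0] * len(bounds)
    for datum in data:
        # Index of the first bound that is >= datum.
        result[bisect_left(bounds, datum)] += 1
    return result


# Upper bounds of four equal bins over [-3.2, 9], written out by hand.
bounds = [-0.15, 2.9, 5.95, 9.0]
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print(counts_bisect(data, bounds))  # [1, 6, 4, 6]
```

Each lookup is a binary search, so this scales better than scanning every bound when there are many bins.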
Now you are all set:
def histogram02(data, num_bins):
    xmin = min(data)
    xmax = max(data)
    th = thresholds(xmin, xmax, num_bins)
    return counts(data, th)
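As a quick check, the pieces can be run against the data from the question. The sketch below restates the three functions in compact form so that it runs on its own; the local names bounds, result and index are chosen here and are not from the answer.

```python
def thresholds(xmin, xmax, nbins):
    span = (xmax - xmin) / float(nbins)
    bounds = [xmin + (i + 1) * span for i in range(nbins)]
    bounds[-1] = xmax  # pin the last bound so the maximum always lands in a bin
    return bounds


def counts(data, bounds):
    result = [0] * len(bounds)
    for datum in data:
        # First bin whose upper bound is at or above the datum.
        index = min(i for i, bound in enumerate(bounds) if bound >= datum)
        result[index] += 1
    return result


def histogram02(data, num_bins):
    return counts(data, thresholds(min(data), max(data), num_bins))


data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
result = histogram02(data, num_bins=4)
print(result)  # [1, 6, 4, 6]
# Every datum should be counted exactly once.
print(sum(result) == len(data))  # True
```

Checking that the counts add up to len(data) is a cheap way to catch the double counting that the original code suffered from.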