numpy.histogram: problems with empty cells in excel
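I'm new to Python, so I'm not sure I understand all the technical terms.
I'm using xlrd to read data from an Excel sheet, filtering it with the filter function, and then creating a histogram with numpy.histogram.
Now I have an empty cell in the Excel sheet, and numpy.histogram returns wrong results:
Here is my code: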
import xlrd
import openpyxl
import numpy as np
file_location = "C:/Users/test.xlsx"
sheet_index = 2
range_hist = 23
lifetime_data = 3
low_salesyear = 1990
upp_salesyear = 2005
col_filter1 = 14
filter_value1 = 1
col_filter2 = 18
filter_value2 = 5
# open excel-file (note: xlrd >= 2.0 only reads .xls, so .xlsx needs an older xlrd or openpyxl)
workbook = xlrd.open_workbook(file_location)
# get sheet, indices always start at 0
sheet = workbook.sheet_by_index(sheet_index)
# read all data in the sheet, starting from row 1 (the first row is skipped)
list_device = [[sheet.cell_value(r,c) for c in range (sheet.ncols)] for r in range (1,sheet.nrows)]
# filter list for independent variables
listnew = list(filter(lambda x: x[col_filter1]==filter_value1 and x[col_filter2]==filter_value2 and low_salesyear <= x[0] <= upp_salesyear, list_device))
# low_salesyear <= x[0] <= upp_salesyear and
# select relevant data from filtered list for histogram and store it in list for histogram
list_for_hist = []
for i in range(len(listnew)):
    list_for_hist.append(listnew[i][lifetime_data])
print (list_for_hist)
# create array from list
array_for_hist = np.array(list_for_hist)
# create histogram
hist = np.histogram(array_for_hist, bins = range(0,int(range_hist)))
print (hist)
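I put all the variables at the beginning so that I can change them easily.
I'm sure there is a more elegant way to program the whole thing.
My filtered list from Excel looks like this: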
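On the "more elegant" point: a minimal sketch that replaces filter/lambda and the index loop with one list comprehension, and skips empty cells while selecting. It assumes the rows look like list_device above (one list of cell values per row, numbers as floats, empty cells as ''). The names lifetime_histogram and is_number are hypothetical; the defaults are copied from the variables in the question.

```python
import numpy as np


def is_number(value):
    # assumption: xlrd's cell_value() gives numeric cells as float and empty cells as ''
    return isinstance(value, (int, float))


def lifetime_histogram(rows, lifetime_col=3, year_col=0,
                       low_year=1990, upp_year=2005,
                       filters=((14, 1), (18, 5)), range_hist=23):
    """Hypothetical helper: filter rows, then histogram one numeric column."""
    selected = [
        row[lifetime_col]
        for row in rows
        # every (column, value) filter must match
        if all(row[col] == value for col, value in filters)
        # check the year is numeric before comparing it (an empty year cell is '')
        and is_number(row[year_col])
        and low_year <= row[year_col] <= upp_year
        # skip empty lifetime cells so the array stays numeric
        and is_number(row[lifetime_col])
    ]
    data = np.array(selected, dtype=float)
    return np.histogram(data, bins=range(0, range_hist))
```

With the question's variables this would be called as hist, bin_edges = lifetime_histogram(list_device).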
[8.0, 19.0, 4.0, 4.0, 8.0, 3.0, 13.0, '', 10.0, 7.0, 17.0, 16.0, 8.0,
6.0, 13.0, 8.0, 7.0, 11.0, 12.0, 13.0, 4.0, 6.0, 5.0, 19.0, 8.0, 6.0]
The hist produced by numpy.histogram looks like this:
(array([ 0, 10, 0, 1, 3, 1, 3, 2, 5, -25, 1, 1, 1,
3, 0, 0, 1, 1, 0, 2, 0, 0]), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22]))
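So I don't understand why it returns 10 for bin 1 and -25 for bin 9. If I remove the empty cell in Excel, the histogram is correct.
Is there a way to tell my program to ignore empty cells?
Thanks a lot for your help!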
np.array(list_for_hist) converts all the items in list_for_hist to a common dtype.
When list_for_hist contains both floats and strings, np.array returns an array of all strings:
In [32]: np.array(list_for_hist)
Out[32]:
array(['8.0', '19.0', '4.0', '4.0', '8.0', '3.0', '13.0', '', '10.0',
'7.0', '17.0', '16.0', '8.0', '6.0', '13.0', '8.0', '7.0', '11.0',
'12.0', '13.0', '4.0', '6.0', '5.0', '19.0', '8.0', '6.0'],
dtype='|S32')   <-- `|S32` means 32-byte strings (Python 2); under Python 3 the dtype is `<U32`, i.e. 32-character Unicode strings.
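So binning these strings with bins=range(0,int(23)) should probably raise an exception, but (at least in the NumPy version used here) np.histogram returns garbage instead.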
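If you would rather get that exception yourself than rely on np.histogram, one option is to check the dtype before binning. A minimal sketch; checked_histogram is a hypothetical wrapper, not part of NumPy, and it uses np.issubdtype to reject any non-numeric array.

```python
import numpy as np


def checked_histogram(values, bins):
    """Hypothetical wrapper: refuse non-numeric input instead of binning it."""
    arr = np.asarray(values)
    # a list mixing floats and '' becomes a string array, which is not a np.number subtype
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError("expected numeric data, got dtype %s" % arr.dtype)
    return np.histogram(arr, bins=bins)


mixed = [8.0, 19.0, '', 10.0]
print(np.array(mixed).dtype)  # a string dtype

try:
    checked_histogram(mixed, bins=range(0, 23))
except TypeError as exc:
    print("rejected:", exc)
```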
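You need to convert list_for_hist into an array or list that contains only numbers, for example by replacing the empty string with NaN: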
import numpy as np
list_for_hist = [8.0, 19.0, 4.0, 4.0, 8.0, 3.0, 13.0, '', 10.0, 7.0, 17.0, 16.0,
8.0, 6.0, 13.0, 8.0, 7.0, 11.0, 12.0, 13.0, 4.0, 6.0, 5.0,
19.0, 8.0, 6.0]
array_for_hist = np.array(
[item if isinstance(item,(float,int)) else np.nan for item in list_for_hist])
# create histogram
hist, bin_edges = np.histogram(array_for_hist, bins=range(0,int(23)))
print (hist)
yields
[0 0 0 1 3 1 3 2 5 0 1 1 1 3 0 0 1 1 0 2 0 0]
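The NaN replacement works here because, with explicit bin edges, np.histogram leaves the NaN out of every bin. A minimal sketch of two slightly more explicit alternatives: drop the non-numeric entries before building the array, or keep the NaN placeholders and mask them with np.isnan. One reason to prefer them: in recent NumPy versions, switching to an integer bin count (e.g. bins=22) makes np.histogram autodetect the range from the data, and a NaN then raises a ValueError.

```python
import numpy as np

list_for_hist = [8.0, 19.0, 4.0, 4.0, 8.0, 3.0, 13.0, '', 10.0, 7.0, 17.0, 16.0,
                 8.0, 6.0, 13.0, 8.0, 7.0, 11.0, 12.0, 13.0, 4.0, 6.0, 5.0,
                 19.0, 8.0, 6.0]

# Option 1: drop the non-numeric entries (the empty cell '') before building the array
numbers_only = np.array([item for item in list_for_hist
                         if isinstance(item, (float, int))], dtype=float)

# Option 2: keep NaN placeholders as in the answer, then mask them out explicitly
with_nan = np.array([item if isinstance(item, (float, int)) else np.nan
                     for item in list_for_hist])
masked = with_nan[~np.isnan(with_nan)]

hist_dropped, _ = np.histogram(numbers_only, bins=range(0, 23))
hist_masked, _ = np.histogram(masked, bins=range(0, 23))
print(hist_dropped)
print(len(with_nan), len(masked))  # 26 cells read, 25 numeric values kept
```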