How to average vertically over a list containing a void list?

I have a list containing the following:

    list1 = [(4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0), 
    (0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0), 
    (0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0), 
    (0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0), 
    (0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0), 
    (0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0), 
    (0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0), 
    (0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0), 
    (0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)]

I need the average of these values along the vertical axis, so that the output looks like:

    [(average_col1, average_col2, average_col3, average_col4, average_col5, average_col6)]

However, the command np.mean(list1, axis=1) returns:

    IndexError: tuple index out of range

So I tried creating a numpy array with the following:

    a = np.array(list1)
    a = array([ (4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0),
    (0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0),
    (0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0),
    (0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0),
    (0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0),
    (0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0),
    (0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0),
    (0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0),
    (0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)],
    dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8'), ('col4', '<f8'), ('col5', '<i4'), ('col6', '<i4')])

If I use the same mean command as above on this array, it returns:

    IndexError: tuple index out of range

So I am not sure what to do from here.

You can try this on your first list without using numpy:

averages = [sum(i)/float(len(i)) for i in zip(*list1)]
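To see why the `zip(*...)` idiom averages the columns, here is a minimal sketch on a small made-up list. The names `rows` and `column_means` and the values are assumptions for illustration, not taken from the question:

```python
# Small illustrative data set (hypothetical values, not the question's list)
rows = [(1.0, 2.0, 0), (3.0, 4.0, 1), (5.0, 6.0, 0)]


def column_means(rows):
    """Return one mean per column for a list of equal-length tuples."""
    # zip(*rows) regroups the values column by column:
    # (1.0, 3.0, 5.0), (2.0, 4.0, 6.0), (0, 1, 0)
    return [sum(col) / float(len(col)) for col in zip(*rows)]


means = column_means(rows)
print(means)
```

Note that `zip` stops at the shortest tuple, so this only gives sensible results when every row has the same length.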

You used square brackets ([]) instead of parentheses (()) when trying to call np.mean. This code should do what you want:

import numpy as np
list1 = [(4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0), 
    (0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0), 
    (0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0), 
    (0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0), 
    (0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0), 
    (0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0), 
    (0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0), 
    (0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0), 
    (0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)]

means = np.mean(np.array(list1),axis = 1)

print(means)

Result:

[ 1.90367248  1.25196234  1.07593996  1.248519    1.08178562  1.25656678
  1.08446815  1.26876521  1.09597256]

Edit:

If you want to average over the columns instead, use axis=0:

means = np.mean(np.array(list1),axis = 0)

which gives:

[ 0.68679585  0.34464285  0.20140261  5.83448231  0.44444444  0.        ]
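Since the two calls above differ only in `axis`, a small sketch may help show which direction each one collapses. The array `data` and its values are made up for illustration:

```python
import numpy as np

# Hypothetical 3x2 data: 3 rows (records), 2 columns (fields)
data = np.array([(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)])

# axis=0 collapses the rows, leaving one mean per column
col_means = np.mean(data, axis=0)

# axis=1 collapses the columns, leaving one mean per row
row_means = np.mean(data, axis=1)

print(col_means.shape)  # one entry per column
print(row_means.shape)  # one entry per row
```

So for the "vertical" average asked about in the question, `axis=0` is the one you want: the result has as many entries as there are columns.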

This should work:

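import numpy as np
list1 = np.array(list1)  # plain 2-D array, shape (rows, columns)
col_index = 0  # index of the column to average (0 = first column)
mean_col = list1[:,col_index].mean()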

Here col_index is the index of the column whose mean you want, i.e. column 1 has index 0, column 2 has index 1, and so on.

I tried it myself and it works :)

The problem you are having with numpy comes from how the matrix is declared in your example.

Given:

list1 = [(4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0), 
    (0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0), 
    (0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0), 
    (0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0), 
    (0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0), 
    (0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0), 
    (0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0), 
    (0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0), 
    (0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)]

You can easily use it to get the column-wise mean in numpy:

>>> np.mean(list1, axis=0)
[ 0.68679585  0.34464285  0.20140261  5.83448231  0.44444444  0.        ]

Next, you have an interesting declaration:

a = np.array([ (4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0),
    (0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0),
    (0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0),
    (0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0),
    (0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0),
    (0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0),
    (0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0),
    (0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0),
    (0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)], 
  dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8'), ('col4', '<f8'), ('col5', '<i4'), ('col6', '<i4')])

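This is different from matrix = np.array(list1). What it does is declare a numpy structured array, naming each column and giving that column its own dtype.

Each row element of that array is a tuple: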

 >>> a[0]
 ( 4.97487413,  0.43849328,  0.18793185,  5.82073561, 0, 0)

And you cannot access the columns in the usual way:

>>> a[:,0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array

That is because it is actually a one-dimensional array:

>>> a.shape
(9,)

Instead, you have to access the columns by name:

>>> a['col1']
array([ 4.97487413,  0.15069597,  0.15085613,  0.15069597,  0.1509362 ,
        0.15069597,  0.15085613,  0.1509362 ,  0.1506159 ])

Or, to take the mean of each column by name:

>>> [np.mean(a[col]) for col in ['col{}'.format(i) for i in range(1,7)]]
[0.68679584555500162, 0.34464284907500159, 0.20140260920884526, 5.8344823121106151, 0.44444444444444442, 0.0]
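If you would rather not hard-code the column names, here is a sketch of two alternatives on a small made-up structured array. The array contents are hypothetical; `dtype.names` and `numpy.lib.recfunctions.structured_to_unstructured` are existing NumPy features, though as far as I know the latter is only available in reasonably recent NumPy versions (1.16 and later, I believe):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical two-row structured array with a mixed float/int dtype,
# similar in spirit to the one in the question
a = np.array(
    [(1.0, 2.0, 0), (3.0, 4.0, 1)],
    dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<i4')],
)

# Option 1: loop over the field names stored in the dtype itself
means_by_name = [a[name].mean() for name in a.dtype.names]

# Option 2: convert to an ordinary 2-D array (fields become columns),
# then average down the rows with axis=0
plain = rfn.structured_to_unstructured(a)
means_plain = plain.mean(axis=0)

print(means_by_name)
print(means_plain)
```

Option 2 gives you back a regular array of shape (rows, columns), so the usual `a[:, 0]` indexing and `axis=0` averaging work again.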