直接索引一个numpy数组的时间复杂度是多少

Question

我假设有一个 numpy 数组，假设

>>>>nArray
array([[  23425.     ,  521331.40625],
       [  23465.     ,  521246.03125],
       [  23505.     ,  528602.8125 ],
       [  23545.     ,  531934.75   ],
       [  23585.     ,  534916.375  ],
       [  23865.     ,  527971.1875 ]])

直接索引一定非常有效。

我想 nArray[0, 1] = 69696420 必须使用散列-table 这样的东西，它的时间复杂度接近 O(1)。是吗？

更新

正如两个答案所指出的，索引 numpy 数组不涉及散列。这两个答案都清楚地解释了索引是如何发生的。

更新 2

我添加了一个简单的基准测试来证明答案的有效性

Answer 1

不涉及哈希 table。 Numpy数组就是数组，顾名思义，地址是这样计算的：

address of nArray[x, y] = base address + A * x + B * y

Answer 2

一方面

must be using a hash-table which will give a time complexity close to O(1). Is that right?

不完全正确。 Numpy arrays 基本上是 contiguous blocks of homogeneous memory，在尺寸等方面有一些额外的信息。因此，访问是 O(1)，并且只涉及一些简单的数学来确定内存中的位置。

另一方面

indexing must be pretty efficient.

不幸的是，这根本不是真的。除了边界检查（数组所做的）之外，涉及纯 python 的所有内容都非常低效（并且访问涉及纯 python 调用）。 Numpy 数组访问是 no exception。您应该尽可能使用矢量运算。

Answer 3

为了通过测试对 Ami 的回答添加一些额外的验证，我从一个仅使用直接索引进行插入的 numpy 数组创建了一个简单的循环缓冲区。基本上每次插入只是改变队列中最旧元素的值。

该代码并非完全没有错误，但它可以作为一些简单性能基准测试的基础。

import math
import numpy as np


class CircFifo():
"""
helper class, uses a numpy array to provide a circular fixed size fifo
interface.

put(element): removes the oldest element and
places a new one

get(): returns the oldest entry

empty(): returns true if fifo is empty

full():  returns true if fifo is full
"""
def __init__(self, size):
    self.array = np.empty(shape=(size, 2))
    self.size = size
    self.array[:] = np.NAN
    self.top = 0
    self.bottom = 0

def put(self, row):
    self.array[self.top, :] = row
    self.top += 1
    if self.top == self.size:
        self.top = 0

def get(self):
    if not math.isnan(self.array[self.bottom, 0]):
        row = copy.deepcopy(self.array[self.bottom, :])
        self.array[self.bottom, :] = float('NaN')
        self.bottom += 1
    if self.bottom == self.size:
        self.bottom = 0
    if math.isnan(self.array[self.bottom, 0]):
        self.bottom = 0
        self.top = 0
    return row

def empty(self):
    if math.isnan(self.array[self.bottom, 0]):
        return True
    else:
        return False

def full(self):
    if self.size - np.count_nonzero(
            np.isnan(self.array[:, 0])) == self.size:
        return True
    else:
        return False

post中答案的正确性似乎通过我运行的简单测试得到证实。我针对双端队列对象测试了插入性能。即使对于 1000 次插入双端队列，它也作为动态而非静态数据结构（与我的静态循环 fifo 相反），显然优于循环 fifo。

这是我的简单测试运行

In [5]: import time

In [6]: circFifo = CircFifo(300)

In [7]: elapsedTime = 0

In [8]: for i in range(1, 1000):
   ...:         start = time.time()
   ...:         circFifo.put(np.array([[52, 12]]))
   ...:         elapsedTime += time.time() - start
   ...:     

In [9]: elapsedTime
Out[9]: 0.010616540908813477


In [21]: queue = deque()

In [22]: elapsedTime = 0

In [23]: for i in range(1, 1000):
   ....:         start = time.time()
   ....:         queue.append(np.array([[52, 12]]))
   ....:         elapsedTime += time.time() - start
   ....:     

In [24]: elapsedTime
Out[24]: 0.00482630729675293

我知道这个基准测试没有那么有用，但很明显双端队列要快得多。对于至少该数量的插入。

我希望如果该循环 fifo 是在 C 中使用静态数组实现的，那么它的性能将不会轻易被超越。因为基本上 C 的静态数组是最简单的，并且可用的数据结构开销较少。

直接索引一个numpy数组的时间复杂度是多少

What's the time complexity of indexing a numpy array directly

python

big-o

numpy