在 Python 中遍历双端队列的时间复杂度是多少?
What is the time complexity of iterating through a deque in Python?
迭代的时间复杂度是多少,或者更准确地说,通过 Python 中集合库中的双端队列进行的每次迭代?
一个例子是这样的:
elements = deque([1,2,3,4])
for element in elements:
print(element)
每次迭代都是常数 O(1) 操作吗?还是在每次迭代中执行线性 O(n) 操作以获取元素?
网上有很多关于所有其他双端队列方法(如 appendleft
、append
、popleft
、pop
的时间复杂度的资源。似乎没有关于双端队列迭代的任何时间复杂度信息。
谢谢!
如果你的构造是这样的:
elements = deque([1,2,3,4])
for i in range(len(elements)):
print(elements[i])
你不是迭代deque
,你是迭代range
对象,然后 索引到 deque
。这使得迭代多项式时间,因为每个索引操作,elements[i]
是 O(n)。然而,实际上 迭代 deque
是线性时间。
for x in elements:
print(x)
这是一个快速的经验测试:
import timeit
import pandas as pd
from collections import deque
def build_deque(n):
return deque(range(n))
def iter_index(d):
for i in range(len(d)):
d[i]
def iter_it(d):
for x in d:
x
r = range(100, 10001, 100)
index_runs = [timeit.timeit('iter_index(d)', 'from __main__ import build_deque, iter_index, iter_it; d = build_deque({})'.format(n), number=1000) for n in r]
it_runs = [timeit.timeit('iter_it(d)', 'from __main__ import build_deque, iter_index, iter_it; d = build_deque({})'.format(n), number=1000) for n in r]
df = pd.DataFrame({'index':index_runs, 'iter':it_runs}, index=r)
df.plot()
以及由此产生的情节:
现在,我们可以实际看到CPython source code中的deque
个对象是如何实现迭代器协议的:
首先,deque
对象本身:
typedef struct BLOCK {
struct BLOCK *leftlink;
PyObject *data[BLOCKLEN];
struct BLOCK *rightlink;
} block;
typedef struct {
PyObject_VAR_HEAD
block *leftblock;
block *rightblock;
Py_ssize_t leftindex; /* 0 <= leftindex < BLOCKLEN */
Py_ssize_t rightindex; /* 0 <= rightindex < BLOCKLEN */
size_t state; /* incremented whenever the indices move */
Py_ssize_t maxlen;
PyObject *weakreflist;
} dequeobject;
因此,如评论中所述,deque
是“块”节点的双向链表,其中块本质上是 python 对象指针的数组。现在是迭代器协议:
typedef struct {
PyObject_HEAD
block *b;
Py_ssize_t index;
dequeobject *deque;
size_t state; /* state when the iterator is created */
Py_ssize_t counter; /* number of items remaining for iteration */
} dequeiterobject;
static PyTypeObject dequeiter_type;
static PyObject *
deque_iter(dequeobject *deque)
{
dequeiterobject *it;
it = PyObject_GC_New(dequeiterobject, &dequeiter_type);
if (it == NULL)
return NULL;
it->b = deque->leftblock;
it->index = deque->leftindex;
Py_INCREF(deque);
it->deque = deque;
it->state = deque->state;
it->counter = Py_SIZE(deque);
PyObject_GC_Track(it);
return (PyObject *)it;
}
// ...
static PyObject *
dequeiter_next(dequeiterobject *it)
{
PyObject *item;
if (it->deque->state != it->state) {
it->counter = 0;
PyErr_SetString(PyExc_RuntimeError,
"deque mutated during iteration");
return NULL;
}
if (it->counter == 0)
return NULL;
assert (!(it->b == it->deque->rightblock &&
it->index > it->deque->rightindex));
item = it->b->data[it->index];
it->index++;
it->counter--;
if (it->index == BLOCKLEN && it->counter > 0) {
CHECK_NOT_END(it->b->rightlink);
it->b = it->b->rightlink;
it->index = 0;
}
Py_INCREF(item);
return item;
}
如您所见,迭代器本质上跟踪块索引、指向块的指针以及双端队列中项目总数的计数器。如果计数器达到零,它将停止迭代,否则,它会获取当前索引处的元素,增加索引,减少计数器,并负责检查是否移动到下一个块。换句话说,双端队列可以表示为 Python 中的列表列表,例如d = [[1,2,3],[4,5,6]]
,它迭代
for block in d:
for x in block:
...
迭代的时间复杂度是多少,或者更准确地说,通过 Python 中集合库中的双端队列进行的每次迭代?
一个例子是这样的:
elements = deque([1,2,3,4])
for element in elements:
print(element)
每次迭代都是常数 O(1) 操作吗?还是在每次迭代中执行线性 O(n) 操作以获取元素?
网上有很多关于所有其他双端队列方法(如 appendleft
、append
、popleft
、pop
的时间复杂度的资源。似乎没有关于双端队列迭代的任何时间复杂度信息。
谢谢!
如果你的构造是这样的:
elements = deque([1,2,3,4])
for i in range(len(elements)):
print(elements[i])
你不是迭代deque
,你是迭代range
对象,然后 索引到 deque
。这使得迭代多项式时间,因为每个索引操作,elements[i]
是 O(n)。然而,实际上 迭代 deque
是线性时间。
for x in elements:
print(x)
这是一个快速的经验测试:
import timeit
import pandas as pd
from collections import deque
def build_deque(n):
return deque(range(n))
def iter_index(d):
for i in range(len(d)):
d[i]
def iter_it(d):
for x in d:
x
r = range(100, 10001, 100)
index_runs = [timeit.timeit('iter_index(d)', 'from __main__ import build_deque, iter_index, iter_it; d = build_deque({})'.format(n), number=1000) for n in r]
it_runs = [timeit.timeit('iter_it(d)', 'from __main__ import build_deque, iter_index, iter_it; d = build_deque({})'.format(n), number=1000) for n in r]
df = pd.DataFrame({'index':index_runs, 'iter':it_runs}, index=r)
df.plot()
以及由此产生的情节:
现在,我们可以实际看到CPython source code中的deque
个对象是如何实现迭代器协议的:
首先,deque
对象本身:
typedef struct BLOCK {
struct BLOCK *leftlink;
PyObject *data[BLOCKLEN];
struct BLOCK *rightlink;
} block;
typedef struct {
PyObject_VAR_HEAD
block *leftblock;
block *rightblock;
Py_ssize_t leftindex; /* 0 <= leftindex < BLOCKLEN */
Py_ssize_t rightindex; /* 0 <= rightindex < BLOCKLEN */
size_t state; /* incremented whenever the indices move */
Py_ssize_t maxlen;
PyObject *weakreflist;
} dequeobject;
因此,如评论中所述,deque
是“块”节点的双向链表,其中块本质上是 python 对象指针的数组。现在是迭代器协议:
typedef struct {
PyObject_HEAD
block *b;
Py_ssize_t index;
dequeobject *deque;
size_t state; /* state when the iterator is created */
Py_ssize_t counter; /* number of items remaining for iteration */
} dequeiterobject;
static PyTypeObject dequeiter_type;
static PyObject *
deque_iter(dequeobject *deque)
{
dequeiterobject *it;
it = PyObject_GC_New(dequeiterobject, &dequeiter_type);
if (it == NULL)
return NULL;
it->b = deque->leftblock;
it->index = deque->leftindex;
Py_INCREF(deque);
it->deque = deque;
it->state = deque->state;
it->counter = Py_SIZE(deque);
PyObject_GC_Track(it);
return (PyObject *)it;
}
// ...
static PyObject *
dequeiter_next(dequeiterobject *it)
{
PyObject *item;
if (it->deque->state != it->state) {
it->counter = 0;
PyErr_SetString(PyExc_RuntimeError,
"deque mutated during iteration");
return NULL;
}
if (it->counter == 0)
return NULL;
assert (!(it->b == it->deque->rightblock &&
it->index > it->deque->rightindex));
item = it->b->data[it->index];
it->index++;
it->counter--;
if (it->index == BLOCKLEN && it->counter > 0) {
CHECK_NOT_END(it->b->rightlink);
it->b = it->b->rightlink;
it->index = 0;
}
Py_INCREF(item);
return item;
}
如您所见,迭代器本质上跟踪块索引、指向块的指针以及双端队列中项目总数的计数器。如果计数器达到零,它将停止迭代,否则,它会获取当前索引处的元素,增加索引,减少计数器,并负责检查是否移动到下一个块。换句话说,双端队列可以表示为 Python 中的列表列表,例如d = [[1,2,3],[4,5,6]]
,它迭代
for block in d:
for x in block:
...