What can be slowing down my program when I use multithreading?

I am writing a program that downloads data from a website (eve-central.com). It returns XML when I send a GET request with some parameters. The problem is that I need to make about 7080 such requests, because I cannot specify the typeid parameter more than once per request.

def get_data_eve_central(typeids, system, hours, minq=1, thread_count=1):
    import xmltodict, urllib3
    pool = urllib3.HTTPConnectionPool('api.eve-central.com')
    for typeid in typeids:
        # One GET request per typeid, since the API accepts only a single typeid at a time
        r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
        answer = xmltodict.parse(r.data)

It is really slow when I just connect to the site and make all the requests one by one, so I decided to use several threads at a time (I read that if the process involves a lot of waiting (I/O, HTTP requests), it can be sped up a lot with multithreading). I rewrote it using multithreading, but somehow it is not any faster (actually it is slightly slower). Here is the code rewritten with multithreading:

import threading
import time
import xmltodict, urllib3

def get_data_eve_central(all_typeids, system, hours, minq=1, thread_count=1):

    if thread_count > len(all_typeids): raise ValueError('more threads than typeids')

    def requester(typeids):
        pool = urllib3.HTTPConnectionPool('api.eve-central.com')
        for typeid in typeids:
            r = pool.request('GET', '/api/quicklook', fields={'typeid': typeid, 'usesystem': system, 'sethours': hours, 'setminQ': minq})
            answer = xmltodict.parse(r.data)['evec_api']['quicklook']
            answers.append(answer)

    def chunkify(items, quantity):
        chunk_len = len(items) // quantity
        rest_count = len(items) % quantity
        chunks = []
        for i in range(quantity):
            chunk = items[:chunk_len]
            items = items[chunk_len:]
            if rest_count and items:
                chunk.append(items.pop(0))
                rest_count -= 1
            chunks.append(chunk)
        return chunks

    t = time.perf_counter()
    threads = []
    answers = []
    for typeids in chunkify(all_typeids, thread_count):
        threads.append(threading.Thread(target=requester, args=[typeids]))
        threads[-1].start()
        threads[-1].join()

    print(time.perf_counter() - t)
    return answers
    return answers

What I did is split all the typeids into as many chunks as the number of threads I want to use, and create one thread per chunk to process it. The question is: what can be slowing it down? (I apologise for my bad English.)
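To make the splitting step concrete, here is the same chunkify logic in isolation, with no network calls involved (a standalone sketch of the code above):

```python
def chunkify(items, quantity):
    # Split `items` into `quantity` chunks whose lengths differ by at most one.
    chunk_len = len(items) // quantity
    rest_count = len(items) % quantity
    chunks = []
    for i in range(quantity):
        chunk = items[:chunk_len]
        items = items[chunk_len:]
        if rest_count and items:
            # Distribute the leftover items one by one across the first chunks
            chunk.append(items.pop(0))
            rest_count -= 1
        chunks.append(chunk)
    return chunks

print(chunkify(list(range(7)), 3))  # → [[0, 1, 2], [3, 4], [5, 6]]
```

Every item ends up in exactly one chunk, so each thread gets a disjoint slice of the work.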

Python has a Global Interpreter Lock (GIL). It can be your problem. In practice, CPython cannot run Python bytecode in several threads truly in parallel. You may think about switching to another language, or staying with Python but using process-based parallelism to solve your task. Here is a nice presentation: Inside the Python GIL.
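A minimal sketch of the process-based approach suggested above, using the standard-library multiprocessing module. The `fetch` function here is a placeholder standing in for the real HTTP request and XML parsing; each call runs in a separate worker process with its own interpreter, so the GIL is not shared between workers:

```python
import multiprocessing

def fetch(typeid):
    # Placeholder for pool.request(...) + xmltodict.parse(...);
    # must be a module-level function so it can be pickled for the workers.
    return typeid * typeid

if __name__ == '__main__':
    # Each worker process fetches its share of typeids independently.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(fetch, range(10))
    print(results)
```

The `if __name__ == '__main__':` guard is required on platforms that spawn rather than fork worker processes. Note that for workloads that mostly wait on the network, threads can also work well, because CPython releases the GIL during blocking I/O.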