Getting too many deadlock errors while updating MSSQL table with pyodbc in parallel with multiprocessing

I am trying to open pickle files that have data in them, and then update an MSSQL table with that data. Updating 1,000,000 rows was taking 10 days, so I wrote a script to do it with more parallelism. The more processes I run it with, the more of these errors I get:

(<class 'pyodbc.Error'>, Error('40001', '[40001] [Microsoft][ODBC SQL Server Driver][SQL Server]Transaction (Process ID 93) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)'), <traceback object at 0x0000000002791808>)

As you can see in my code, I keep retrying the update until it succeeds, and I even sleep for a second between attempts:

while True:
    try:
        updated = cursor.execute(update,'Yes', fileName+'.'+ext, dt, size,uniqueID )
        break
    except:
        time.sleep(1)
        print sys.exc_info() 

Is this happening because the multiprocessing module on Windows uses os.spawn rather than os.fork?

Is there a way to do this that will speed things up?

I have been told that the table can handle many more transactions than this...

#!C:/Python/python.exe -u

import pyodbc,re,pickle,os,glob,sys,time
from multiprocessing import Lock, Process, Queue, current_process


def UpDater(pickleQueue):

    for pi in iter(pickleQueue.get, 'STOP'):
        name = current_process().name
        f=pi

        cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=database.windows.net;DATABASE=DB;UID=user;PWD=pwd');
        cursor = cnxn.cursor()
        update = ("""UPDATE DocumentList
                SET Downloaded=?, DownLoadedAs=?,DownLoadedWhen=?,DownLoadedSizeKB=?
                WHERE DocNumberSequence=?""")

        r = re.compile('\d+')

        pkl_file = open(pi, 'rb')
        meta = pickle.load(pkl_file)
        fileName = meta[0][0]
        pl = r.findall(fileName)
        l= int(len(pl)-1)
        ext = meta[0][1]
        url = meta[0][2]
        uniqueID = pl[l]
        dt = meta[0][4]
        size = meta[0][5]

        while True:
            try:
                updated = cursor.execute(update,'Yes', fileName+'.'+ext, dt, size,uniqueID )
                break
            except:
                time.sleep(1)
                print sys.exc_info() 

        print uniqueID  

        cnxn.commit()
        pkl_file.close()
        os.remove(fileName+'.pkl')
        cnxn.close()

if __name__ == '__main__':

    os.chdir('Pickles')
    pickles = glob.glob("*.pkl")
    pickleQueue = Queue()
    processes = []

    for item in pickles:
        pickleQueue.put(item)


    workers = int(sys.argv[1]);
    for x in xrange(workers):
        p = Process(target=UpDater, args=(pickleQueue,))
        p.start()
        processes.append(p)
        pickleQueue.put('STOP')  # one STOP sentinel per worker

    for p in processes:
        p.join()

I am using Windows 7 and Python 2.7, Anaconda Distribution.

EDIT: The answer below, using row locks, stopped the errors from happening. However, the updates were still slow. It turned out that a good old-fashioned index on the primary key was needed to get a 100x speed-up.
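
For reference, a minimal sketch of what creating that index might look like through pyodbc, assuming DocNumberSequence is the key column the UPDATE filters on; the index name here is made up:

    # Sketch only: the index name is hypothetical, and this assumes
    # DocNumberSequence is the column used in the UPDATE's WHERE clause.
    import pyodbc

    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=database.windows.net;DATABASE=DB;UID=user;PWD=pwd')
    cursor = cnxn.cursor()
    cursor.execute("""CREATE INDEX IX_DocumentList_DocNumberSequence
                      ON DocumentList (DocNumberSequence)""")
    cnxn.commit()
    cnxn.close()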

A few things to try. Using sleep inside a bare except is a bad idea (see the retry sketch further down). First, can you try row-level locking?

    update = ("""UPDATE DocumentList WITH (ROWLOCK)
            SET Downloaded=?, DownLoadedAs=?,DownLoadedWhen=?,DownLoadedSizeKB=?
            WHERE DocNumberSequence=? """)

Another option is to wrap each update in its own transaction:

    update = ("""
        BEGIN TRANSACTION my_trans;
            UPDATE DocumentList
            SET Downloaded=?, DownLoadedAs=?,DownLoadedWhen=?,DownLoadedSizeKB=?
            WHERE DocNumberSequence=?;
        COMMIT TRANSACTION my_trans;
    """)

Do either of these solutions work for you?