Peewee query runs slow with multithreading

I came across this interesting scenario while using peewee with threads.

I have a table that looks like this:

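from peewee import SQL, AutoField, CharField, DateTimeField

# BaseModel (not shown) binds the model to a MySQLDatabase
class Locks(BaseModel):
    _id = AutoField()
    name = CharField(unique=True, index=True)
    last_modify_time = DateTimeField(constraints=[SQL("DEFAULT CURRENT_TIMESTAMP")])
    owner = CharField()

    class Meta:
        table_name = 'locks'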

I want to query it for the records whose name is 'test':

sql = Locks.select().where(Locks.name == 'test')
sql.execute()

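Simple, right? But I found it runs very slowly. Without threading, the query against the database on our network takes 3 to 5 ms, but once multithreading is involved it grows to about 70 ms.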

The code looks like this:

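import threading

import yappi

def test_lock():
    # Runs inside a brand-new thread on every call
    sql = Locks.select().where(Locks.name == 'test')
    sql.execute()

def run_thread():
    # Start one short-lived thread and wait for it to finish
    test = threading.Thread(target=test_lock)
    test.start()
    test.join()

yappi.set_clock_type('Wall')  # wall-clock time, so network/connect waits are counted
yappi.start()
for _ in range(100):
    run_thread()
yappi.stop()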

The yappi results look like this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.002    0.000    8.191    0.082 /home/data/EBS_Operation/services/task_center.py:337(run_thread)
      200    0.004    0.000    8.167    0.041 /usr/lib64/python2.7/threading.py:308(_Condition.wait)
      100    0.003    0.000    7.484    0.075 /usr/lib64/python2.7/threading.py:754(Thread.run)
      100    0.005    0.000    7.477    0.075 /home/data/EBS_Operation/services/task_center.py:330(test_lock)
      100    0.001    0.000    7.433    0.074 /usr/lib/python2.7/site-packages/peewee.py:1880(ModelSelect.inner)
      100    0.001    0.000    7.432    0.074 /usr/lib/python2.7/site-packages/peewee.py:1955(ModelSelect.execute)
      100    0.005    0.000    7.431    0.074 /usr/lib/python2.7/site-packages/peewee.py:2127(ModelSelect._execute)
      100    0.003    0.000    7.418    0.074 /usr/lib/python2.7/site-packages/peewee.py:3109(MySQLDatabase.execute)
      100    0.002    0.000    7.361    0.074 /usr/lib64/python2.7/threading.py:913(Thread.join)
      100    0.009    0.000    7.009    0.070 /usr/lib/python2.7/site-packages/peewee.py:3086(MySQLDatabase.execute_sql)
      100    0.008    0.000    6.653    0.067 /usr/lib/python2.7/site-packages/peewee.py:3078(MySQLDatabase.cursor)
      100    0.005    0.000    6.638    0.066 /usr/lib/python2.7/site-packages/peewee.py:3023(MySQLDatabase.connect)
      100    0.002    0.000    6.632    0.066 /usr/lib/python2.7/site-packages/peewee.py:3930(MySQLDatabase._connect)
      100    0.012    0.000    6.630    0.066 /usr/lib64/python2.7/site-packages/MySQLdb/__init__.py:78(Connect)
      100    6.415    0.064    6.618    0.066 /usr/lib64/python2.7/site-packages/MySQLdb/connections.py:62(Connection.__init__)
      100    0.002    0.000    0.817    0.008 /usr/lib64/python2.7/threading.py:728(Thread.start)
      100    0.001    0.000    0.810    0.008 /usr/lib64/python2.7/threading.py:604(_Event.wait)
 100/3400    0.031    0.000    0.392    0.000 /usr/lib/python2.7/site-packages/peewee.py:604(Context.sql)
      100    0.028    0.000    0.388    0.004 /usr/lib/python2.7/site-packages/peewee.py:2350(ModelSelect.__sql__)
      200    0.010    0.000    0.193    0.001 /usr/lib/python2.7/site-packages/peewee.py:1744(NodeList.__sql__)
      100    0.003    0.000    0.182    0.002 /usr/lib64/python2.7/logging/__init__.py:1127(Logger.debug)
      500    0.005    0.000    0.181    0.000 /usr/lib/python2.7/site-packages/peewee.py:4504(AutoField.__sql__)
      100    0.006    0.000    0.172    0.002 /usr/lib64/python2.7/logging/__init__.py:1249(Logger._log)
      100    0.001    0.000    0.164    0.002 /usr/lib/python2.7/site-packages/peewee.py:7141(ModelSelect.__sql_selection__)
      500    0.014    0.000    0.158    0.000 /usr/lib/python2.7/site-packages/peewee.py:1234(Column.__sql__)

Any idea what's going on?

Peewee stores connection state in thread-locals, so each thread gets its own separate connection. Looking at the profiling output, I think the instructive line is:

      100    6.415    0.064    6.618    0.066 /usr/lib64/python2.7/site-packages/MySQLdb/connections.py:62(Connection.__init__)

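You are establishing 100 different connections, and that is expensive: about 6.6 s of the 8.2 s total, roughly 66 ms per connection, which accounts for almost all of the ~70 ms you see per threaded query.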

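Presumably, once those connections are set up (that is, if the same threads are reused instead of a new thread being created for every query), query performance would improve.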