PyMySQL throws 'BrokenPipeError' after making frequent reads

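I wrote a script to help me work with a database. Specifically, I'm trying to process files on disk and add the results of that work to my database. I've copied the code below but removed most of the logic unrelated to the database, to keep this question as broad and useful as possible.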

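I used the code to process the files and add the results to the database, overwriting any record with the same identifier as the file being processed. Later, I modified the script to skip documents that had already been added to the database, and now whenever I run it I get this error: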

pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

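It looks like the server is rejecting the requests, possibly because my code is poorly written? I've noticed that the error always happens at the same position in the file list and never moves. If I rerun the code with the file list replaced by a list containing only the file the program crashed on, it works fine. This makes me think the database gives out after a certain number of requests.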

I'm using Python 3 and MySQL Community Edition version 14.14 on OS X.

The code (with the parts unrelated to the database removed):

import pymysql

# Stars for user-specific stuff
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='*******',
                             db='*******',
                             use_unicode=True, 
                             charset="utf8mb4",
                             )
cursor = connection.cursor()

f_arr = []  # An array of all of my data objects

def convertF(file_):
    # General layout: Try to work with input and add it the result to DB. The work can raise an exception
    # If the record already exists in the DB, ignore it
    # Elif the work was already done and the result is on disk, put it on the database
    # Else do the work and put it on the database - this can raise exceptions
    # Except: Try another way to do the work, and put the result in the database. This can raise an error
    # Second (nested) except: Add the record to the database with indicator that the work failed

    # This worked before I added the initial check on whether or not the record already exists in the database. Now, for some reason, I get the error:
    # pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

    # I'm pretty sure that I have written code to work poorly with the database. I had hoped to finish this task quickly instead of efficiently.
    try:
        # Find record in DB, if text exists just ignore the record
        rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))
        length = cursor.fetchall()[0][0] # Gets the length
        if length != None and length > 4:
            pass
        elif ( "work already finished on disk" ): 
            # get "result_text" from disk
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        else:
            # do work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
    except:
        try: 
            # Alternate method of work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        except:
            # Since the job can't be done, tell the database
            cmd = "UPDATE table SET text = %s, hascontent = 0 WHERE name = %s"
            cursor.execute(cmd, ( "NO CONTENT", file_["name"]) )
            connection.commit()

for file in f_arr:
    convertF(file)

MySQL server has gone away

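This problem is described in detail at http://dev.mysql.com/doc/refman/5.7/en/gone-away.html. The usual cause is that the server closed the connection for some reason, and the usual remedy is to retry the query, or to reconnect and then retry.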

But the reason it breaks your code is the way your code is written. See below.

Possibly because my code is poorly written?

Since you asked.

rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))

This is a bad habit. The manual explicitly warns you not to do this, because it opens you up to SQL injection. The correct way is

 rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = %s", (file_["name"],))
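To make the difference concrete, here is a small sketch that uses Python's built-in sqlite3 module instead of PyMySQL, so it runs without a MySQL server; the table docs and the file name O'Brien.txt are hypothetical. sqlite3 uses ? as its placeholder where PyMySQL uses %s, and PyMySQL escapes and quotes parameters on the client side rather than binding them, but the effect on your code is the same: a name containing a quote breaks the formatted query and not the parameterized one.

```python
import sqlite3

# In-memory SQLite database used only for illustration; "docs" is a hypothetical table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, text TEXT)")
cur.execute("INSERT INTO docs VALUES (?, ?)", ("O'Brien.txt", "hello"))


def length_formatted(name):
    # str.format pastes the quote straight into the SQL text, so the statement breaks.
    cur.execute("SELECT LENGTH(text) FROM docs WHERE name = '{0}'".format(name))
    return cur.fetchone()


def length_parameterized(name):
    # The value is passed separately from the SQL text, so the quote is harmless.
    cur.execute("SELECT LENGTH(text) FROM docs WHERE name = ?", (name,))
    return cur.fetchone()


try:
    length_formatted("O'Brien.txt")
    formatted_ok = True
except sqlite3.OperationalError as exc:
    formatted_ok = False
    print("formatted query failed:", exc)

print("parameterized result:", length_parameterized("O'Brien.txt"))
```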

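The second problem with the code above is that you don't need to check whether the value exists before trying to update it. You can remove the line above and its associated if/else and go straight to the update. Also, your elif and else appear to do exactly the same thing. And don't run the value through pymysql.escape_string when you pass it to execute() as a parameter: PyMySQL already escapes parameters, so escaping first stores extra backslashes in the database. So your code can be: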

try:
    cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
    cursor.execute(cmd, (result_text, file_["name"]))
    connection.commit()
except:  # <-- next problem.

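Which brings us to the next problem. Never catch a generic exception like this. Always catch specific exceptions, such as TypeError, AttributeError and so on. When catching a generic exception is unavoidable, you should at least log it.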
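A minimal sketch of what that could look like for the work part of convertF, assuming the work raises OSError or ValueError when it fails (an assumption; substitute whatever your work actually raises). primary and fallback are hypothetical stand-ins for your two ways of producing result_text. Keeping the work separate from the database calls means a database error such as 2006 is no longer silently treated as "the work failed" and pushed through the fallback branch.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def get_result_text(file_, primary, fallback):
    """Try the (hypothetical) primary work, then the fallback.

    Returns (text, hascontent). Only the exceptions the work is assumed to
    raise are caught; anything else, e.g. a database error, propagates.
    """
    for label, work in (("primary", primary), ("fallback", fallback)):
        try:
            return work(file_), 1
        except (OSError, ValueError) as exc:
            # Log the failure instead of swallowing it silently.
            logger.warning("%s work failed for %s: %s", label, file_["name"], exc)
    return "NO CONTENT", 0


def broken(file_):
    # Simulated work that always fails.
    raise ValueError("cannot parse " + file_["name"])


print(get_result_text({"name": "a.txt"}, broken, lambda f: "recovered text"))
```

You would then write the result with a single parameterized UPDATE table SET text = %s, hascontent = %s WHERE name = %s followed by one commit, instead of repeating the UPDATE in every branch.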

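For example, you could catch connection errors here and try to reconnect to the database. Then, when the "server has gone away" problem occurs, your code won't stop executing.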
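One way to do that is a small retry helper. This is a sketch, not a PyMySQL feature; the names execute_with_reconnect, attempts and delay are hypothetical. With PyMySQL you would pass pymysql.err.OperationalError as the retriable type and connection.ping(reconnect=True) as the reconnect step, as in the commented wiring at the end; the runnable part uses a simulated operation instead of a real server.

```python
import time


def execute_with_reconnect(run, reconnect, retriable, attempts=3, delay=1.0):
    """Call run(); on a retriable error, wait, call reconnect() and try again.

    Re-raises the last error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except retriable:
            if attempt == attempts:
                raise
            time.sleep(delay)
            reconnect()


calls = {"run": 0, "reconnect": 0}


def flaky_update():
    # Simulated operation: fails once with a connection error, then succeeds.
    calls["run"] += 1
    if calls["run"] == 1:
        raise ConnectionError("server has gone away (simulated)")
    return "ok"


def fake_reconnect():
    calls["reconnect"] += 1


result = execute_with_reconnect(flaky_update, fake_reconnect, ConnectionError, delay=0)
print(result, calls)

# Hypothetical wiring with PyMySQL (not executed here):
#   def update():
#       cursor.execute(cmd, (result_text, file_["name"]))
#       connection.commit()
#   execute_with_reconnect(update, lambda: connection.ping(reconnect=True),
#                          pymysql.err.OperationalError)
```

Keep in mind that PyMySQL raises OperationalError for more than error 2006, and that anything not yet committed when the connection dropped is lost, so only retry operations that are safe to repeat; an UPDATE followed by its own commit, as here, is.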

I fixed the same error with bulk inserts by reducing the number of rows I insert in a single command.

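I got this error too, even though the maximum number of rows allowed in a bulk insert is much higher than what I was inserting.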
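A minimal sketch of inserting in fixed-size batches, with one executemany() call and one commit per batch. It runs against sqlite3 so it needs no server; with PyMySQL the placeholder would be %s, and batch_size=500 is an arbitrary assumption. The gone-away page linked above lists a statement larger than the server's max_allowed_packet as one cause of this error, which would explain why the safe batch size depends on how many bytes each row carries rather than on a row-count maximum; treat that as a likely explanation, not a confirmed one.

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, sql, rows, batch_size=500):
    """Insert rows with one executemany() and one commit per batch; return the row count."""
    cur = conn.cursor()
    total = 0
    for chunk in batches(rows, batch_size):
        cur.executemany(sql, chunk)
        conn.commit()
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT, text TEXT)")
rows = [(f"file{i}.txt", "x" * i) for i in range(1234)]
inserted = insert_in_batches(conn, "INSERT INTO docs VALUES (?, ?)", rows, batch_size=500)
print(inserted, conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0])
```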