Optimize script with Pyodbc method

Question

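To get some data that I later need to process with Matlab, I use a Python script to extract data from a series of more than 50 identical databases (i.e., they all share the same table structure).

I can do this with the code below. However, to avoid creating empty text files (some of these databases have no relevant data at all), I first run the query just to check whether it returns anything, and then I'm forced to run it again to get the data itself and write it to a file.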

import thesis,pyodbc

# SQL Server settings
drvr = '{SQL Server Native Client 10.0}'
host = r'POLIVEIRA-PC\MSSQLSERVER2008'  # raw string keeps the backslash literal
user = 'username'
pswd = 'password'

# Establish a connection to SQL Server
cnxn = pyodbc.connect(driver=drvr, server=host, uid=user, pwd=pswd) # Setup connection

# Prepare condition
tags = list(thesis.sensors().keys())  # list() so indexing also works on Python 3
condition = ' WHERE Tag_ID=' + tags[0]
for tag in tags[1:]:
    condition += ' OR Tag_ID=' + tag

# Extract data from each database
for db in thesis.db_list():
    # Prepare query
    table = '[' + db + '].dbo.tBufferAux'
    query  = 'SELECT Data, Tag_ID, Valor FROM ' + table + condition + ' ORDER BY Data ASC'
    # Check if query's output is empty
    if not cnxn.cursor().execute(query).fetchone():
        print(db + ' has no records!')
        continue # If so, jump to next database
    # Otherwise, save query's output to text file
    filename = 'Dataset_' + db + '.txt'
    filepath = thesis.out_dir() + filename
    with open(filepath,'w') as file:
        for record in cnxn.cursor().execute(query):
            file.write(str(record.Data) + ' ' + str(record.Tag_ID) + ' ' + str(record.Valor) + '\n')

# Close session
cnxn.cursor().close()
cnxn.close()

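Although this code runs fine and completes in about 20 seconds, I'm curious whether there is any way to optimize this script by avoiding the repeated query execution, i.e., avoiding calling cnxn.cursor().execute(query) twice.

By the way, I'm new to both Python and SQL, so I'd appreciate it if you could point out any mistakes in my code or anything that isn't considered good practice.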

Answer 1

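First, I'd suggest you look at pymssql, which has some nice features that pyodbc doesn't.

Second, I'd even more strongly suggest looking into SQL Server's bcp or SSIS. They are built for this kind of thing and are more efficient than using Python.

Third, if all the databases are on the same server, you can actually do everything in T-SQL using master.sys.databases and push the work to the server.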
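As a small first step in that direction, the list of databases could be read from the server itself instead of being maintained in the script. The following is only a sketch under assumptions: the helper name list_databases and the pattern argument are hypothetical, and it presumes that the relevant databases can be picked out by a name pattern.

```python
def list_databases(cursor, pattern='%'):
    """Return the names of databases on the server matching `pattern`.

    `cursor` is assumed to be a DB-API cursor connected to SQL Server
    (for example one obtained from pyodbc). `pattern` is a LIKE pattern.
    """
    cursor.execute(
        'SELECT name FROM master.sys.databases WHERE name LIKE ? ORDER BY name',
        (pattern,),
    )
    # Each row holds a single column, the database name.
    return [row[0] for row in cursor.fetchall()]
```

With a pyodbc connection this might be called as list_databases(cnxn.cursor(), 'Plant%'), where 'Plant%' stands in for whatever naming scheme the databases actually follow.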

With that in mind:

import thesis, pyodbc

# SQL Server settings
drvr = '{SQL Server Native Client 10.0}'
host = r'POLIVEIRA-PC\MSSQLSERVER2008'  # raw string keeps the backslash literal
user = 'username'
pswd = 'password'

# Establish a connection to SQL Server
cnxn = pyodbc.connect(driver=drvr, server=host, uid=user, pwd=pswd)
cursor = cnxn.cursor()  # one cursor, reused for every query

# Prepare condition
tags = list(thesis.sensors().keys())
condition = ' WHERE Tag_ID=' + tags[0]
for tag in tags[1:]:
    condition += ' OR Tag_ID=' + tag

# Extract data from each database
for db in thesis.db_list():
    # Prepare query
    table = '[' + db + '].dbo.tBufferAux'
    query = 'SELECT Data, Tag_ID, Valor FROM ' + table + condition + ' ORDER BY Data ASC'
    # Run the query once and fetch the first row to see whether the output is empty.
    # (cursor.rowcount is not reliable here: it is often -1 right after a SELECT.)
    cursor.execute(query)
    record = cursor.fetchone()
    if record is None:
        print(db + ' has no records!')
    else:
        filename = 'Dataset_' + db + '.txt'
        filepath = thesis.out_dir() + filename
        with open(filepath, 'w') as file:
            # Write the row already fetched, then keep fetching until exhausted
            while record is not None:
                file.write(str(record.Data) + ' ' + str(record.Tag_ID) + ' ' + str(record.Valor) + '\n')
                record = cursor.fetchone()

# Close session
cnxn.close()
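
The WHERE clause in the script is built by chaining OR Tag_ID=... through string concatenation. One alternative, sketched here rather than taken from the original code, is an IN list with ? placeholders so that the driver passes the tag values as parameters. The helper name build_query is hypothetical, and sqlite3 is used only as a stand-in so the sketch runs without SQL Server; pyodbc uses the same ? placeholder style.

```python
import sqlite3


def build_query(table, tags):
    """Return (sql, params) selecting the rows whose Tag_ID is in `tags`."""
    placeholders = ', '.join('?' for _ in tags)
    sql = ('SELECT Data, Tag_ID, Valor FROM ' + table +
           ' WHERE Tag_ID IN (' + placeholders + ') ORDER BY Data ASC')
    return sql, list(tags)


# Stand-in database with made-up rows, only for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tBufferAux (Data TEXT, Tag_ID TEXT, Valor REAL)')
conn.executemany('INSERT INTO tBufferAux VALUES (?, ?, ?)', [
    ('2012-05-01 13:31:00', 'B', 2.0),
    ('2012-05-01 13:30:00', 'A', 1.0),
    ('2012-05-01 13:32:00', 'C', 3.0),
])

sql, params = build_query('tBufferAux', ['A', 'B'])
rows = conn.execute(sql, params).fetchall()  # only tags A and B, oldest first
```

With pyodbc the same pair could be passed as cursor.execute(sql, params). Note that an empty tag list would produce an invalid IN () clause, so that case needs handling separately.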

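A few stylistic things:

Avoid continue where possible; use if-else control flow instead.
Pyodbc cursors can execute multiple queries and persist. You don't need to create a new one each time you execute a query.
The cursor 'remembers' the last query it executed.
I suspect you'll be sorry if you use a space-delimited file... I speak from experience... :(
Cursors are closed automatically when they go out of scope, so cursor.close() is seldom if ever needed.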
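
To illustrate the point about space-delimited files: if the Data column is a datetime (an assumption here, the original post does not say), its string form already contains a space, so a space-delimited line no longer splits cleanly into three fields. A minimal sketch using the standard csv module, with a hypothetical helper write_records and a made-up sample row:

```python
import csv
import datetime
import io


def write_records(records, out):
    """Write (Data, Tag_ID, Valor) tuples to the file-like object `out` as CSV."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Data', 'Tag_ID', 'Valor'])  # header row
    for data, tag_id, valor in records:
        writer.writerow([data, tag_id, valor])


# Made-up sample row, only for illustration.
sample = [(datetime.datetime(2012, 5, 1, 13, 30, 0), 'T1', 21.5)]

# Space-delimited: the datetime's own space makes this split into 4 pieces.
space_line = ' '.join(str(value) for value in sample[0])

# Comma-delimited: the csv module keeps the row at exactly 3 fields.
buffer = io.StringIO()
write_records(sample, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file on Python 3, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.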

python

sql-server

optimization

pyodbc