Python、MySQL 回归、SQL 错误或错误的条件?
Python, MySQL regression, SQL bug or faulty conditionals?
我有一个存储桶文件夹,其中包含 yy-mm-dd.CSV 格式的 csv 文件,其中有几行 header 除了第二行末尾的日期,我可以忽略,然后151 行 timestamp:power(kW)。这是一个片段:
sep=;
Version CSV|Tool SunnyBeam11|Linebreaks CR/LF|Delimiter semicolon|Decimalpoint point|Precision 3|Language en-UK|TZO=0|DST|2012.06.21
;SN: removed
;SB removed
;2120138796
Time;Power
HH:mm;kW
00:10;0.000
00:20;0.000
00:30;0.000
00:40;0.000
00:50;0.000
01:00;0.000
01:10;0.000
01:20;0.000
01:30;0.000
01:40;0.000
01:50;0.000
02:00;0.000
02:10;0.000
02:20;0.000
02:30;0.000
02:40;0.000
02:50;0.000
03:00;0.000
03:10;0.000
03:20;0.000
03:30;0.000
03:40;0.000
03:50;0.000
04:00;0.000
04:10;0.000
04:20;0.000
04:30;0.000
04:40;0.000
04:50;0.006
05:00;0.024
05:10;0.006
05:20;0.000
05:30;0.030
05:40;0.036
05:50;0.042
06:00;0.042
06:10;0.042
06:20;0.048
06:30;0.060
06:40;0.114
06:50;0.132
07:00;0.150
我解析了这些文件的存储桶文件夹,检查它们是否具有这种格式的文件名,因为还有其他我不想解析的文件,我从每个文件的第二行获取日期并存储它。我连接到数据库,然后处理剩余的行,将存储的日期与第 9 行(或附近)之后每行的时间戳连接起来。我还获取了每行的第二个值(功率,以 kW 为单位)。目的是将连接的 date-time 值和关联的功率值插入到连接的 mysql 数据库中。读取最后一行时,文件将移动到名为 'parsed' 的子文件夹中。所有这些都按预期进行,但读取的每一行都经过打印 'cannot append to Db' 的 try/except 循环(第 107 行)的 except 分支。我已经通过登录 MySQL(实际上是 OpenSuse LEAP 4.2 上的 MariaDB)检查了存储的数据库凭据是否正常工作,并且打印了连接变量,这两者都让我相信我实际上已经连接了正确地为每个文件。我会删掉部分 Python 脚本以使其更短,但我不是特别高级的 Python 编码员,我不想冒错过关键部分的风险:
#!/usr/bin/python
from os import listdir
from datetime import datetime
import MySQLdb
import shutil
import syslog
#from sys import argv
def is_dated_csv(filename):
"""
Return True if filename matches format YY-MM-DD.csv, otherwise False.
"""
date_format = '%y-%m-%d.csv'
try:
date = datetime.strptime(filename, date_format)
return True
except ValueError:
# filename did not match pattern
syslog.syslog('SunnyData file ' + filename + ' did NOT match')
#print filename + ' did NOT match'
pass
#'return' terminates a function
return False
def parse_for_date(filename):
"""
Read file for the date - from line 2 field 10
"""
currentFile = open(filename,'r')
l1 = currentFile.readline() #ignore first line read
date_line = currentFile.readline() #read second line
dateLineArray = date_line.split("|")
day_in_question = dateLineArray[-1]#save the last element (date)
currentFile.close()
return day_in_question
def normalise_date_to_UTF(day_in_question):
"""
Rather wierdly, some days use YYYY.MM.DD format & others use DD/MM/YYYY
This function normalises either to UTC with a blank time (midnight)
"""
if '.' in day_in_question: #it's YYYY.MM.DD
dateArray = day_in_question.split(".")
dt = (dateArray[0] +dateArray[1] + dateArray[2].rstrip() + '000000')
elif '/' in day_in_question: #it's DD/MM/YYYY
dateArray = day_in_question.split("/")
dt = (dateArray[2].rstrip() + dateArray[1] + dateArray[0] + '000000')
theDate = datetime.strptime(dt,'%Y%m%d%H%M%S')
return theDate #A datetime object
def parse_power_values(filename, theDate):
currentFile = open(filename,'r')
for i, line in enumerate(currentFile):
if i <= 7:
doingSomething = True
print 'header' + str(i) + '/ ' + line.rstrip()
elif ((i > 7) and (i <= 151)):
lineParts = line.split(';')
theTime = lineParts[0].split(':')
theHour = theTime[0]
theMin = theTime[1]
timestamp = theDate.replace(hour=int(theHour),minute=int(theMin))
power = lineParts[1].rstrip()
if power == '-.---':
power = 0.000
if (float(power) > 0):
print str(i) + '/ ' + str(timestamp) + ' power = ' + power + 'kWh'
append_to_database(timestamp,power)
else:
print str(i) + '/ '
elif i > 151:
print str(timestamp) + ' DONE!'
print '----------------------'
break
currentFile.close()
def append_to_database(timestampval,powerval):
host="localhost", # host
user="removed", # username
#passwd="******"
passwd="removed"
database_name = 'SunnyData'
table_name = 'DTP'
timestamp_column = 'DT'
power_column = 'PWR'
#sqlInsert = ("INSERT INTO %s (%s,%s) VALUES('%s','%s')" % (table_name, timestamp_column, power_column, timestampval.strftime('%Y-%m-%d %H:%M:%S'), powerval) )
#sqlCheck = ("SELECT TOP 1 %s.%s FROM %s WHERE %s.%s = %s;" % (table_name, timestamp_column, table_name, table_name, timestamp_column, timestampval.strftime('%Y-%m-%d %H:%M:%S')) )
sqlInsert = ("INSERT INTO %s (%s,%s) VALUES('%s','%s')", (table_name, timestamp_column, power_column, timestampval.strftime('%Y-%m-%d %H:%M:%S'), powerval) )
sqlCheck = ("SELECT TOP 1 %s.%s FROM %s WHERE %s.%s = %s;", (table_name, timestamp_column, table_name, table_name, timestamp_column, timestampval.strftime('%Y-%m-%d %H:%M:%S')) )
cur = SD.cursor()
try:
#cur.execute(sqlCheck)
# Aim here is to see if the datetime for the file has an existing entry in the database_name
#If it does, do nothing, otherwise add the values to the datbase
cur.execute(sqlCheck)
if cur.fetchone() == "None":
cur.execute(sqlInsert)
print ""
SD.commit()
except:
print 'DB append failed!'
syslog.syslog('SunnyData DB append failed')
SD.rollback()
# Main start of program
path = '/home/greg/currentGenerated/SBEAM/'
destination = path + '/parsed'
syslog.syslog('parsing SunnyData CSVs started')
for filename in listdir(path):
print filename
if is_dated_csv(filename):
#connect and disconnect once per CSV file - wasteful to reconnect for every line in def append_to_database(...)
SD = MySQLdb.connect(host="localhost", user="root",passwd="removed", db = 'SunnyData')
print SD
print filename + ' matched'
day_in_question = parse_for_date(filename)
print 'the date is ' + day_in_question
theDate = normalise_date_to_UTF(day_in_question)
parse_power_values(filename, theDate)
SD.close()
shutil.move(path + '/' + filename, destination)
syslog.syslog('SunnyData file' + path + '/' + filename + 'parsed & moved to ' + destination)
它曾经有效,但自从我上次检查以来已经有很长时间了,而且更新很多。我担心回归可能会改变我的代码中的某些内容。只是不确定如何解决所有问题。
抱歉,这不是一个非常明确和具体的问题,但如果你能帮我整理一下,它仍然可以作为其他人的一个很好的例子吗?
谢谢
格雷格
MySQL/MariaDB 中没有 SELECT TOP ...
语法,因此您的脚本一定是在尝试执行 sqlCheck
时失败了。
应该是SELECT %s.%s FROM %s WHERE %s.%s = %s LIMIT 1
。
我有一个存储桶文件夹,其中包含 yy-mm-dd.CSV 格式的 csv 文件,其中有几行 header 除了第二行末尾的日期,我可以忽略,然后151 行 timestamp:power(kW)。这是一个片段:
sep=;
Version CSV|Tool SunnyBeam11|Linebreaks CR/LF|Delimiter semicolon|Decimalpoint point|Precision 3|Language en-UK|TZO=0|DST|2012.06.21
;SN: removed
;SB removed
;2120138796
Time;Power
HH:mm;kW
00:10;0.000
00:20;0.000
00:30;0.000
00:40;0.000
00:50;0.000
01:00;0.000
01:10;0.000
01:20;0.000
01:30;0.000
01:40;0.000
01:50;0.000
02:00;0.000
02:10;0.000
02:20;0.000
02:30;0.000
02:40;0.000
02:50;0.000
03:00;0.000
03:10;0.000
03:20;0.000
03:30;0.000
03:40;0.000
03:50;0.000
04:00;0.000
04:10;0.000
04:20;0.000
04:30;0.000
04:40;0.000
04:50;0.006
05:00;0.024
05:10;0.006
05:20;0.000
05:30;0.030
05:40;0.036
05:50;0.042
06:00;0.042
06:10;0.042
06:20;0.048
06:30;0.060
06:40;0.114
06:50;0.132
07:00;0.150
我解析了这些文件的存储桶文件夹,检查它们是否具有这种格式的文件名,因为还有其他我不想解析的文件,我从每个文件的第二行获取日期并存储它。我连接到数据库,然后处理剩余的行,将存储的日期与第 9 行(或附近)之后每行的时间戳连接起来。我还获取了每行的第二个值(功率,以 kW 为单位)。目的是将连接的 date-time 值和关联的功率值插入到连接的 mysql 数据库中。读取最后一行时,文件将移动到名为 'parsed' 的子文件夹中。所有这些都按预期进行,但读取的每一行都经过打印 'cannot append to Db' 的 try/except 循环(第 107 行)的 except 分支。我已经通过登录 MySQL(实际上是 OpenSuse LEAP 4.2 上的 MariaDB)检查了存储的数据库凭据是否正常工作,并且打印了连接变量,这两者都让我相信我实际上已经连接了正确地为每个文件。我会删掉部分 Python 脚本以使其更短,但我不是特别高级的 Python 编码员,我不想冒错过关键部分的风险:
#!/usr/bin/python
from os import listdir
from datetime import datetime
import MySQLdb
import shutil
import syslog
#from sys import argv
def is_dated_csv(filename):
"""
Return True if filename matches format YY-MM-DD.csv, otherwise False.
"""
date_format = '%y-%m-%d.csv'
try:
date = datetime.strptime(filename, date_format)
return True
except ValueError:
# filename did not match pattern
syslog.syslog('SunnyData file ' + filename + ' did NOT match')
#print filename + ' did NOT match'
pass
#'return' terminates a function
return False
def parse_for_date(filename):
"""
Read file for the date - from line 2 field 10
"""
currentFile = open(filename,'r')
l1 = currentFile.readline() #ignore first line read
date_line = currentFile.readline() #read second line
dateLineArray = date_line.split("|")
day_in_question = dateLineArray[-1]#save the last element (date)
currentFile.close()
return day_in_question
def normalise_date_to_UTF(day_in_question):
"""
Rather wierdly, some days use YYYY.MM.DD format & others use DD/MM/YYYY
This function normalises either to UTC with a blank time (midnight)
"""
if '.' in day_in_question: #it's YYYY.MM.DD
dateArray = day_in_question.split(".")
dt = (dateArray[0] +dateArray[1] + dateArray[2].rstrip() + '000000')
elif '/' in day_in_question: #it's DD/MM/YYYY
dateArray = day_in_question.split("/")
dt = (dateArray[2].rstrip() + dateArray[1] + dateArray[0] + '000000')
theDate = datetime.strptime(dt,'%Y%m%d%H%M%S')
return theDate #A datetime object
def parse_power_values(filename, theDate):
currentFile = open(filename,'r')
for i, line in enumerate(currentFile):
if i <= 7:
doingSomething = True
print 'header' + str(i) + '/ ' + line.rstrip()
elif ((i > 7) and (i <= 151)):
lineParts = line.split(';')
theTime = lineParts[0].split(':')
theHour = theTime[0]
theMin = theTime[1]
timestamp = theDate.replace(hour=int(theHour),minute=int(theMin))
power = lineParts[1].rstrip()
if power == '-.---':
power = 0.000
if (float(power) > 0):
print str(i) + '/ ' + str(timestamp) + ' power = ' + power + 'kWh'
append_to_database(timestamp,power)
else:
print str(i) + '/ '
elif i > 151:
print str(timestamp) + ' DONE!'
print '----------------------'
break
currentFile.close()
def append_to_database(timestampval,powerval):
host="localhost", # host
user="removed", # username
#passwd="******"
passwd="removed"
database_name = 'SunnyData'
table_name = 'DTP'
timestamp_column = 'DT'
power_column = 'PWR'
#sqlInsert = ("INSERT INTO %s (%s,%s) VALUES('%s','%s')" % (table_name, timestamp_column, power_column, timestampval.strftime('%Y-%m-%d %H:%M:%S'), powerval) )
#sqlCheck = ("SELECT TOP 1 %s.%s FROM %s WHERE %s.%s = %s;" % (table_name, timestamp_column, table_name, table_name, timestamp_column, timestampval.strftime('%Y-%m-%d %H:%M:%S')) )
sqlInsert = ("INSERT INTO %s (%s,%s) VALUES('%s','%s')", (table_name, timestamp_column, power_column, timestampval.strftime('%Y-%m-%d %H:%M:%S'), powerval) )
sqlCheck = ("SELECT TOP 1 %s.%s FROM %s WHERE %s.%s = %s;", (table_name, timestamp_column, table_name, table_name, timestamp_column, timestampval.strftime('%Y-%m-%d %H:%M:%S')) )
cur = SD.cursor()
try:
#cur.execute(sqlCheck)
# Aim here is to see if the datetime for the file has an existing entry in the database_name
#If it does, do nothing, otherwise add the values to the datbase
cur.execute(sqlCheck)
if cur.fetchone() == "None":
cur.execute(sqlInsert)
print ""
SD.commit()
except:
print 'DB append failed!'
syslog.syslog('SunnyData DB append failed')
SD.rollback()
# Main start of program
path = '/home/greg/currentGenerated/SBEAM/'
destination = path + '/parsed'
syslog.syslog('parsing SunnyData CSVs started')
for filename in listdir(path):
print filename
if is_dated_csv(filename):
#connect and disconnect once per CSV file - wasteful to reconnect for every line in def append_to_database(...)
SD = MySQLdb.connect(host="localhost", user="root",passwd="removed", db = 'SunnyData')
print SD
print filename + ' matched'
day_in_question = parse_for_date(filename)
print 'the date is ' + day_in_question
theDate = normalise_date_to_UTF(day_in_question)
parse_power_values(filename, theDate)
SD.close()
shutil.move(path + '/' + filename, destination)
syslog.syslog('SunnyData file' + path + '/' + filename + 'parsed & moved to ' + destination)
它曾经有效,但自从我上次检查以来已经有很长时间了,而且更新很多。我担心回归可能会改变我的代码中的某些内容。只是不确定如何解决所有问题。
抱歉,这不是一个非常明确和具体的问题,但如果你能帮我整理一下,它仍然可以作为其他人的一个很好的例子吗?
谢谢
格雷格
MySQL/MariaDB 中没有 SELECT TOP ...
语法,因此您的脚本一定是在尝试执行 sqlCheck
时失败了。
应该是SELECT %s.%s FROM %s WHERE %s.%s = %s LIMIT 1
。