Impyla Insert SQL from Flask: Syntax error (Identifier Binding)
Recently I set up a Flask POST endpoint that writes data to an Impala database through the Impyla module.
Environment: Python 3.6.5 on CentOS.
Impala version: impalad version 2.6.0-cdh5.8.0
api.py:
from flask import Flask, request, abort, Response
from flask_cors import CORS
import json
from impala.dbapi import connect
import sys
import re
from datetime import datetime

app = application = Flask(__name__)
CORS(app)

conn = connect(host='datanode2', port=21050,
               user='user', database='testdb')


@app.route("/api/endpoint", methods=['POST'])
def post_data():
    # if not request.json:
    #     abort(400)
    params = request.get_json(force=True)  # getting request data
    print(">>>>>> ", params, flush=True)
    params['log_time'] = datetime.now().strftime("%Y-%m-%d %H-%M-%S")
    # params['page_url'] = re.sub(
    #     '[^a-zA-Z0-9-_*.]', '', re.sub(':', '_', params['page_url']))
    try:
        cursor = conn.cursor()
        sql = "INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) VALUES (%s, %s, %s, %s, %s, %s)"
        values = (params['page_title'], params['page_url'], params['log_time'],
                  params['machine'], params['clicks'], params['id'])
        print(">>>>>> " + sql % values, file=sys.stderr, flush=True)
        cursor.execute(sql, values)
        print(
            f">>>>>> Data Written Successfully", file=sys.stderr, flush=True)
        return Response(json.dumps({'success': True}), 201, mimetype="application/json")
    except Exception as e:
        print(e, file=sys.stderr, flush=True)
        return Response(json.dumps({'success': False}), 400, mimetype="application/json")


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5008, debug=True)
req.py:
import requests as r

url = "http://123.234.345.456:30001/"
# url = "https://whosebug.com/questions/ask"
res = r.post('http://localhost:5008/api/endpoint',
             json={
                 "page_title": "Home",
                 "page_url": url,
                 "machine": "Mac OS",
                 "clicks": 16,
                 "id": "60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db"
             }
             )
if res.ok:
    print(res.json())
else:
    print('Error!')
I run the Flask API with python api.py and then test it with python req.py.
The Flask server reports this error:
>>>>>> {'page_title': 'Home', 'page_url': 'http://123.234.345.456:30001/', 'machine': 'Mac OS', 'clicks': 16, 'id': '60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db'}
>>>>>> INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) VALUES (Home, http://123.234.345.456:30001/, 2018-12-12 16-14-04, Mac OS, 16, 60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db)
AnalysisException: Syntax error in line 1:
..., 'http://123.234.345.456'2018-12-12 16-14-04'0001/', ...
^
Encountered: INTEGER LITERAL
Expected: AND, AS, ASC, BETWEEN, CROSS, DESC, DIV, ELSE, END, FOLLOWING, FROM, FULL, GROUP, HAVING, ILIKE, IN, INNER, IREGEXP, IS, JOIN, LEFT, LIKE, LIMIT, NOT, NULLS, OFFSET, OR, ORDER, PRECEDING, RANGE, REGEXP, RIGHT, RLIKE, ROWS, THEN, UNION, WHEN, WHERE, COMMA, IDENTIFIER
CAUSED BY: Exception: Syntax error
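This error is a bit annoying:
Running the same SQL statement directly in impala-shell works.
It also works when page_url is the only parameter.
So is this some kind of conditional character-escaping issue? I managed to work around it by rewriting the URL with a couple of regular expressions (uncomment the two re.sub lines in api.py). But that is really annoying, and I don't want to clean my data just because of this.
From other people's attempts, I gathered that wrapping each inserted value in a pair of quotes should work. But how do I do that when using string formatting, given that it has to happen before cursor.execute(sql, values)?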
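Impyla and the other Impala-based Python libraries don't support true parameterized queries the way traditional SQL databases do. The only solution I have come across is to wrap the inserted values in quotes when they are defined as string/timestamp.
As for how to do that with string formatting before the query is executed: it is simple, just apply the string formatting and then insert the formatted values.
In your example, let's assume your table has the following type definitions: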
CREATE TABLE table (
    page_title VARCHAR(64),
    page_url STRING,
    log_time TIMESTAMP,
    machine VARCHAR(64),
    clicks INT,
    id CHAR(36)
)
Then your INSERT statement would be:
sql = "INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) VALUES ('%s', '%s', '%s', '%s', %s, '%s')" # note the single quotes around the string/timestamp types
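To see what that template produces, here is a small sketch that fills it with Python's % operator using values based on the question (log_time already uses colons, as required for a TIMESTAMP column). It only illustrates the quoting: the string and timestamp values end up inside single quotes, while the INT value does not. It assumes no value contains a single quote; a value such as O'Brien would close the literal early.

```python
# Template from the answer: quotes around string/timestamp placeholders, none around the INT.
sql = ("INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) "
       "VALUES ('%s', '%s', '%s', '%s', %s, '%s')")

# Sample values based on the question (log_time uses colons for a TIMESTAMP column).
values = ("Home", "http://123.234.345.456:30001/", "2018-12-12 16:14:04",
          "Mac OS", 16, "60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db")

# Plain Python % formatting: no escaping is done, so a value containing a
# single quote would break the resulting statement.
formatted = sql % values
print(formatted)
```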
Now, since log_time is a TIMESTAMP, you must format datetime.now() in the yyyy-MM-dd HH:mm:ss format.
params['log_time'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
If you define log_time as STRING instead of TIMESTAMP, your %Y-%m-%d %H-%M-%S format is fine.
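A quick check of both formats, using a fixed datetime instead of datetime.now() so the result is predictable:

```python
from datetime import datetime

# A fixed moment instead of datetime.now(), so the output is reproducible.
moment = datetime(2018, 12, 12, 16, 14, 4)

# Colon-separated time: the yyyy-MM-dd HH:mm:ss layout for a TIMESTAMP column.
for_timestamp = moment.strftime("%Y-%m-%d %H:%M:%S")  # '2018-12-12 16:14:04'

# Hyphen-separated time from the question: fine only if log_time is a STRING column.
for_string = moment.strftime("%Y-%m-%d %H-%M-%S")     # '2018-12-12 16-14-04'

print(for_timestamp)
print(for_string)
```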
Finally, execute:
values = (params['page_title'], params['page_url'], params['log_time'],
          params['machine'], params['clicks'], params['id'])
cursor.execute(sql, values)
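Note that this approach only works for basic data types such as numbers or strings. Anything more complex, such as arrays or structs, will not work.
After some effort, and a lot of help from @Scratch'N'Purr and @msafiullah in Parameter substitution issue #317, I managed to get it working. It's a bit involved, so I'll post the full code for the record: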
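Cause: a colon problem in Impyla's parameter binding. When the values are passed as cursor.execute(sql, values), Impyla also replaces numbered placeholders such as :3, even inside values it has already substituted. The :3 in http://123.234.345.456:30001/ was therefore replaced by the third value (log_time), which is the 'http://123.234.345.456'2018-12-12 16-14-04'0001/' fragment in the error.
Workaround: escape the data with a custom function and build the SQL string with Python string formatting (inlining the values instead of passing them as parameters), rather than using the standard Python DB API form cursor.execute(sql, values).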
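The garbled URL in the error message is consistent with binding in two passes: first each %s is filled, then numbered markers such as :3 are replaced anywhere in the text. The sketch below is a simplified, hypothetical model of that behaviour (quote() and simulated_bind() are illustrative names, not Impyla's source). It reproduces the exact fragment Impala rejected. This is also why the workaround inlines the values itself and calls cursor.execute(sql) with no parameters.

```python
def quote(value):
    """Render a value as a SQL literal (simplified model: strings quoted, no escaping)."""
    if isinstance(value, str):
        return "'" + value + "'"
    return str(value)


def simulated_bind(operation, parameters):
    """Hypothetical two-pass binding model; not Impyla's actual code."""
    literals = [quote(v) for v in parameters]

    # Pass 1: fill the %s placeholders from left to right.
    pieces = operation.split("%s")
    if len(pieces) - 1 != len(literals):
        raise ValueError("placeholder count does not match parameter count")
    operation = pieces[0] + "".join(lit + rest for lit, rest in zip(literals, pieces[1:]))

    # Pass 2: replace numbered markers :N, highest index first, anywhere in the
    # text, including inside the string literals inserted by pass 1.
    for index in range(len(literals), 0, -1):
        operation = operation.replace(":" + str(index), literals[index - 1])
    return operation


sql = ("INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) "
       "VALUES (%s, %s, %s, %s, %s, %s)")
values = ("Home", "http://123.234.345.456:30001/", "2018-12-12 16-14-04",
          "Mac OS", 16, "60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db")

bound = simulated_bind(sql, values)
print(bound)
# The URL literal comes out as 'http://123.234.345.456'2018-12-12 16-14-04'0001/'
```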
api.py:
from flask import Flask, request, abort, Response
from flask_cors import CORS
import json
from impala.dbapi import connect
from impala.util import _escape
import sys
from datetime import datetime
import six

app = application = Flask(__name__)
CORS(app)

conn = connect(host='datanode2', port=21050,
               user='user', database='testdb')


def parameterize(value):  # by msafiullah
    # Render a Python value as an SQL literal: NULL, a quoted and escaped string, or str(value)
    if value is None:
        return "NULL"
    elif isinstance(value, six.string_types):
        return "'" + _escape(value) + "'"
    else:
        return str(value)


@app.route("/api/endpoint", methods=['POST'])
def post_data():
    if not request.json:
        abort(400)
    params = request.get_json(force=True)  # getting request data
    print(">>>>>> ", params, flush=True)
    # Colons in the time part: the yyyy-MM-dd HH:mm:ss layout for the TIMESTAMP column
    params['log_time'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    try:
        cursor = conn.cursor()
        # id is VARCHAR(36) in the schema below; casting it to VARCHAR(32), as in the
        # logged output, cuts the 36-character UUID down to 32 characters.
        template = ('INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) '
                    'VALUES ( CAST({} AS VARCHAR(64)), {}, {}, CAST({} AS VARCHAR(32)) , {}, CAST({} AS VARCHAR(36)))')
        # The values are inlined here, so cursor.execute() gets no parameters to substitute.
        sql = template.format(parameterize(params['page_title']),
                              parameterize(params['page_url']),
                              parameterize(params['log_time']),
                              parameterize(params['machine']),
                              int(params['clicks']),  # coerce so a non-numeric value can't inject SQL
                              parameterize(params['id']))
        print(">>>>>> " + sql, file=sys.stderr, flush=True)
        cursor.execute(sql)
        print(">>>>>> Data Written Successfully", file=sys.stderr, flush=True)
        return Response(json.dumps({'success': True}), 201, mimetype="application/json")
    except Exception as e:
        print(e, file=sys.stderr, flush=True)
        return Response(json.dumps({'success': False}), 400, mimetype="application/json")


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5008, debug=True)
req.py is the same as in the question.
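To try parameterize() without an Impala connection, the sketch below swaps impala.util._escape for a hypothetical escape_sql_string() that only backslash-escapes backslashes and single quotes. That escaping rule is an assumption, and the real helper may escape more characters. Because the environment is Python 3, it also tests for str instead of six.string_types.

```python
def escape_sql_string(value):
    """Hypothetical stand-in for _escape: backslash-escape backslashes, then single quotes."""
    # Backslashes go first so the backslashes added for quotes are not doubled.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def parameterize(value):
    """Render a Python value as an SQL literal, following the answer's parameterize()."""
    if value is None:
        return "NULL"
    if isinstance(value, str):  # Python 3 replacement for six.string_types
        return "'" + escape_sql_string(value) + "'"
    return str(value)


for sample in (None, "http://123.234.345.456:30001/", "O'Brien", 16):
    print(parameterize(sample))
```

This prints NULL, 'http://123.234.345.456:30001/', 'O\'Brien' and 16, one per line.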
Schema of table:
CREATE TABLE if not exists table (
    id VARCHAR(36),
    machine VARCHAR(32),
    clicks INT,
    page_title VARCHAR(64),
    page_url STRING,
    log_time TIMESTAMP
);
Flask server output:
>>>>>> {'page_title': 'Home', 'page_url': 'http://123.234.345.456:30001/', 'machine': 'Mac OS', 'clicks': 16, 'id': '60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db'}
>>>>>> INSERT INTO table ( page_title, page_url, log_time, machine, clicks, id ) VALUES ( CAST('Home' AS VARCHAR(64)), 'http://123.234.345.456:30001/', '2018-12-14 17:27:29', CAST('Mac OS' AS VARCHAR(32)) , 16, CAST('60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db' AS VARCHAR(32)))
>>>>>> Data Written Successfully
127.0.0.1 - - [14/Dec/2018 17:27:29] "POST /api/endpoint HTTP/1.1" 201 -
In impala-shell, select * from table returns:
+----------------------------------+---------+--------+------------+-------------------------------+---------------------+
| id                               | machine | clicks | page_title | page_url                      | log_time            |
+----------------------------------+---------+--------+------------+-------------------------------+---------------------+
| 60cd1d79-eda7-44c2-a4ec-ffdd5d6a | Mac OS  | 16     | Home       | http://123.234.345.456:30001/ | 2018-12-14 17:27:29 |
+----------------------------------+---------+--------+------------+-------------------------------+---------------------+
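Basically, only numbers (e.g. INT columns) don't need to go through the parameterize() cleaning/escaping step. Other types such as VARCHAR, CHAR, STRING and TIMESTAMP (because of its colons) should be escaped properly so they can be inserted safely through the Impyla API.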
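The same rule can be applied by value type. The sketch below uses hypothetical helpers that are not part of Impyla. Integers are inlined as-is, datetimes are formatted as yyyy-MM-dd HH:mm:ss and quoted, and everything else is quoted with the same assumed backslash escaping as above. Column names come from the dict keys and are not escaped, so they must come from your code and never from the request. The VARCHAR casts used in the answer are left out.

```python
from datetime import datetime


def escape_sql_string(value):
    """Assumed escaping: backslash-escape backslashes, then single quotes."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def to_literal(value):
    """Render one value: integers unquoted, everything else quoted and escaped."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, datetime):
        value = value.strftime("%Y-%m-%d %H:%M:%S")  # TIMESTAMP-friendly text
    return "'" + escape_sql_string(str(value)) + "'"


def build_insert(table, row):
    """Build an INSERT statement from a column -> value dict (hypothetical helper).

    Table and column names are inserted verbatim; only trusted names may be used.
    """
    columns = ", ".join(row)
    literals = ", ".join(to_literal(v) for v in row.values())
    return "INSERT INTO {} ( {} ) VALUES ( {} )".format(table, columns, literals)


row = {
    "page_title": "Home",
    "page_url": "http://123.234.345.456:30001/",
    "log_time": datetime(2018, 12, 14, 17, 27, 29),
    "machine": "Mac OS",
    "clicks": 16,
    "id": "60cd1d79-eda7-44c2-a4ec-ffdd5d6ac3db",
}
sql = build_insert("table", row)
print(sql)
# As in the answer's code, the result would be run as cursor.execute(sql) with no parameters.
```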