Pyspark:使用参数动态准备 pyspark-sql 查询
Pyspark : Dynamically prepare pyspark-sql query using parameters
动态绑定参数和准备 pyspark-sql statament 的不同方法有哪些。
示例:
动态查询
query = '''SELECT column1, column2
FROM ${db_name}.${table_name}
WHERE column1 = ${filter_value}'''
上面的动态查询有${db_name}、${table_name}和${filter_value}变量,这些变量将从运行时间参数中获取值.
参数详情:
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
动态查询中绑定参数后的预期查询
SELECT column1, column2
FROM your_db_name.your_table_name
WHERE column1 = some_value
这里有几个选项可以通过绑定参数准备 pyspark-sql。
选项#1 - 使用字符串插值/f-Strings (Python 3.6+)
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
query = f'''SELECT column1, column2
FROM {db_name}.{table_name}
WHERE column1 = {filter_value}'''
选项#2 - 使用字符串格式 (str.format)
query = '''SELECT column1, column2
FROM {}.{}
WHERE column1 = {}'''
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
query.format(db_name, table_name, filter_value)
选项#3 - 使用模板字符串
query = '''SELECT column1, column2
FROM ${db_name}.${table_name}
WHERE column1 = ${filter_value}'''
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
from string import Template
t = Template(query)
t.substitute(db_name=db_name, table_name=table_name, filter_value=filter_value)
String Interpolation/f-Strings (Option#1) 如果你有
python 3.6+ 否则使用字符串格式 str.format (Option#2)
模板字符串对于处理用户提供的字符串更有用
(选项#3)
动态绑定参数和准备 pyspark-sql statament 的不同方法有哪些。
示例:
动态查询
query = '''SELECT column1, column2
FROM ${db_name}.${table_name}
WHERE column1 = ${filter_value}'''
上面的动态查询有${db_name}、${table_name}和${filter_value}变量,这些变量将从运行时间参数中获取值.
参数详情:
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
动态查询中绑定参数后的预期查询
SELECT column1, column2
FROM your_db_name.your_table_name
WHERE column1 = some_value
这里有几个选项可以通过绑定参数准备 pyspark-sql。
选项#1 - 使用字符串插值/f-Strings (Python 3.6+)
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
query = f'''SELECT column1, column2
FROM {db_name}.{table_name}
WHERE column1 = {filter_value}'''
选项#2 - 使用字符串格式 (str.format)
query = '''SELECT column1, column2
FROM {}.{}
WHERE column1 = {}'''
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
query.format(db_name, table_name, filter_value)
选项#3 - 使用模板字符串
query = '''SELECT column1, column2
FROM ${db_name}.${table_name}
WHERE column1 = ${filter_value}'''
db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'
from string import Template
t = Template(query)
t.substitute(db_name=db_name, table_name=table_name, filter_value=filter_value)
String Interpolation/f-Strings (Option#1) 如果你有 python 3.6+ 否则使用字符串格式 str.format (Option#2)
模板字符串对于处理用户提供的字符串更有用 (选项#3)