Is it possible to send hive conf variables via a hive odbc connection when attempting a query?
I have a Hive script with some Hive conf variables set at the top. When I run it on our EMR cluster, the query works fine and returns the expected data. For example:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=10000;
set mapreduce.map.memory.mb=7168;
set mapreduce.reduce.memory.mb=7168;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.execution.engine=mr;
select
fruits,
count(1) as n
from table
group by fruits;
I would like to run this query from another server that has an ODBC connection to Hive. (I'm working in R.)
hive_conn <- DBI::dbConnect(odbc(), dsn = "Hive")
results <- DBI::dbGetQuery(hive_conn, "select fruits, count(1) as n from table group by fruits")
This runs fine and returns a data frame as expected.
However, I don't know how to send Hive conf settings over ODBC. How can I tell Hive, via ODBC, to run my query with the Hive conf settings of my choice?
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=10000;
set mapreduce.map.memory.mb=7168;
set mapreduce.reduce.memory.mb=7168;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.execution.engine=mr;
I found the solution in the driver's documentation: https://www.simba.com/products/Hive/doc/ODBC_InstallGuide/linux/content/odbc/hi/configuring/serverside.htm
I needed to add these 'server side properties' when creating the connection. You prefix each one with the string 'SSP_' (server-side property) and then pass them as name-value pairs, e.g.:
hive_conn <- dbConnect(odbc(),
dsn = "Hive",
SSP_hive.execution.engine = "mr",
SSP_hive.exec.dynamic.partition.mode = "nonstrict",
SSP_hive.exec.dynamic.partition = "true",
SSP_hive.exec.max.dynamic.partitions = 10000,
SSP_mapreduce.map.memory.mb = 7168,
SSP_mapreduce.reduce.memory.mb = 7168,
SSP_hive.exec.max.dynamic.partitions.pernode = 10000,
SSP_hive.exec.compress.output = "true",
SSP_mapred.output.compression.codec = "org.apache.hadoop.io.compress.SnappyCodec"
)
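According to the same Simba documentation, these SSP_ keys can also be placed in the DSN definition itself, so every connection picks them up without repeating them in R. A sketch of what such an odbc.ini entry might look like (the driver path and host below are placeholders, not taken from the original setup):

```ini
[Hive]
; Simba Hive ODBC driver -- the path is an assumption; check your install
Driver=/opt/simba/hive/lib/64/libhiveodbc.so
Host=my-emr-master.example.com
Port=10000
; Server-side properties: prefix each Hive conf name with SSP_
SSP_hive.execution.engine=mr
SSP_hive.exec.dynamic.partition.mode=nonstrict
SSP_hive.exec.dynamic.partition=true
SSP_hive.exec.max.dynamic.partitions=10000
```

With the properties in the DSN, the R side reduces to `dbConnect(odbc(), dsn = "Hive")`; the per-connection arguments shown above remain useful when different queries need different settings.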