Presto/Python: How can I connect to Presto on AWS EMR using Python?
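I have created a Presto cluster on AWS EMR, using all the default configurations. I want to write a Python script on the master node that submits queries to Presto and retrieves the results.
I found the PyHive library, but I don't know what to put in the connection string: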
from pyhive import presto # or import hive
cursor = presto.connect('localhost').cursor()
statement = 'SELECT * FROM my_awesome_data LIMIT 10'
cursor.execute(statement)
my_results = cursor.fetchall()
I assumed localhost would be correct, since I am running the script on the master node of the Presto cluster, but I get this error:
OperationalError: Unexpected status code 404
b'<!DOCTYPE html><html><head><title>Apache Tomcat/8.0.45 - Error report</title><style type="text/css">H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}.line {height: 1px; background-color: #525D76; border: none;}</style> </head><body><h1>HTTP Status 404 - /v1/statement</h1><div class="line"></div><p><b>type</b> Status report</p><p><b>message</b> <u>/v1/statement</u></p><p><b>description</b> <u>The requested resource is not available.</u></p><hr class="line"><h3>Apache Tomcat/8.0.45</h3></body></html>'
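The Tomcat error page above suggests the request reached some other web server rather than Presto. One way to check which port actually speaks Presto is to probe it over HTTP. The sketch below is only illustrative: `looks_like_presto` is a hypothetical helper name, and it assumes the Presto coordinator answers `GET /v1/info` with a JSON object without requiring authentication, which may not hold for every version or setup.

```python
import json
import urllib.error
import urllib.request


def looks_like_presto(host, port, timeout=3.0):
    """Return True if http://host:port/v1/info answers with a JSON object.

    Hypothetical helper: assumes the coordinator exposes /v1/info as JSON.
    """
    url = f"http://{host}:{port}/v1/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error status (such as 404), or non-JSON body.
        return False
    return isinstance(info, dict)
```

Calling `looks_like_presto('localhost', p)` for a few candidate ports on the master node should narrow down which one to pass to PyHive.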
I found out that EMR sets the port for Presto's web connector to 8889, so this connects correctly:
from pyhive import presto
cursor = presto.connect(host='localhost', port=8889).cursor()
statement = 'SELECT * FROM my_awesome_data LIMIT 10'
cursor.execute(statement)
my_results = cursor.fetchall()
And I got my results:
print(my_results)
[('tn', 44599263, 925636329.2440014), ('fp', 169984085, 3624296366.570987), ('fn', 6370, 488192.751), ('tp', 47909, 4036930.0270000002)]
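For repeated use, the working connection can be wrapped in a small function. This is a minimal sketch rather than anything PyHive provides: `run_query` and `EMR_PRESTO_PORT` are hypothetical names, and the connect callable is passed in so the helper works with `presto.connect` or any other DB-API style connect function that accepts `host` and `port` keywords.

```python
from contextlib import closing

# Port reported above for Presto on EMR; adjust if your cluster differs.
EMR_PRESTO_PORT = 8889


def run_query(connect, statement, host="localhost", port=EMR_PRESTO_PORT):
    """Run one statement and return all rows as a list.

    `connect` is a DB-API style callable such as pyhive.presto.connect.
    """
    conn = connect(host=host, port=port)
    # closing() calls cursor.close() even if execute() or fetchall() raises.
    with closing(conn.cursor()) as cursor:
        cursor.execute(statement)
        return cursor.fetchall()
```

On the master node this would be called as `run_query(presto.connect, 'SELECT * FROM my_awesome_data LIMIT 10')` after `from pyhive import presto`.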