How do I find the Kudu master hostname or port for the Kudu DB in my Cloudera cluster?

I am trying to write a Spark DataFrame to a Kudu DB, but I do not know the Kudu master. The cluster I am using is a Cloudera cluster.

How do I find the Kudu master in the cluster?

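Here is a very basic example using the Cloudera Manager Java client (https://cloudera.github.io/cm_api/docs/java-client-swagger/). Replace the cluster name "Cluster 1" and the service name "KUDU-1" with the names used in your cluster:
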
package cloudera.kudu_example;

import java.io.IOException;

import com.cloudera.api.swagger.HostsResourceApi;
import com.cloudera.api.swagger.ServicesResourceApi;
import com.cloudera.api.swagger.client.ApiClient;
import com.cloudera.api.swagger.client.ApiException;
import com.cloudera.api.swagger.client.Configuration;
import com.cloudera.api.swagger.model.ApiHost;
import com.cloudera.api.swagger.model.ApiRole;
import com.cloudera.api.swagger.model.ApiRoleList;

public class App {
    public static void main( String[] args ) throws IOException {
        ApiClient cmClient = Configuration.getDefaultApiClient();

        cmClient.setBasePath(args[0]);
        cmClient.setUsername(args[1]);
        cmClient.setPassword(args[2]);

        cmClient.setVerifyingSsl(false);
        HostsResourceApi hostsApiInstance = new HostsResourceApi();
        ServicesResourceApi servicesApiInstance = new ServicesResourceApi();
        try {
            ApiRoleList apiRoles = servicesApiInstance.readRoles("Cluster 1", "KUDU-1");
            for(ApiRole role : apiRoles.getItems()) {
                if(role.getType().equalsIgnoreCase("KUDU_MASTER")) {
                    ApiHost host = hostsApiInstance.readHost(role.getHostRef().getHostId(), "full");
                    System.out.printf("Kudu master runs at %s. IP: %s, status %s%n", host.getHostname(), host.getIpAddress(), host.getEntityStatus());
                }
            }

        } catch (ApiException e) {
          System.err.println("Exception when calling ServicesResourceApi#readRoles or HostsResourceApi#readHost");
          e.printStackTrace();
        }
    }
}

Here is a Python example using the Python client v3 (https://cloudera.github.io/cm_api/docs/python-client-swagger/):

#!/usr/local/bin/python
import cm_client

# Configure HTTP basic authentication for the Cloudera Manager API
cm_client.configuration.username = 'admin'
cm_client.configuration.password = 'admin'

# Create an instance of the API class
api_client = cm_client.ApiClient("http://your-cdh-cluster-cm-host:7180/api/v30")

# create an instance of the ServicesResourceApi class
service_api_instance = cm_client.ServicesResourceApi(api_client)

# create an instance of the HostsResourceApi class
host_api_instance = cm_client.HostsResourceApi(api_client)

# find KUDU_MASTER roles in the CDH cluster
cluster_roles = service_api_instance.read_roles("Cluster 1", "KUDU-1")
for role in cluster_roles.items:
  if role.type == "KUDU_MASTER":
    role_host = host_api_instance.read_host(role.host_ref.host_id, view="full")
    print("Kudu master is located on %s\n" % role_host.hostname)
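
Once the master hostnames are known, they still have to be turned into the address string a Kudu client expects. The following is a minimal sketch of that step, not a definitive implementation. It assumes the masters listen on Kudu's default RPC port 7051 and that the kudu-spark package is on the Spark classpath. The names `format_master_addresses` and `write_to_kudu` and the example hostnames are hypothetical.

```python
DEFAULT_MASTER_RPC_PORT = 7051  # Kudu's default master RPC port; verify it for your cluster


def format_master_addresses(hostnames, rpc_port=DEFAULT_MASTER_RPC_PORT):
    """Join master hostnames into a comma-separated host:port list."""
    return ",".join("%s:%d" % (name, rpc_port) for name in hostnames)


def write_to_kudu(df, master_addresses, table_name):
    """Append a Spark DataFrame to an existing Kudu table.

    Assumes df is a pyspark.sql.DataFrame and the kudu-spark package is available.
    """
    (df.write
       .format("org.apache.kudu.spark.kudu")
       .option("kudu.master", master_addresses)
       .option("kudu.table", table_name)
       .mode("append")
       .save())


# Hypothetical hostnames standing in for the values printed by the scripts above
masters = format_master_addresses(["master1.example.com", "master2.example.com"])
print(masters)
```

The resulting string would be passed as `kudu.master` when writing the DataFrame. Tables created through Impala are typically named `impala::database.table` on the Kudu side, so check the table name before writing.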

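I know this is not the best way, but it is a quick one. If you already have a Kudu table (and if you do not, create a test/temporary table through Impala), just run DESCRIBE FORMATTED on it. The output contains a lot of detail, including the Kudu master host (hostname). Note that 8051 is the default port of the master's web UI; Kudu clients such as Spark connect to the master's RPC port, which defaults to 7051. Once you know the host and port, you can use them to write your Spark DataFrame.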

Temp table syntax:

CREATE TABLE kudu_no_partition_by_clause ( id BIGINT PRIMARY KEY, s STRING, b BOOLEAN ) STORED AS KUDU;

Describe syntax: DESCRIBE FORMATTED table_name;
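
If you run the DESCRIBE FORMATTED statement from a script rather than a shell, the master addresses can be picked out of the result programmatically. This is only a sketch under stated assumptions: that each result row is a (name, type, comment) tuple with whitespace padding, and that the table parameters include a `kudu.master_addresses` entry. The function name `find_master_addresses` and the sample rows are hypothetical; check them against the actual output on your cluster.

```python
def find_master_addresses(describe_rows):
    """Return the kudu.master_addresses value from DESCRIBE FORMATTED rows, or None."""
    for row in describe_rows:
        # Normalize cells: treat NULLs as empty strings and drop padding
        cells = [(cell or "").strip() for cell in row]
        if "kudu.master_addresses" in cells:
            idx = cells.index("kudu.master_addresses")
            if idx + 1 < len(cells):
                return cells[idx + 1]
    return None


# Hypothetical rows, shaped like (name, type, comment) results
sample_rows = [
    ("# col_name", "data_type", "comment"),
    ("id", "bigint", None),
    ("Table Parameters:", None, None),
    ("", "kudu.master_addresses  ", "master1.example.com:7051  "),
]
print(find_master_addresses(sample_rows))
```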

FYI:

Kudu web admin details: https://kudu.apache.org/releases/0.6.0/docs/administration.html

Kudu with Spark examples: https://kudu.apache.org/docs/developing.html

Cheers!!