Deploy Kubeflow on GCP CloudShell using the CLI: /home/user/.kube/config: no such file or directory

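I need to deploy Kubeflow on GCP for ML pipelines and TFX. Unfortunately, I cannot use the deployment UI for the installation, because I need to set the zone, network, and subnetwork manually.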

There is a nice documentation page: https://www.kubeflow.org/docs/gke/deploy/deploy-cli/

I tried both OAuth credentials and basic authentication. I also tried installing both kfctl_v0.5.1_linux.tar.gz and kfctl_v0.5.0_linux.tar.gz. I always get the following messages:

WARN[0036] could not open /home/user/.kube/config Error stat /home/user/.kube/config: no such file or directory  filename="apps/group.go:188"
WARN[0036] could not load config Error: open /home/user/.kube/config: no such file or directory  filename="apps/group.go:208"

I have kubectl installed:

kubectl version
Client Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.8-dispatcher", GitCommit:"1215389331387f57594b42c5dd024a2fe27334f8", GitTreeState:"clean", BuildDate:"2019-05-13T18:09:56Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Apparently ~/.kube/config does not exist. Which application is supposed to create it?

ls ~/.kube/config
ls: cannot access '/home/user/.kube/config': No such file or directory

I did the following:

ls /home/user/folder/kubeflow
kfctl  kfctl_v0.5.1_linux.tar.gz


export KUBEFLOW_USERNAME=xxx
export KUBEFLOW_PASSWORD=xxx
export PATH=$PATH:/home/user/folder/kubeflow
export ZONE=europe-west1-b
export PROJECT=project
export KFAPP=kubeflow-test

From /home/user/folder/kubeflow:

kfctl init ${KFAPP} --platform gcp --project ${PROJECT} --use_basic_auth -V

INFO[0014] Not skipping GCP project init, running gcpInitProject.  filename="gcp/gcp.go:1619"
WARN[0017] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
WARN[0018] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
WARN[0019] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
WARN[0021] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
WARN[0024] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
WARN[0027] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
WARN[0030] batch API enabling is running: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com] (op = operations/acf.954cc3b6-f1f4-46a2-832d-596ccb5a3d5a)  filename="gcp/gcp.go:1594"
INFO[0037] batch API enabling is completed: [deploymentmanager.googleapis.com servicemanagement.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com endpoints.googleapis.com file.googleapis.com ml.googleapis.com iam.googleapis.com sqladmin.googleapis.com]  filename="gcp/gcp.go:1590"
INFO[0037] reading from /home/user/folder/kubeflow/kubeflow-test/app.yaml  filename="coordinator/coordinator.go:341"
WARN[0037] could not open /home/user/.kube/config Error stat /home/user/.kube/config: no such file or directory  filename="apps/group.go:188"
WARN[0037] could not load config Error: open /home/user/.kube/config: no such file or directory  filename="apps/group.go:208"

Some files were created:

ls kubeflow-test/
app.yaml

Some checks:

kubectl config view
apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []

I also tried updating kubectl:

gcloud components update kubectl
You have specified individual components to update.  If you are trying
 to install new components, use:
  $ gcloud components install kubectl
Do you want to run install instead (y/N)?  y
All components are up to date.

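Any idea why ~/.kube/config does not exist? I tried creating it manually, but then I ran into other problems. What should I do to get the missing config created? Is there any other recommended way to deploy Kubeflow on GCP via the CLI, apart from using CloudShell?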

GKE has a great description of how you can configure access to a cluster via kubectl (which uses the ~/.kube/config file as the default location for storing credentials). There are two ways to populate the file:

  1. Create a cluster from the command line, using gcloud container clusters create CLUSTER_NAME
  2. Get credentials for an existing cluster, using gcloud container clusters get-credentials CLUSTER_NAME
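
As a minimal sketch of how to tell which situation you are in, the script below checks whether a kubeconfig exists and, if not, prints the get-credentials command for an existing cluster. It does not call GCP. The helper names kubeconfig_path and suggest_credentials are hypothetical, CLUSTER_NAME is a placeholder, and the zone defaults to the ZONE value from the question.

```shell
#!/usr/bin/env bash
# Sketch: report whether a kubeconfig exists and, if not, print the gcloud
# command that would populate it. Nothing is executed against GCP here.

# Hypothetical helper: resolve the kubeconfig path kubectl would read.
# KUBECONFIG may hold a colon-separated list; only the first entry is checked.
kubeconfig_path() {
  local first="${KUBECONFIG:-}"
  first="${first%%:*}"
  echo "${first:-$HOME/.kube/config}"
}

# Hypothetical helper: print a suggestion for the given cluster name and zone.
suggest_credentials() {
  local cluster="$1" zone="$2" path
  path="$(kubeconfig_path)"
  if [ -f "$path" ]; then
    echo "kubeconfig found at $path"
  else
    echo "no kubeconfig at $path; for an existing cluster, run:"
    echo "gcloud container clusters get-credentials $cluster --zone $zone"
  fi
}

suggest_credentials "${CLUSTER_NAME:-CLUSTER_NAME}" "${ZONE:-europe-west1-b}"
```

If no cluster exists yet, get-credentials has nothing to fetch, which matches the empty output of kubectl config view in the question.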

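Judging from the Kubeflow documentation, the init step does not appear to create a cluster; the apply step should create one. You don't describe which problems you ran into after creating an empty file.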
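
The sequence described on the deploy-cli page for kfctl v0.5 can be sketched roughly as follows. The run wrapper, the deploy_kubeflow function, and the DRY_RUN variable are assumptions added here so that the commands are only printed by default and executed only when DRY_RUN=0. The flags on generate and apply are recalled from the v0.5 documentation and should be checked against the page for your kfctl version.

```shell
#!/usr/bin/env bash
# Sketch of the kfctl v0.5 flow. Commands are only printed unless DRY_RUN=0.

KFAPP="${KFAPP:-kubeflow-test}"
PROJECT="${PROJECT:-project}"
ZONE="${ZONE:-europe-west1-b}"

# Hypothetical wrapper: echo the command, run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical function grouping the steps in order.
deploy_kubeflow() {
  # init writes ${KFAPP}/app.yaml; no cluster exists yet at this point,
  # so warnings about a missing ~/.kube/config are plausible here.
  run kfctl init "$KFAPP" --platform gcp --project "$PROJECT" --use_basic_auth -V
  run cd "$KFAPP"
  # generate writes the deployment configuration for the chosen zone.
  run kfctl generate all -V --zone "$ZONE"
  # apply is the step that should create the GKE cluster and deploy Kubeflow.
  run kfctl apply all -V
}

deploy_kubeflow
```

Once apply has created the cluster, kubectl config view should no longer show empty clusters and contexts lists; if it still does, gcloud container clusters get-credentials can be used to fetch the credentials.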

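Also, you pointed to the deploy-cli documentation, but the customizing Kubeflow on GKE page sounds more like what you are trying to accomplish, so you might check whether that page answers any of your questions.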