Deploying the Full Kubeflow Stack with Kind
Local deployment options for Kubeflow:
Option | Characteristics |
---|---|
kind | kind is a tool for running local Kubernetes clusters using Docker container "nodes". You can use kind to deploy Kubeflow locally, with a few caveats: kind is designed mainly for testing Kubernetes itself and may not be suitable for production. kind requires Docker and kubectl. Kubeflow is configured and managed through the kfctl CLI. The Kubeflow images need to be pulled from the Alibaba Cloud mirror registry. |
K3s | k3s is a lightweight Kubernetes distribution that can run on low-resource devices. You can use k3s to deploy Kubeflow locally, with a few caveats: k3s requires kubectl and kustomize. Kubeflow Pipelines is deployed from the kubeflow-pipelines-standalone-1.7.0.tar.gz archive. The Kubeflow images need to be pulled from the Alibaba Cloud mirror registry. |
k3ai | k3ai is a tool for quickly installing Kubernetes and Kubeflow Pipelines, with support for NVIDIA GPUs and TensorFlow Serving. You can use k3ai to deploy Kubeflow locally, with a few caveats: k3ai requires curl and kubectl. Kubeflow Pipelines is deployed with the k3ai-cli command. k3ai currently only supports Kubeflow Pipelines; support for the other Kubeflow components is still in development. |
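Whichever option you pick, it is worth confirming the required CLIs are on your PATH before starting. A minimal sketch for the kind route (the tool list — docker, kubectl, go — comes from the table above; adjust it for k3s or k3ai):

```shell
# Check that the tools the kind route needs are installed.
missing=""
for tool in docker kubectl go; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  status="ok"
else
  status="missing:$missing"
fi
echo "$status"
```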
Installing kind #
Following the official guide: https://github.com/kubernetes-sigs/kind
What is kind: #
kind is a tool for running local Kubernetes clusters using Docker container "nodes". kind was primarily designed for testing Kubernetes itself, but it can also be used for local development or CI.
Installing kind #
If you have Go (1.17+) and Docker installed,
go install sigs.k8s.io/kind@v0.17.0 && kind create cluster
is all you need!
So we need Go, Python, and Docker installed on the machine.
Installing Docker #
Docker: download the .deb packages (docker-ce, docker-ce-cli, containerd.io) from the Docker package index at linux/ubuntu/dists/bionic/pool/stable/amd64/, then install them:
sudo dpkg -i *.deb
apt list -a docker-ce docker-ce-cli containerd.io
Installing Go #
If Ubuntu's package manager cannot provide a recent enough Go release, you can install it from the binaries on the official website.
- Open https://golang.org/dl/ and download the Go binary tarball for your platform.
- Open a terminal, create a new directory, and move the downloaded file into it:
mkdir ~/go
cd ~/go
wget https://golang.org/dl/go<version>.linux-amd64.tar.gz
- Extract the downloaded file into /usr/local:
sudo tar -C /usr/local -xzf go<version>.linux-amd64.tar.gz
- Set the Go environment variables by adding the following lines to your ~/.profile:
export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
- Apply the environment variables:
source ~/.profile
- Verify that Go was installed successfully:
go version
If the installation succeeded, this prints the installed Go version.
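Since kind's `go install` route requires Go 1.17 or newer, you can also check the version number programmatically. A small sketch that parses a `go version` string (here a hardcoded sample stands in for the live output; replace the assignment with `ver="$(go version)"` on a real machine):

```shell
# Sample `go version` output; on a live system use: ver="$(go version)"
ver="go version go1.20.5 linux/amd64"
# Extract the minor version number (e.g. 20 from go1.20.5).
num=$(echo "$ver" | sed 's/.*go1\.\([0-9]*\).*/\1/')
if [ "$num" -ge 17 ]; then
  echo "Go is new enough for kind (1.17+ required)"
else
  echo "Please upgrade Go to 1.17 or newer"
fi
```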
Installing Python #
sudo apt install python3
Installing kubeflow-manifests #
Clone the repo: https://github.com/shikanon/kubeflow-manifests
Then run: $ kind create cluster --config=kind/kind-config.yaml --name=kubeflow --image=kindest/node:v1.17.17
Config update
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
    listenAddress: "0.0.0.0" # Optional, defaults to "0.0.0.0"
    protocol: tcp # Optional, defaults to tcp
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."192.168.13.214:5000"]
    endpoint = ["http://192.168.13.214:5000"]
Creating cluster "kubeflow" ...
✓ Ensuring node image (kindest/node:v1.17.17) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kubeflow"
You can now use your cluster with:
kubectl cluster-info --context kind-kubeflow
Running kubectl cluster-info --context kind-kubeflow outputs:
Kubernetes control plane is running at https://127.0.0.1:33185
KubeDNS is running at https://127.0.0.1:33185/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Result — pod status in the kubeflow namespace:
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-6fb9d65887-xx5q6 1/1 Running 0 8m50s
cache-deployer-deployment-7558d65bf4-bqcd5 0/2 PodInitializing 0 6m44s
cache-server-67d98b4ddd-dfc4z 0/2 Init:0/1 0 2m9s
centraldashboard-7b7676d8bd-j5gf8 1/1 Running 0 8m55s
jupyter-web-app-deployment-56ddd6d488-fp72h 1/1 Running 0 2m11s
katib-controller-77675c88df-js456 1/1 Running 0 9m4s
katib-db-manager-646695754f-96t8q 1/1 Running 1 9m6s
katib-mysql-5bb5bd9957-6r8cx 1/1 Running 0 9m4s
katib-ui-55fd4bd6f9-hh9k5 1/1 Running 0 9m3s
kfserving-controller-manager-0 0/2 ContainerCreating 0 6m34s
kubeflow-pipelines-profile-controller-579696986b-c64zw 1/1 Running 0 2m11s
metacontroller-0 1/1 Running 0 6m58s
metadata-envoy-deployment-76d65977f7-nmdw2 1/1 Running 0 6m42s
metadata-grpc-deployment-697d9c6c67-rm7qc 0/2 PodInitializing 0 6m43s
metadata-writer-58cdd57678-scxcd 0/2 PodInitializing 0 6m41s
minio-544ff5d5d-nwnht 0/2 Init:0/1 0 2m12s
ml-pipeline-85fc99f899-5dbnh 0/2 PodInitializing 0 6m41s
ml-pipeline-persistenceagent-65cb9594c7-jvqch 0/2 Init:0/1 0 6m41s
ml-pipeline-scheduledworkflow-7f8d8dfc69-pp5dr 0/2 Init:0/1 0 6m41s
ml-pipeline-ui-5c765cc7bd-2m2rt 0/2 Init:0/1 0 6m40s
ml-pipeline-viewer-crd-5b8df7f458-4fh2m 0/2 Init:0/1 0 6m40s
ml-pipeline-visualizationserver-56c5ff68d5-6jqbp 0/2 Init:0/1 0 6m38s
mpi-operator-789f88879-9z255 1/1 Running 0 8m36s
mxnet-operator-7fff864957-wxtww 1/1 Running 0 9m8s
mysql-bdfb4f675-kx45b 0/2 Init:0/1 0 6m39s
notebook-controller-deployment-74d9584477-95vvc 1/1 Running 0 8m28s
profiles-deployment-67b4666796-dq8wh 2/2 Running 0 8m18s
pytorch-operator-fd86f7694-tmb5r 0/2 PodInitializing 0 8m46s
tensorboard-controller-controller-manager-fd6bcffb4-sxfbj 0/3 PodInitializing 0 7m55s
tensorboards-web-app-deployment-5465d687b9-d45gz 1/1 Running 0 2m10s
tf-job-operator-7bc5cf4cc7-4hr7v 1/1 Running 0 8m57s
volumes-web-app-deployment-76bfd6d6fc-hqlj8 1/1 Running 0 2m10s
workflow-controller-5449754fb4-ggzpp 0/2 Init:0/1 0 2m10s
xgboost-operator-deployment-5c7bfd57cc-s4bv8 0/2 PodInitializing 0 8m57s
Once all the pods reach Running, the deployment is ready.
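Rather than eyeballing the whole listing, you can count the pods that are not yet Running. A sketch that parses a few sample lines from the listing above (on a live cluster, pipe in `kubectl get pods -n kubeflow --no-headers` instead):

```shell
# Sample lines from the pod listing; replace with live kubectl output.
sample="admission-webhook-deployment-6fb9d65887-xx5q6 1/1 Running 0 8m50s
cache-server-67d98b4ddd-dfc4z 0/2 Init:0/1 0 2m9s
minio-544ff5d5d-nwnht 0/2 Init:0/1 0 2m12s"
# Column 3 is STATUS; count every pod whose status is not Running.
not_running=$(printf '%s\n' "$sample" | awk '$3 != "Running"' | wc -l)
echo "pods not Running yet: $not_running"
```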
Default PVC #
vim pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubeflow-test-pv
  namespace: kubeflow-user-example-com
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi
kubectl apply -f pvc.yaml
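To confirm the claim is actually usable, you can mount it in a throwaway pod. A minimal sketch (the pod name `pvc-test` and the busybox image are illustrative choices, not from the manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test            # hypothetical name for a quick test pod
  namespace: kubeflow-user-example-com
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data      # writes land on the claimed volume
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: kubeflow-test-pv
```

If the pod reaches Running and the write succeeds, the default storage class behind the PVC is working.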
Creating an experiment #
secret "mlpipeline-minio-artifact" not found #
cd patch
kubectl delete -f pipeline-env-platform-agnostic-multi-user.yaml
kubectl apply -f pipeline-env-platform-agnostic-multi-user.yaml
cd manifest1.3
kubectl delete -f 017-pipeline-env-platform-agnostic-multi-user.yaml
kubectl apply -f 017-pipeline-env-platform-agnostic-multi-user.yaml
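For reference, the secret the pipeline pods look up has roughly the following shape. The key names and the default MinIO credentials here are assumptions based on the upstream Kubeflow Pipelines manifests — verify them against your own deployment before applying anything:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mlpipeline-minio-artifact
  namespace: kubeflow
stringData:
  accesskey: minio       # assumed default MinIO access key
  secretkey: minio123    # assumed default MinIO secret key
```

Re-applying the manifest as above should normally recreate it; this fragment is only for checking what went missing.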
kubelet MountVolume.SetUp failed for volume "docker-sock" #
cd patch
vi workflow-controller.yaml
Change containerRuntimeExecutor: k8sapi to containerRuntimeExecutor: pns
kubectl delete -f workflow-controller.yaml
kubectl apply -f workflow-controller.yaml
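The executor setting lives in the Argo workflow-controller ConfigMap. After the edit, the relevant fragment should look roughly like this (the ConfigMap name follows Argo's convention — match it against your workflow-controller.yaml):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap   # name per Argo convention; verify in your file
  namespace: kubeflow
data:
  containerRuntimeExecutor: pns         # pns avoids mounting the Docker socket
```

The pns (process namespace sharing) executor does not need access to /var/run/docker.sock, which is why it sidesteps the docker-sock mount failure inside kind nodes.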