Deploying the Full Kubeflow Stack with Kind


Options for deploying Kubeflow locally:

| Option | Characteristics |
| --- | --- |
| kind | kind runs local Kubernetes clusters in Docker container "nodes". You can deploy Kubeflow locally with it, with a few caveats: kind is designed mainly for testing Kubernetes itself and is not suited to production; it requires Docker and kubectl; kfctl is used as the CLI to configure and manage Kubeflow; the Kubeflow images need to be pulled from an Alibaba Cloud mirror registry. |
| k3s | k3s is a lightweight Kubernetes distribution that can run on low-resource devices. You can deploy Kubeflow locally with it, with a few caveats: it requires kubectl and kustomize; Kubeflow Pipelines is deployed from the kubeflow-pipelines-standalone-1.7.0.tar.gz bundle; the Kubeflow images need to be pulled from an Alibaba Cloud mirror registry. |
| k3ai | k3ai is a tool for quickly installing Kubernetes and Kubeflow Pipelines, with support for NVIDIA GPUs and TensorFlow Serving. You can deploy Kubeflow locally with it, with a few caveats: it requires curl and kubectl; Kubeflow Pipelines is deployed with the k3ai-cli command; k3ai currently supports only Kubeflow Pipelines, and support for the other Kubeflow components is still under development. |
Installing kind #

Follow the official guide: https://github.com/kubernetes-sigs/kind

What is kind? #

kind is a tool for running local Kubernetes clusters using Docker container "nodes". kind was designed primarily for testing Kubernetes itself, but it can also be used for local development or CI.

Installing kind #

If you have Go (1.17+) and Docker installed, `go install sigs.k8s.io/kind@v0.17.0 && kind create cluster` is all you need!

We need Go, Python, and Docker installed on the machine.
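Before going further, a quick sanity check can report which of these tools are already on PATH. This is a minimal sketch; the tool list is just what this guide assumes:

```shell
#!/bin/sh
# Minimal sanity check: report which of this guide's prerequisites are on PATH.
REQUIRED="go python docker kubectl"
MISSING=""
for cmd in $REQUIRED; do
  # command -v succeeds when the tool exists, so we only record the misses
  command -v "$cmd" >/dev/null 2>&1 || MISSING="$MISSING $cmd"
done
echo "required: $REQUIRED"
echo "missing:${MISSING:- none}"
```

Anything listed as missing should be installed by the sections below.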

Installing the Docker environment #

Docker: download the `.deb` packages for your release from the Docker package index (linux/ubuntu/dists/bionic/pool/stable/amd64/), then install them and check the available versions:

```shell
sudo dpkg -i *.deb

apt list -a docker-ce docker-ce-cli containerd.io
```

Installing Go #

If Ubuntu's package manager cannot provide a recent enough Go toolchain, you can install one from the binaries on the official website.

  1. Open https://golang.org/dl/ and download the Go binary tarball for your platform.
  2. Open a terminal, create a new directory, and move the downloaded file into it:

```shell
mkdir ~/go
cd ~/go
wget https://golang.org/dl/go<version>.linux-amd64.tar.gz
```

  3. Extract the archive into /usr/local:

```shell
sudo tar -C /usr/local -xzf go<version>.linux-amd64.tar.gz
```

  4. Set the Go environment variables by adding the following lines to your ~/.profile:

```shell
export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
```

  5. Apply the environment variables:

```shell
source ~/.profile
```

  6. Verify that Go installed successfully:

```shell
go version
```

If the installation succeeded, this prints the installed Go version.
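The two profile lines above amount to putting two directories on PATH. A small sketch, assuming the default /usr/local/go install prefix and GOPATH=$HOME/go from the steps above:

```shell
#!/bin/sh
# Recompute the directories the ~/.profile lines add to PATH
# (assumes the default /usr/local/go prefix and GOPATH=$HOME/go).
GOPATH="$HOME/go"
GO_DIRS="/usr/local/go/bin:$GOPATH/bin"
echo "PATH should contain: $GO_DIRS"
# go is only found once a shell has loaded the updated profile
command -v go >/dev/null 2>&1 && go version \
  || echo "go not on PATH yet; run 'source ~/.profile' first"
```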

Installing Python #

```shell
apt install python
```

Installing kubeflow-manifests #

Clone the repo: https://github.com/shikanon/kubeflow-manifests

Create the cluster:

```shell
kind create cluster --config=kind/kind-config.yaml --name=kubeflow --image=kindest/node:v1.17.17
```

Updated config:

```yaml
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
    listenAddress: "0.0.0.0" # Optional, defaults to "0.0.0.0"
    protocol: tcp # Optional, defaults to tcp
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."192.168.13.214:5000"]
    endpoint = ["http://192.168.13.214:5000"]
```
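The containerdConfigPatches section above mirrors image pulls through a local registry at 192.168.13.214:5000; that address is specific to this guide's environment, so substitute your own host IP. If you don't already run a registry, a throwaway one can be started with Docker. A sketch, using the standard `registry:2` image:

```shell
#!/bin/sh
# Start a local image registry matching the containerd mirror endpoint above.
# REGISTRY_PORT must agree with the port in that endpoint URL.
REGISTRY_PORT=5000
if command -v docker >/dev/null 2>&1; then
  docker run -d --restart=always -p "${REGISTRY_PORT}:5000" --name registry registry:2 \
    || echo "could not start registry (is the Docker daemon running?)"
else
  echo "docker not found; start the registry on the mirror host at port ${REGISTRY_PORT}"
fi
```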
Running the create command produces:

```
Creating cluster "kubeflow" ...
 ✓ Ensuring node image (kindest/node:v1.17.17) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kubeflow"
You can now use your cluster with:

kubectl cluster-info --context kind-kubeflow
```

Running `kubectl cluster-info --context kind-kubeflow` outputs:

```
Kubernetes control plane is running at https://127.0.0.1:33185
KubeDNS is running at https://127.0.0.1:33185/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```

The result:

```
NAME                                                        READY   STATUS              RESTARTS   AGE
admission-webhook-deployment-6fb9d65887-xx5q6               1/1     Running             0          8m50s
cache-deployer-deployment-7558d65bf4-bqcd5                  0/2     PodInitializing     0          6m44s
cache-server-67d98b4ddd-dfc4z                               0/2     Init:0/1            0          2m9s
centraldashboard-7b7676d8bd-j5gf8                           1/1     Running             0          8m55s
jupyter-web-app-deployment-56ddd6d488-fp72h                 1/1     Running             0          2m11s
katib-controller-77675c88df-js456                           1/1     Running             0          9m4s
katib-db-manager-646695754f-96t8q                           1/1     Running             1          9m6s
katib-mysql-5bb5bd9957-6r8cx                                1/1     Running             0          9m4s
katib-ui-55fd4bd6f9-hh9k5                                   1/1     Running             0          9m3s
kfserving-controller-manager-0                              0/2     ContainerCreating   0          6m34s
kubeflow-pipelines-profile-controller-579696986b-c64zw      1/1     Running             0          2m11s
metacontroller-0                                            1/1     Running             0          6m58s
metadata-envoy-deployment-76d65977f7-nmdw2                  1/1     Running             0          6m42s
metadata-grpc-deployment-697d9c6c67-rm7qc                   0/2     PodInitializing     0          6m43s
metadata-writer-58cdd57678-scxcd                            0/2     PodInitializing     0          6m41s
minio-544ff5d5d-nwnht                                       0/2     Init:0/1            0          2m12s
ml-pipeline-85fc99f899-5dbnh                                0/2     PodInitializing     0          6m41s
ml-pipeline-persistenceagent-65cb9594c7-jvqch               0/2     Init:0/1            0          6m41s
ml-pipeline-scheduledworkflow-7f8d8dfc69-pp5dr              0/2     Init:0/1            0          6m41s
ml-pipeline-ui-5c765cc7bd-2m2rt                             0/2     Init:0/1            0          6m40s
ml-pipeline-viewer-crd-5b8df7f458-4fh2m                     0/2     Init:0/1            0          6m40s
ml-pipeline-visualizationserver-56c5ff68d5-6jqbp            0/2     Init:0/1            0          6m38s
mpi-operator-789f88879-9z255                                1/1     Running             0          8m36s
mxnet-operator-7fff864957-wxtww                             1/1     Running             0          9m8s
mysql-bdfb4f675-kx45b                                       0/2     Init:0/1            0          6m39s
notebook-controller-deployment-74d9584477-95vvc             1/1     Running             0          8m28s
profiles-deployment-67b4666796-dq8wh                        2/2     Running             0          8m18s
pytorch-operator-fd86f7694-tmb5r                            0/2     PodInitializing     0          8m46s
tensorboard-controller-controller-manager-fd6bcffb4-sxfbj   0/3     PodInitializing     0          7m55s
tensorboards-web-app-deployment-5465d687b9-d45gz            1/1     Running             0          2m10s
tf-job-operator-7bc5cf4cc7-4hr7v                            1/1     Running             0          8m57s
volumes-web-app-deployment-76bfd6d6fc-hqlj8                 1/1     Running             0          2m10s
workflow-controller-5449754fb4-ggzpp                        0/2     Init:0/1            0          2m10s
xgboost-operator-deployment-5c7bfd57cc-s4bv8                0/2     PodInitializing     0          8m57s
```

Wait until all pods reach Running.
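Rather than polling `kubectl get pods` by hand, you can block until everything passes its readiness checks. A sketch, assuming the kubeflow namespace and that kubectl points at the kind-kubeflow context:

```shell
#!/bin/sh
# Wait for every pod in the namespace to become Ready (up to 15 minutes).
NS=kubeflow
TIMEOUT=900s
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" wait pod --all --for=condition=Ready --timeout="$TIMEOUT" \
    || echo "some pods are still not Ready; inspect with: kubectl -n $NS describe pod"
else
  echo "kubectl not found; run this where the kind-kubeflow context is configured"
fi
```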

Default PVC #

```shell
vim pvc.yaml
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubeflow-test-pv
  namespace: kubeflow-user-example-com
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi
```

```shell
kubectl apply -f pvc.yaml
```
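After applying, you can confirm the claim was created and bound. A sketch; the namespace and claim name match the manifest above:

```shell
#!/bin/sh
# Check that the PVC from pvc.yaml exists and report its status.
NS=kubeflow-user-example-com
PVC=kubeflow-test-pv
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" get pvc "$PVC" \
    || echo "PVC $PVC not found in namespace $NS; re-apply pvc.yaml"
else
  echo "kubectl not found; run on the machine managing the kind cluster"
fi
```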
Creating an experiment #

secret "mlpipeline-minio-artifact" not found #

```shell
cd patch
kubectl delete -f pipeline-env-platform-agnostic-multi-user.yaml
kubectl apply -f pipeline-env-platform-agnostic-multi-user.yaml

cd manifest1.3
kubectl delete -f 017-pipeline-env-platform-agnostic-multi-user.yaml
kubectl apply -f 017-pipeline-env-platform-agnostic-multi-user.yaml
```
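If re-applying the manifests does not recreate the secret, it can also be created by hand. A sketch; `minio`/`minio123` are Kubeflow's default MinIO credentials and an assumption to verify against your deployment:

```shell
#!/bin/sh
# Manually create the artifact-store secret the pipeline pods look up.
# Assumptions: the default user namespace and default MinIO credentials.
NS=kubeflow-user-example-com
SECRET=mlpipeline-minio-artifact
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" create secret generic "$SECRET" \
    --from-literal=accesskey=minio \
    --from-literal=secretkey=minio123 \
    || echo "secret $SECRET may already exist in $NS"
else
  echo "kubectl not found; run against the kind-kubeflow context"
fi
```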
kubelet MountVolume.SetUp failed for volume "docker-sock" #

```shell
cd patch
vi workflow-controller.yaml
```

Change `containerRuntimeExecutor: k8sapi` to `containerRuntimeExecutor: pns`, then re-apply:

```shell
kubectl delete -f workflow-controller.yaml
kubectl apply -f workflow-controller.yaml
```
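The same executor switch can be made without editing files, by patching Argo's workflow-controller ConfigMap directly. A sketch; the ConfigMap name `workflow-controller-configmap` and the `kubeflow` namespace are assumptions from a standard Kubeflow install:

```shell
#!/bin/sh
# Patch the workflow-controller ConfigMap to use the pns executor, then
# restart the controller deployment so it picks up the change.
NS=kubeflow
CM=workflow-controller-configmap
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" patch configmap "$CM" --type merge \
    -p '{"data":{"containerRuntimeExecutor":"pns"}}' \
    && kubectl -n "$NS" rollout restart deployment workflow-controller \
    || echo "patch failed; is the cluster reachable?"
else
  echo "kubectl not found; run against the kind-kubeflow context"
fi
```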