Lately I had the opportunity to work with NVIDIA AI Enterprise (NVAIE) on vSphere with Tanzu (TKGS). There is plenty of online documentation available from both VMware and NVIDIA; however, I had a hard time finding the right documentation to adjust the necessary bits and pieces for my customer. Hence I aim to write it all down in a step-by-step approach.
Before we start: a huge thanks to the people at VMware & NVIDIA for providing assistance during the initial setup at our customer.
What & Why?
So what is proxy caching a repository in Harbor, and why do we need it?
What is Harbor?
Harbor is an open source registry that secures artifacts with policies and role-based access control, ensures images are scanned and free from vulnerabilities, and signs images as trusted. Harbor, a CNCF Graduated project, delivers compliance, performance, and interoperability to help you consistently and securely manage artifacts across cloud native compute platforms like Kubernetes and Docker.
Source: Official Harbor Website
What is Proxy Cache?
Proxy cache allows you to use Harbor to proxy and cache images from a target public or private registry. As of Harbor v2.1.1, the proxy cache feature was updated to align with Docker Hub’s rate limit policy. If you plan to use proxy cache with your Harbor instance, it is strongly recommended that you use v2.1.1 or later to avoid being rate limited.
You can use a proxy cache to pull images from a target Harbor or non-Harbor registry in an environment with limited or no access to the internet. You can also use a proxy cache to limit the amount of requests made to a public registry, avoiding consuming too much bandwidth or being throttled by the registry server.
When a pull request comes to a proxy cache project, if the image is not cached, Harbor pulls the image from the target registry and serves the pull command as if it is a local image from the proxy cache project. The proxy cache project then caches the image for a future request.
Source: Official Harbor Documentation
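The pull-through behaviour described above can be sketched as a toy model (illustrative only, not Harbor's actual implementation):

```python
class ProxyCache:
    """Toy model of a pull-through cache (illustrative, not Harbor's real code)."""

    def __init__(self, upstream):
        self.upstream = upstream  # target registry: image ref -> image
        self.cache = {}           # locally cached images

    def pull(self, ref):
        if ref not in self.cache:
            # Cache miss: fetch from the target registry and store it locally.
            self.cache[ref] = self.upstream[ref]
        # Serve the local copy, as if it were an image in the proxy cache project.
        return self.cache[ref]

registry = {"nvidia/cuda:12.0": "<image bits>"}
proxy = ProxyCache(registry)
proxy.pull("nvidia/cuda:12.0")   # first pull: fetched from upstream, then cached
registry.clear()                 # even if upstream became unreachable now...
print(proxy.pull("nvidia/cuda:12.0"))  # ...the cached copy is still served
```

This is exactly why it suits an air-gapped setup: once an image has been pulled through the proxy once, later pulls no longer depend on reaching the upstream registry.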
The customer wants the following:
- The capability to leverage NVIDIA AI Enterprise on vSphere with Tanzu in an Air Gapped scenario
- The capability to control which Public Repositories are available to the Developers on each Kubernetes Cluster
- The capability to leverage Harbor as the Single Point of Contact for requesting Container Images from all the needed repositories
- The capability to consolidate the RBAC (Role Based Access Control) in Harbor and not have to share credentials or create credentials for each Developer on external Repositories.
This means that, ideally, Harbor serves the Container Images for NVIDIA AI Enterprise to the Air Gapped vSphere with Tanzu environment. Those Container Images are hosted on the official NVIDIA Repository, https://nvcr.io.
Let’s dive in!
How to do it?
2. Let’s go to Harbor and create a New Endpoint as shown below:
| Field | Value |
| --- | --- |
| Name | \<Something meaningful for you\> (NVIDIA) |
| Description | \<Something meaningful for you\> (NVIDIA Global Catalog Endpoint) |
| Access Secret | \<Your API Key\> |
| Verify Remote Cert | Yes |
3. Click on ‘Test Connection’ and save your Endpoint by clicking ‘OK’.
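If you prefer automation, the same endpoint can also be created through Harbor's REST API. Below is a minimal sketch of the JSON body for `POST /api/v2.0/registries`; the `$oauthtoken` username is what NGC expects for API-key logins, and registering nvcr.io as a plain Docker registry is an assumption that matched my setup:

```python
import json

def registry_payload(name, description, api_key):
    """Build the body for Harbor's POST /api/v2.0/registries endpoint."""
    return {
        "name": name,
        "description": description,
        "type": "docker-registry",        # nvcr.io behaves as a standard Docker registry
        "url": "https://nvcr.io",
        "credential": {
            "type": "basic",
            "access_key": "$oauthtoken",  # NGC's fixed username; the API key is the secret
            "access_secret": api_key,
        },
        "insecure": False,                # equivalent of 'Verify Remote Cert: Yes'
    }

body = registry_payload("NVIDIA", "NVIDIA Global Catalog Endpoint", "<YOUR_API_KEY>")
print(json.dumps(body, indent=2))
```

You would then POST this body with your Harbor admin credentials, e.g. `curl -u admin:<password> -H 'Content-Type: application/json' -d @body.json https://<YOUR_HARBOR_SERVER_NAME>/api/v2.0/registries`.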
So we’ve got our NVIDIA Global Catalog Endpoint now; let’s create a Project for it in Harbor so our Developers can consume NVIDIA Containers from our Harbor!
4. Still in Harbor, create a New Project and make sure to highlight ‘Proxy Cache’ and select our ‘NVIDIA’ Endpoint as shown below:
| Field | Value |
| --- | --- |
| Project Name | \<Something meaningful for you\> (nvidiaio) |
| Access Level | Public |
| Storage Quota | -1 (unlimited) |
Your new Project should look like this:
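The project can likewise be created via the API. Here is a hedged sketch of the body for `POST /api/v2.0/projects`; the `registry_id` must be the ID of the NVIDIA endpoint created earlier (listed by `GET /api/v2.0/registries`), and `1` below is only a placeholder:

```python
import json

def project_payload(name, registry_id):
    """Build the body for Harbor's POST /api/v2.0/projects endpoint."""
    return {
        "project_name": name,
        "registry_id": registry_id,       # links the project to the proxy endpoint
        "storage_limit": -1,              # -1 means unlimited storage quota
        "metadata": {"public": "true"},   # Harbor expects the string "true"/"false"
    }

print(json.dumps(project_payload("nvidiaio", 1), indent=2))  # registry_id=1 is a placeholder
```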
Great! Let’s use the Proxy Cached Harbor Repository!
How to use our Harbor Proxy Cached Repository?
That’s quite simple! You add `<YOUR_HARBOR_SERVER_NAME>/<YOUR_HARBOR_NVIDIAIO_PROJECT>/` as a prefix to any NVIDIA Global Catalog Image Tag.
For NVIDIA AI Enterprise Images this means the following:
`<YOUR_HARBOR_SERVER_NAME>/<YOUR_HARBOR_NVIDIAIO_PROJECT>/nvidia/cloud-native/gpu-operator-1-3`
Example: `harbor.potus.local/nvidiaio/nvidia/cloud-native/gpu-operator-1-3`
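This rewriting can be captured in a small helper (a sketch; the host and project names are this article's examples):

```python
def proxy_ref(image_ref, harbor="harbor.potus.local", project="nvidiaio"):
    """Rewrite an nvcr.io image reference to pull through the Harbor proxy cache project."""
    prefix = "nvcr.io/"
    if image_ref.startswith(prefix):
        image_ref = image_ref[len(prefix):]  # drop the upstream registry host
    return f"{harbor}/{project}/{image_ref}"

print(proxy_ref("nvcr.io/nvidia/cloud-native/gpu-operator-1-3"))
# harbor.potus.local/nvidiaio/nvidia/cloud-native/gpu-operator-1-3
```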
So how does that look in real life?
```yaml
operator:
  repository: harbor.potus.local/nvidiaio/nvaie
  image: gpu-operator-1-3
  # If version is not specified, then default is to use chart.AppVersion
  #version: ""
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  priorityClassName: system-node-critical
  defaultRuntime: containerd
  runtimeClass: nvidia
  use_ocp_driver_toolkit: false
  # cleanup CRD on chart un-install
  cleanupCRD: false
```
That’s it! I hope it helped!
Note that a single NVIDIA Global Catalog API key now provides access to all the NVIDIA Global Catalog Images for all your Developers on your (Tanzu) Kubernetes Clusters.
If you have any questions or comments, don’t hesitate to let us know!
Have a nice day!