My entire HomeLab has been running fine for several years. Why change or modify it? The answer is very simple: my customers are doing it all the time…. I want to be able to test out the newest technology, and I always want to stay one step ahead of my customers from an experience perspective.
I was able to acquire two Dell R740xd servers from one of my friends 🙂
Each server is equipped with dual Intel Xeon Platinum 6168 CPUs with 24 cores at 2.7 GHz each. A lot of power, especially in combination with 1.5 TB of memory!
Storage (Controllers & Disks)
I swapped the Dell PERC H730P controller for a Dell HBA330 (to fully support vSAN!).
I have equipped each server with 3 x Intel Optane DC P4800X cards for the vSAN cache. I wanted three disk groups within each host to use the nested fault domains feature in my new 2-node vSAN cluster. Each disk group has one Intel Optane 375 GB caching device and 2 x 3.84 TB SAS SSDs.
The server itself boots from a 200GB SSD. Each server has a 1.6TB SSD for vSAN Direct K8S stuff and 2 x 7.6 TB SSDs for the Dell Data Domain VMs.
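Summed up per host, the vSAN tiers look like this. A small back-of-the-envelope sketch with the numbers from the list above (the helper itself is just illustrative):

```python
# Per-host vSAN disk layout as described above.
# Sizes come from the post; the names here are my own.

CACHE_TB = 0.375              # Intel Optane caching device per disk group
CAPACITY_TB = 3.84            # SAS SSD capacity device
DISK_GROUPS = 3               # target layout (third group pending SSD delivery)
CAPACITY_DISKS_PER_GROUP = 2

def raw_capacity_per_host(groups: int = DISK_GROUPS) -> float:
    """Raw vSAN capacity per host: only the capacity tier counts,
    the cache tier does not add usable space."""
    return groups * CAPACITY_DISKS_PER_GROUP * CAPACITY_TB

print(raw_capacity_per_host())   # capacity tier per host, in TB
print(DISK_GROUPS * CACHE_TB)    # cache tier per host, in TB
```

Note that the vSAN Direct SSD and the 2 x 7.6 TB Data Domain disks stay outside of these disk groups.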
The onboard network card (4 x copper) has also been replaced with the 4 x 10GbE SFP+ version.
The vSAN and vMotion networks both use a dual-port 100GbE NIC (Mellanox ConnectX-5).
But what about GPUs? Yes, of course! Each server is configured with an Nvidia RTX8000: 48 GB of framebuffer, fully supported for vGPU and Nvidia AI Enterprise.
I am using the actively cooled version of the card, so I am able to disable the fan management inside the server for that slot. That helped a lot to make the machine quiet….
vGPU Config for the Nvidia RTX8000 (Active)
vGPU profiles and hardware overview
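To make the vGPU side concrete: assigning a profile to a VM ultimately boils down to a single `.vmx` entry, which the vSphere Client writes for you when you add the vGPU as a Shared PCI device. A minimal sketch, with an example profile (the `-8q` slice is illustrative, not necessarily the one I use):

```
pciPassthru0.vgpu = "grid_rtx8000-8q"
```

The RTX8000 exposes Q-series profiles from `grid_rtx8000-1q` up to `grid_rtx8000-48q`; the `-48q` profile hands the full 48 GB framebuffer to a single VM.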
Both servers form a 2-node vSAN cluster with a 100GbE network connection.
vSAN Disk Groups for "normal OSA vSAN" and vSAN Direct Configuration. (I am still waiting for 4 additional 3.84 TB SSDs to set up the third disk group)
vSAN Fault Domains (for the 2 Node config)
2 x Witness Appliances (for faster recovery in case of a failure: only "Change witness host" is needed)
Once the four additional SSDs have arrived, the third disk group will be created to use the new vSAN Nested Fault Domain feature. Double redundancy (data factor 4x) within a 2-node cluster!!
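Where does the 4x data factor come from? With nested fault domains in a 2-node cluster, every block is mirrored across the two hosts and then mirrored again across fault domains (disk groups) inside each host, so each block exists four times. A quick sketch of the math (the policy layout is my assumption of how the feature is typically configured, the capacity numbers are from above):

```python
# Back-of-the-envelope data factor for the 2-node nested fault domain setup.
# Assumed policy (illustrative): RAID-1 across the two hosts plus a nested
# RAID-1 across disk-group fault domains inside each host.

host_copies = 2        # mirror across the two nodes
nested_copies = 2      # nested mirror within each host

data_factor = host_copies * nested_copies    # every block is written 4 times

raw_tb_per_host = 3 * 2 * 3.84    # 3 disk groups x 2 x 3.84 TB SSDs
raw_tb_cluster = 2 * raw_tb_per_host

usable_tb = raw_tb_cluster / data_factor
print(data_factor, round(usable_tb, 2))      # 4 11.52
```

So roughly 11.5 TB of usable capacity before slack space and overheads, in exchange for surviving a full host failure plus a disk group failure on the remaining node.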
The brand-new DPU support feature of vSphere 8.0 is also planned for my environment. Currently I am trying to get 2 x Nvidia BlueField-2 NICs (DPUs) for my HomeLab. Officially only Dell R750 servers are supported, but you know me….. I will get it working inside my machines!
There are several different hardware options available from Nvidia, but only a very few of them will work as VMware DPUs: only P-Core models, only 32GB DDR4 models, etc….
vDS Configuration for my setup:
The extra DPU vDS is prepared and the DPUs are ordered….. I will keep you guys updated when everything is up and running…..
Actual Rack View
Dell has replaced OMIVV with the new OMEVV integration. It is based on OpenManage Enterprise and operates as a plugin. New licenses are required for your servers to utilize OMEVV as well as the Power Manager plugin within OpenManage: you need the Advanced+ license! I will create a blog post about OMEVV and Power Manager in the future.
Stay tuned for the next episodes of my HomeLab journey….
Looking for the next episode? Look no further: HomeLab Stage LXVII: Horizon with vGPU