HomeLab Stage XLV: New Monster Workstation

After completing the last stage, Stage XLIV: DC2 Redesign, it was time to write a post about my new Custom Power Workstation.

I decided to build the system completely from scratch with high performance in mind. It started with an old Apple Power Mac G5 case in my garage… I cut metal parts out of the case and put them in upside down. I took pieces from an old Corsair ATX PC case and created a Frankenstein case that suits my requirements.

I wanted an Intel Xeon processor, so I checked the Intel ARK website and found my perfect fit: the Xeon W-2255 (10 cores, 3.70 GHz) with 48 PCIe lanes and DDR4 ECC memory support…

My mainboard choice was the ASUS C422 Pro/SE, which supports the CPU and my memory and has enough PCIe slots.

From other HomeLab projects I had several server ECC memory modules in stock, so I placed 256GB of DDR4-2666 RAM (8 x 32GB modules) onto the mainboard. There was only one problem with that mainboard: the two M.2 NVMe slots are attached to the chipset, which is connected to the CPU via a single DMI x4 link… I did not want to use my NVMe SSDs in RAID0 with that bottleneck in place!
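
A quick back-of-the-envelope calculation shows why that bottleneck matters. The Gen3 NVMe throughput below is a round number I am assuming, not a benchmark from this build:

```python
# Rough check of the DMI bottleneck.
# Assumptions (round numbers, not measured on this machine):
# - DMI 3.0 is electrically a PCIe 3.0 x4 link: 8 GT/s per lane, 128b/130b encoding
# - each Gen3 NVMe SSD can sustain roughly 3.5 GB/s sequential reads

GT_PER_LANE = 8e9        # 8 GT/s per PCIe 3.0 lane
ENCODING = 128 / 130     # 128b/130b line encoding
DMI_LANES = 4

dmi_ceiling = GT_PER_LANE * ENCODING * DMI_LANES / 8 / 1e9   # GB/s
raid0_demand = 2 * 3.5                                        # GB/s, two SSDs striped

print(f"DMI 3.0 ceiling : ~{dmi_ceiling:.1f} GB/s")
print(f"RAID0 demand    : ~{raid0_demand:.1f} GB/s -> capped by the DMI link")
```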

The solution for that issue was the ASUS Hyper M.2 x16 V4 card. I placed 4 x 512GB Samsung NVMe SSDs into the card and ordered an Intel VROC Standard key. This key enables the mainboard to use the CPU RAID feature set: RAID0 support for all 4 NVMe SSDs, each attached to its own PCIe x4 link.
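
If the box happens to boot Linux (an assumption for this sketch, not necessarily what this build runs), something like the following can confirm that each drive on the Hyper M.2 card negotiated its own x4 link instead of sitting behind the chipset:

```python
# Minimal sketch (Linux assumed): print the negotiated PCIe link width/speed
# for every NVMe controller via sysfs.
import glob
import pathlib

for dev in sorted(glob.glob("/sys/class/nvme/nvme[0-9]*")):
    pci = pathlib.Path(dev, "device")  # symlink to the underlying PCI function
    width = (pci / "current_link_width").read_text().strip()
    speed = (pci / "current_link_speed").read_text().strip()
    print(f"{pathlib.Path(dev).name}: x{width} @ {speed}")
```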

The onboard M.2 slots were used in single mode with 2 x 2TB Intel M.2 SSDs. Neither is part of any RAID setup.

The network part of this custom workstation was very tricky: I wanted at least dual 10GbE in LACP mode, or 40GbE. I placed a Mellanox ConnectX-3 Pro dual-port 40GbE NIC inside the machine, but it did not work very well… The power output on the QSFP ports is very limited on this NIC type…

I decided to order a Mellanox (NVIDIA Networking) ConnectX-4 40GbE NIC. DAC cables in the 15-20m range aren't available, so I placed two existing Cisco 40GbE LC transceivers into the switch as well as into the NIC. That worked after flashing the newest Mellanox firmware onto the card. You can also flash the original firmware onto OEM cards with the force option 🙂
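
As a sanity check after the flash, a sketch like this (again assuming a Linux host; the interface name is a placeholder) can confirm the ConnectX-4 actually linked up at 40GbE:

```python
# Quick link check via sysfs. IFACE is a hypothetical interface name;
# adjust it to whatever the NIC shows up as on your system.
from pathlib import Path

IFACE = "enp65s0"

base = Path("/sys/class/net") / IFACE
state = (base / "operstate").read_text().strip()
speed = int((base / "speed").read_text())   # reported in Mb/s
print(f"{IFACE}: {state}, {speed // 1000} GbE")
```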

The next chapter is the GPU: I really wanted to max out the machine. I decided to place 2 x NVIDIA RTX 8000 Active GPUs with an NVLink bridge inside the system: 9216 CUDA cores and 96GB of GDDR6 GPU memory combined…
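
To verify that both cards and the NVLink bridge are visible, a minimal sketch like the one below simply shells out to nvidia-smi, so it works on any OS where the NVIDIA driver is installed:

```python
# List the installed GPUs and their NVLink status via nvidia-smi.
import subprocess

# GPUs with their memory size
gpus = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout
print(gpus)

# Per-link NVLink status (shows the link speed per lane, or nothing without a bridge)
nvlink = subprocess.run(
    ["nvidia-smi", "nvlink", "--status"],
    capture_output=True, text=True, check=True,
).stdout
print(nvlink)
```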

To make it a low-noise setup, I placed a Corsair CPU water cooling system onto the mainboard and chose a 1200W PSU. Both can be actively monitored and managed with the Corsair app.
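
For anyone wondering about the 1200W figure, here is a rough power budget behind it; the per-component wattages are nominal TDP/board-power values I am assuming, not measurements from this build:

```python
# Approximate peak power budget for the build (assumed nominal values).
parts = {
    "Xeon W-2255 (TDP)":              165,
    "2x RTX 8000 (assumed 295 W ea)": 2 * 295,
    "RAM, SSDs, NIC, fans, pump":     120,   # generous lump-sum estimate
}

psu = 1200
total = sum(parts.values())
for name, watts in parts.items():
    print(f"{name:34s} {watts:4d} W")
print(f"{'Estimated peak draw':34s} {total:4d} W  ({total / psu:.0%} of a {psu} W PSU)")
```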

My existing dual Dell 27″ 5K displays are running fine, so I did not replace them for now. Dual 32″ 8K displays would be very nice…

The problems started with the first boot of the monster…

The NVMe RAID0 was not recognized at every boot. Why? After hours of troubleshooting I found out that I was running out of PCIe lanes… The CPU supports 48 of them, but each RTX 8000 needs 16, and the NVMe RAID0 (4 M.2 SSDs with x4 lanes each) needs another 16… No lanes left for the NIC… I removed the 4 x Samsung NVMe SSDs and moved the two Intel 2TB drives from the mainboard onto the card. Now the M.2 card takes 8 lanes, the Mellanox NIC takes 8, and each GPU takes 16, which adds up to exactly 48.
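
The lane budget is easier to see when written down; this little sketch compares the original layout with the fixed one against the 48 CPU lanes of the Xeon W-2255:

```python
# PCIe lane budget for both configurations.
CPU_LANES = 48

original = {"RTX 8000 #1": 16, "RTX 8000 #2": 16,
            "Hyper M.2 (4x NVMe x4)": 16, "ConnectX-4": 8}
fixed    = {"RTX 8000 #1": 16, "RTX 8000 #2": 16,
            "Hyper M.2 (2x NVMe x4)": 8,  "ConnectX-4": 8}

for label, cfg in (("original", original), ("fixed", fixed)):
    used = sum(cfg.values())
    verdict = "fits" if used <= CPU_LANES else f"over budget by {used - CPU_LANES}"
    print(f"{label}: {used}/{CPU_LANES} lanes -> {verdict}")
```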

That system is an absolute monster from a performance perspective, but it is also low noise. It sits under my desk, and you can easily work 12 hours without noticing it.

What happened to my “old” workstation, the 2013 Mac Pro?

Check out the next part here: HomeLab Stage XLVI: New ESXi Host MacPro