Scaling Distributed Machine Learning with Bitfusion on Kubernetes
Distributed machine learning spreads a training job across multiple nodes. In this demo we use vSphere Bitfusion to scale out training workloads across multiple Kubernetes nodes with minimal loss in performance. The results show that GPUs can be shared across jobs with little performance penalty. Because Bitfusion lets distributed training scale across physical resources, the GPU capacity available to a job is not bound to the GPUs of any single host.