Standalone Spark on VMware vSphere with a Machine Learning Test Run
This demo shows a VMware vSphere environment with 16 virtual machines hosting the Apache Spark distributed platform in standalone mode. A test run from the Spark Perf test suite executes a machine learning linear regression algorithm to train a model on a dataset with 10,000 features and 1 million examples. The test runs across all the virtual machines in the cluster and finishes in 3 minutes 49 seconds of wall clock time. The key metric, model training time, is 40 seconds. This shows that Spark and MLlib run well on vSphere, and that data scientists and data engineers can use Spark there in standalone mode as well as in YARN mode.
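To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. The feature count, example count, VM count, and timings come from the description. The rest is assumed for illustration: that the dataset is stored as dense 64-bit floats, that work is split evenly across the virtual machines, and that the 40 seconds of training is part of the wall clock time. The helper names `dataset_bytes` and `per_vm_share` are hypothetical.

```python
# Figures taken from the demo description.
NUM_VMS = 16
NUM_FEATURES = 10_000
NUM_EXAMPLES = 1_000_000
WALL_CLOCK_SECONDS = 3 * 60 + 49  # 3 minutes 49 seconds
TRAINING_SECONDS = 40

# Assumption (not stated in the description): dense rows of 64-bit floats.
BYTES_PER_VALUE = 8


def dataset_bytes(examples, features, bytes_per_value=BYTES_PER_VALUE):
    """Raw size of the feature matrix, ignoring labels and JVM overhead."""
    return examples * features * bytes_per_value


def per_vm_share(total, vms=NUM_VMS):
    """Even split of a quantity across the virtual machines (an assumption)."""
    return total / vms


total_bytes = dataset_bytes(NUM_EXAMPLES, NUM_FEATURES)
total_gb = total_bytes / 1e9
gb_per_vm = per_vm_share(total_bytes) / 1e9
examples_per_vm = per_vm_share(NUM_EXAMPLES)

# Share of the run spent in model training, assuming training time
# is included in the wall clock time.
training_fraction = TRAINING_SECONDS / WALL_CLOCK_SECONDS

print(f"Feature matrix: {total_gb:.0f} GB dense, {gb_per_vm:.0f} GB per VM")
print(f"Examples per VM: {examples_per_vm:.0f}")
print(f"Training share of wall clock: {training_fraction:.1%}")
```

Under these assumptions the feature matrix is about 80 GB, or 5 GB per virtual machine, and training accounts for roughly 17% of the total run; what the remaining time is spent on is not stated in the description.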