Open Source Serengeti Enables VMware for Hadoop
VMware this week took the wraps off Project Serengeti, an open source effort that enables Hadoop to run on virtual infrastructure. VMware is also contributing open source code to the upstream Apache Hadoop project to make Hadoop more responsive and efficient when running on virtualization technologies.
"Project Serengeti is about simplifying deployment of Hadoop on VMware," Fausto Ibarra, senior director of Product Management at VMware, told InternetNews.com. "With Serengeti you can have a fully functional Hadoop deployment in as little as 10 minutes."
With Serengeti, Hadoop can run on a regular VMware vSphere deployment and then take advantage of all the usual VMware management and availability tools. Hadoop itself is packaged as a standard OVF (Open Virtualization Format) machine image file, just like any other VMware image.
When Serengeti is deployed on vSphere, the OVF creates two virtual machines. One is the Serengeti server; the other is a master virtual machine that serves as the template cloned to create the Hadoop nodes.
"The user connects into the Serengeti Server virtual machine, and that's where they run all the commands to create a cluster," Ibarra said. "The master virtual machine is then cloned and configured, and after that you have a fully functional Hadoop cluster."
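The clone-and-configure workflow Ibarra describes can be modeled in a few lines. This is a minimal, hypothetical sketch, not Serengeti's actual API or command set: the `VirtualMachine` class, the `clone` and `create_cluster` helpers, the image file name and the role names (borrowed from the Hadoop 1.x daemons NameNode, JobTracker and worker nodes) are all illustrative assumptions. It only shows the idea that one template VM is copied once per node and each copy is then given its own role.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    image: str                  # disk image the VM boots from (hypothetical name below)
    role: str = "template"      # hypothetical role label; the template has no Hadoop role yet
    settings: dict = field(default_factory=dict)


def clone(template: VirtualMachine, name: str) -> VirtualMachine:
    """Copy the template VM: same image, independent copy of its settings."""
    return VirtualMachine(name=name, image=template.image, role=template.role,
                          settings=deepcopy(template.settings))


def create_cluster(template: VirtualMachine, cluster_name: str,
                   layout: dict) -> list:
    """Clone the template once per node in `layout`, then configure each clone.

    `layout` maps an (assumed) Hadoop role name to a node count.
    """
    nodes = []
    for role, count in layout.items():
        for i in range(count):
            vm = clone(template, f"{cluster_name}-{role}-{i}")
            # "Configure" step: assign the role and record cluster membership.
            vm.role = role
            vm.settings["cluster"] = cluster_name
            nodes.append(vm)
    return nodes


# Usage: one template VM becomes a five-node cluster; the template itself is untouched.
template = VirtualMachine(name="hadoop-template", image="hadoop-node.vmdk")
cluster = create_cluster(template, "demo",
                         {"namenode": 1, "jobtracker": 1, "worker": 3})
for vm in cluster:
    print(vm.name, vm.role)
```

Cloning from a single template keeps every node on an identical base image, so only the per-node configuration step differs between machines.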