Top 10 Hadoop Cluster Management Tools

1. Ambari

Apache Ambari features a Hadoop management web user interface backed by a set of RESTful APIs. It is built to help system admins provision, manage, and monitor Hadoop clusters. As of this writing, Ambari supports the following Hadoop components: HBase, HCatalog, HDFS, Hive, MapReduce, Oozie, Pig, Sqoop, and ZooKeeper.
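Because the web interface sits on top of a REST API, the same operations can be scripted. The sketch below is a minimal illustration rather than an official client: it only builds a request to list clusters and does not send it. The host name and credentials are placeholders, and the port (8080) and the `/api/v1` prefix are assumed to match a default Ambari installation.

```python
import base64
import urllib.request


def build_ambari_request(host, path, user, password, port=8080):
    """Build (but do not send) a GET request against Ambari's REST API.

    Assumes a default install: HTTP on port 8080, API rooted at /api/v1,
    and HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on write requests; it is harmless on reads.
    req.add_header("X-Requested-By", "ambari")
    return req


# Placeholder host and credentials, for illustration only.
req = build_ambari_request("ambari.example.com", "clusters", "admin", "admin")
print(req.get_method(), req.full_url)
```

To actually run the query, the request would be passed to `urllib.request.urlopen(req)` and the JSON body parsed with the `json` module.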

2. Ankush

Developed by Impetus Technologies, Ankush provides a common interface for both local and cloud-based clusters. It is designed to support different vendor clusters, including Oracle NoSQL databases, Hadoop, Apache, and Cloudera.

3. StarCluster

StarCluster is an open source cluster-computing toolkit designed to automate the process of building, configuring, and managing clusters of virtual machines on Amazon EC2. It is the go-to cluster management tool of Qubole, a company that offers Hadoop as a Service (HaaS).

4. Crowbar and Cloudera Manager

Dell Crowbar is a platform designed for provisioning and deploying servers from bare metal. It was originally developed to support Dell solutions that ran on OpenStack and Hadoop. Cloudera Manager, on the other hand, provides centralized management and simplifies the deployment of the entire Hadoop stack. In the Dell | Cloudera Solution for Apache Hadoop, Crowbar is used to provision the hardware, configure it, and install Red Hat Enterprise Linux (RHEL) and Cloudera Manager. After that, Cloudera Manager takes over to build the Hadoop cluster.

5. Intel Manager

Intel Manager is a proprietary solution that comes with Intel’s Hadoop distribution. It features a cluster management wizard and offers automated configuration through Intel Active Tuner.

6. Mesos

Like Ambari, Mesos is an Apache Software Foundation project. It performs resource isolation and sharing across disparate frameworks. One interesting feature of Mesos is that it can run various applications, such as Hadoop, MPI, Hypertable, and Spark, on a shared pool of nodes.
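Mesos shares nodes by offering each node's free resources to the registered frameworks, which decide how much of each offer to use. The toy model below sketches that idea only; it does not use the Mesos API, and every name in it (`Offer`, `Framework`, `offer_round`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Free resources on one node (hypothetical type, not a Mesos class)."""
    node: str
    cpus: int
    mem_mb: int


@dataclass
class Framework:
    """A framework with a number of identical tasks waiting to run."""
    name: str
    cpus_per_task: int
    mem_per_task: int
    pending: int

    def accept(self, offer):
        """Return how many pending tasks fit into the offered resources."""
        by_cpu = offer.cpus // self.cpus_per_task
        by_mem = offer.mem_mb // self.mem_per_task
        return min(by_cpu, by_mem, self.pending)


def offer_round(offers, frameworks):
    """Offer each node to every framework in turn and record placements."""
    placements = []
    for offer in offers:
        for fw in frameworks:
            n = fw.accept(offer)
            if n > 0:
                placements.append((offer.node, fw.name, n))
                # Whatever this framework used is no longer on offer.
                offer.cpus -= n * fw.cpus_per_task
                offer.mem_mb -= n * fw.mem_per_task
                fw.pending -= n
    return placements


offers = [Offer("node1", 4, 8192), Offer("node2", 4, 8192)]
frameworks = [
    Framework("hadoop", cpus_per_task=2, mem_per_task=4096, pending=1),
    Framework("spark", cpus_per_task=1, mem_per_task=1024, pending=3),
]
placements = offer_round(offers, frameworks)
print(placements)
```

In this run, the Hadoop and Spark tasks end up side by side on `node1`, which is the kind of sharing the paragraph above describes. A real scheduler would also handle fairness between frameworks, declined offers, and isolation, all of which this sketch ignores.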

7. Orchestrator

Zettaset Orchestrator is a distribution-agnostic cluster management platform. It emphasizes security and high availability, with support for RBAC (role-based access control) and Kerberos. This makes it well suited to environments with regulatory compliance requirements.

8. Platform

IBM Platform Computing is a suite of solutions designed for clusters, grids, and HPC clouds. It includes IBM Platform Cluster Manager, which automates provisioning, maintenance, and monitoring.

9. Rocks

Rocks comes in two forms: an open source edition, known as Rocks, and an enterprise edition, known as Rocks+ or StackIQ Cluster Manager. The enterprise edition is developed by StackIQ, whose co-founders are the same people who developed the original Rocks at the San Diego Supercomputer Center.

10. Serengeti

Serengeti is an open source project spearheaded by VMware. It is also part of VMware’s vSphere Big Data Extensions, which extends the vSphere platform to support Apache Hadoop workloads. In effect, Serengeti allows admins to automatically deploy and manage Hadoop clusters in a virtualized environment.