Hadoop Modes – Standalone, Pseudo Distributed & Fully Distributed

After going through the fundamentals of Hadoop, HDFS & MapReduce, it's time to move on to installing Hadoop on your system. Before the actual installation, it is important to understand the modes in which Hadoop can run.

There are three modes in which Hadoop can be installed and run. They are:

1. Standalone Mode
2. Pseudo Distributed Mode
3. Fully Distributed Mode

Standalone Mode

In this mode, none of the Hadoop daemons (NameNode, DataNode, Secondary NameNode, JobTracker & TaskTracker) run in the background.

As a result, you will not have:

  • A NameNode storing metadata.
  • A DataNode, as there is no HDFS; files are stored locally on the hard disk.
  • A TaskTracker sending status reports to the JobTracker.
  • A JobTracker, as there are no TaskTrackers to manage.

As the name suggests, everything in standalone mode runs in a single JVM on a single machine. It is best suited for testing your program for bugs with a small input stored locally. It is also known as LocalJobRunner mode.

Note – Standalone mode is the most convenient mode for quickly testing incremental changes to your code, since there are no daemons to start and no input files to copy into HDFS.
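To make the single-JVM idea concrete, here is a minimal sketch in plain Python (not Hadoop's Java API) of what a local run boils down to: read a file from the local disk, run the map step, group and sort by key, run the reduce step and write the result back to a local directory, all inside one process. The word-count job, the function names and the `part-00000` output file name are illustrative assumptions, not Hadoop's LocalJobRunner code.

```python
# Plain-Python sketch of a standalone-style run: map, shuffle/sort and reduce
# all happen in this one process, reading and writing the local file system.
# Function names and the word-count job are hypothetical illustrations.
import os
import tempfile
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key and sort by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Sum the counts emitted for one word."""
    return key, sum(values)


def run_local_job(input_path, output_dir):
    """Run map -> shuffle -> reduce over a local file and write local output."""
    with open(input_path, encoding="utf-8") as f:
        pairs = [pair for line in f for pair in map_phase(line)]
    results = [reduce_phase(key, values) for key, values in shuffle(pairs)]
    os.makedirs(output_dir, exist_ok=True)
    # Tab-separated key/value lines; the file name mimics Hadoop's "part" files.
    with open(os.path.join(output_dir, "part-00000"), "w", encoding="utf-8") as f:
        for word, count in results:
            f.write(f"{word}\t{count}\n")
    return dict(results)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    input_path = os.path.join(workdir, "input.txt")
    with open(input_path, "w", encoding="utf-8") as f:
        f.write("the quick brown fox\nthe lazy dog\n")
    print(run_local_job(input_path, os.path.join(workdir, "output")))
    # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Because only a local process is involved, you can edit `map_phase` and rerun the script immediately, which is the quick edit-and-test loop standalone mode is good for.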

Pseudo Distributed Mode

This mode lets you mimic a multi-server installation on a single machine. Like standalone mode, pseudo distributed mode runs on one machine, but each daemon runs in its own separate process, and the files are stored on HDFS rather than directly on the local file system. It can therefore be seen as a small-scale implementation to try before you run on an actual cluster with thousands of nodes.
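Since each daemon runs as its own Java process in this mode, one common sanity check is the JDK's `jps` tool, which prints one `<pid> <main class>` line per running JVM. The sketch below (plain Python, hypothetical function names) parses such output and reports which of the five daemons named in this post are missing. It assumes the Hadoop 1.x daemon names, and the sample output it uses is made up for illustration.

```python
# Sketch: check which Hadoop 1.x daemons are missing from `jps` output.
# Each jps line looks like "<pid> <MainClassName>".
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}


def running_daemons(jps_output):
    """Return the set of main-class names listed in jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output):
    """Return the expected daemons that do not appear in the jps output."""
    return EXPECTED_DAEMONS - running_daemons(jps_output)


if __name__ == "__main__":
    # Made-up sample; on a real node you would feed in the output of `jps`.
    sample = """4825 NameNode
4930 DataNode
5041 SecondaryNameNode
5123 JobTracker
5230 Jps
"""
    print(sorted(missing_daemons(sample)))  # ['TaskTracker']
```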

Fully Distributed Mode

As the name suggests, in this mode your code runs on an actual Hadoop cluster. This is the mode in which you see the real power of Hadoop, running your code against a very large input on thousands of servers.
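The three modes differ mainly in a handful of configuration properties. The sketch below records the values that Hadoop: The Definitive Guide (Tom White) gives for Hadoop 1.x (`fs.default.name`, `dfs.replication` and `mapred.job.tracker`) and renders them in the `<configuration>`/`<property>` layout of Hadoop's XML site files. Treat it as an illustration under those assumptions: `namenode` and `jobtracker` are placeholder host names, the file each property goes into follows the usual 1.x convention, and later Hadoop releases renamed or replaced some of these properties.

```python
# Sketch: key Hadoop 1.x properties per mode, rendered as site-file XML.
# "namenode" and "jobtracker" are placeholder host names for a real cluster.
from xml.sax.saxutils import escape

MODE_PROPERTIES = {
    "standalone": {  # these are the defaults; no daemons, local file system
        "core-site.xml": {"fs.default.name": "file:///"},
        "hdfs-site.xml": {},  # no HDFS, so replication does not apply
        "mapred-site.xml": {"mapred.job.tracker": "local"},
    },
    "pseudo-distributed": {  # all daemons on localhost, one copy of each block
        "core-site.xml": {"fs.default.name": "hdfs://localhost/"},
        "hdfs-site.xml": {"dfs.replication": "1"},
        "mapred-site.xml": {"mapred.job.tracker": "localhost:8021"},
    },
    "fully-distributed": {  # master daemons on dedicated hosts
        "core-site.xml": {"fs.default.name": "hdfs://namenode/"},
        "hdfs-site.xml": {"dfs.replication": "3"},
        "mapred-site.xml": {"mapred.job.tracker": "jobtracker:8021"},
    },
}


def to_configuration_xml(properties):
    """Render a {name: value} dict in Hadoop's <configuration> file layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    for filename, props in MODE_PROPERTIES["pseudo-distributed"].items():
        print(f"--- {filename} ---")
        print(to_configuration_xml(props))
```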

Debugging a MapReduce program on a cluster is difficult, because Mappers run on different machines, each with a different piece of the input, and you cannot know in advance which machines they will eventually run on. Also, with large inputs, it is likely that some of the data will be irregular in its format.
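One way to cope with irregular input is to make the map logic defensive: skip records that do not parse and count them, instead of letting one bad line fail a task on a machine you cannot easily inspect. The sketch below is plain Python and assumes a hypothetical `year,temperature` record format. In a real Hadoop job you would typically report such counts through Hadoop's counters, so they show up in the job's status.

```python
# Sketch: tolerate malformed records in map-style parsing and count them.
# The "year,temperature" input format is a hypothetical example.
from collections import Counter


def parse_records(lines, counters):
    """Yield (year, temperature) pairs; count and skip lines that don't parse."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            counters["malformed: wrong field count"] += 1
            continue
        year, temp = parts
        if not (len(year) == 4 and year.isdigit()):
            counters["malformed: bad year"] += 1
            continue
        try:
            value = int(temp)
        except ValueError:
            counters["malformed: bad temperature"] += 1
            continue
        counters["valid"] += 1
        yield year, value


if __name__ == "__main__":
    sample = ["1950,22", "1950, -11", "19x0,5", "1951", "1951,n/a", "1951,31"]
    counters = Counter()
    print(list(parse_records(sample, counters)))  # [('1950', 22), ('1950', -11), ('1951', 31)]
    print(dict(counters))
```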

Development Tip – Before you run your code on a real Hadoop cluster, following a few steps can make your life with MapReduce much easier:

  1. Always unit test your code first.
  2. Start with a very small portion of the input.
  3. Test locally (in LocalJobRunner mode) with the small input.
  4. Test in Pseudo Distributed mode with the daemons running, to see how your code performs with the small input.
  5. Finally, once you are satisfied with how your code runs and performs, you are ready to run it in Fully Distributed Mode on a real cluster.
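Steps 1 and 2 can be sketched in plain Python: keep the map and reduce logic as ordinary functions so they can be unit tested with `unittest` on a few hand-written records, and use a small `head` helper to take only the first lines of a large input for early local runs. The word-count logic and all names here are illustrative assumptions rather than Hadoop's API; run the tests with `python -m unittest <file>`.

```python
# Sketch of steps 1 and 2: unit-test map/reduce logic on hand-written records,
# and take a small sample of a large input. Names are hypothetical.
import itertools
import unittest


def mapper(line):
    """Emit (word, 1) for each word, ignoring case and surrounding punctuation."""
    for token in line.split():
        word = token.strip(".,;:!?\"'()").lower()
        if word:
            yield word, 1


def reducer(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def head(lines, n):
    """Keep only the first n lines of a (possibly huge) iterable of lines."""
    return list(itertools.islice(lines, n))


class MapReduceLogicTest(unittest.TestCase):
    def test_mapper_normalizes_words(self):
        self.assertEqual(list(mapper("Hello, hello!")), [("hello", 1), ("hello", 1)])

    def test_mapper_skips_pure_punctuation(self):
        self.assertEqual(list(mapper("... !!")), [])

    def test_reducer_sums_counts(self):
        self.assertEqual(reducer("hello", [1, 1, 1]), ("hello", 3))

    def test_head_takes_small_sample(self):
        self.assertEqual(head(iter(["a", "b", "c"]), 2), ["a", "b"])
```

Because `head` works on any iterable, passing it an open file object reads only the first `n` lines rather than the whole file.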

In subsequent posts, we will see how to install Hadoop in Standalone Mode and in Pseudo Distributed Mode.

