To add new nodes to the cluster:

1. Add the network addresses of the new nodes to the include file.

hdfs-site.xml

<property>
  <name>dfs.hosts</name>
  <value>/<hadoop-home>/conf/includes</value>
  <final>true</final>
</property>

mapred-site.xml

<property>
  <name>mapred.hosts</name>
  <value>/<hadoop-home>/conf/includes</value>
  <final>true</final>
</property>

Datanodes that are permitted to connect to the namenode are specified in a file whose name is set by the dfs.hosts property. The include file resides on the namenode’s local filesystem and contains a line for each datanode, identified by its network address (as reported by the datanode; you can see what this is by looking at the namenode’s web UI). If you need to specify multiple network addresses for a datanode, put them on one line, separated by whitespace.

For example:

slave01
slave02
slave03
…

Similarly, tasktrackers that may connect to the jobtracker are specified in a file whose name is set by the mapred.hosts property. In most cases there is a single shared file, referred to as the include file, that both dfs.hosts and mapred.hosts refer to, since nodes in the cluster run both datanode and tasktracker daemons.
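As a sketch, the shared include file can be maintained with ordinary shell commands. The conf directory below is an assumption for illustration; point HADOOP_CONF_DIR at your real configuration directory.

```shell
# Sketch only: the default path here is for demonstration, not a real install.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/tmp/hadoop-conf-demo}"
INCLUDES="$HADOOP_CONF_DIR/includes"
mkdir -p "$HADOOP_CONF_DIR"

# One datanode address per line, as reported by the datanode itself.
printf '%s\n' slave01 slave02 slave03 > "$INCLUDES"

# Commission a new node by appending its address.
echo slave04 >> "$INCLUDES"

# Sanity check: report duplicate entries (prints nothing if the file is clean).
sort "$INCLUDES" | uniq -d
```

After editing the file, the refresh commands in steps 2 and 3 make the namenode and jobtracker re-read it.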

2. Update the namenode with the new set of permitted datanodes using this command:

% hadoop dfsadmin -refreshNodes

3. Update the jobtracker with the new set of permitted tasktrackers using this command:

% hadoop mradmin -refreshNodes

4. Update the slaves file with the new nodes, so that they are included in future operations performed by the Hadoop control scripts.

5. Start the new datanodes and tasktrackers.

6. Check that the new datanodes and tasktrackers appear in the web UI.
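The commissioning sequence above can be sketched as a checklist script. The run helper defaults to printing each command rather than executing it, so the sketch is safe to read through without a cluster; the daemon start commands for step 5 are one assumed way of starting the new daemons.

```shell
# Sketch of steps 2-5 above. DRY_RUN=1 (the default) only prints each command;
# set DRY_RUN=0 on a real cluster to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  echo "+ $*"
  [ "$DRY_RUN" = "1" ] || "$@"
}

run hadoop dfsadmin -refreshNodes     # step 2: namenode re-reads dfs.hosts
run hadoop mradmin -refreshNodes      # step 3: jobtracker re-reads mapred.hosts
# Step 4 is a manual edit of the slaves file; step 5 starts the new daemons
# on each new node, for example:
run hadoop-daemon.sh start datanode
run hadoop-daemon.sh start tasktracker
```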

To remove nodes from the cluster:

1. Add the network addresses of the nodes to be decommissioned to the exclude file. Do not update the include file at this point.

hdfs-site.xml

<property>
  <name>dfs.hosts.exclude</name>
  <value>/<hadoop-home>/conf/excludes</value>
  <final>true</final>
</property>

mapred-site.xml

<property>
  <name>mapred.hosts.exclude</name>
  <value>/<hadoop-home>/conf/excludes</value>
  <final>true</final>
</property>

The decommissioning process is controlled by an exclude file, which for HDFS is set by the dfs.hosts.exclude property and for MapReduce by the mapred.hosts.exclude property. These properties often refer to the same file. The exclude file lists the nodes that are not permitted to connect to the cluster.
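A matching sketch for the exclude file (same assumed conf path as above). Note the interaction with the include file: a node listed in both files is allowed to connect and is then decommissioned, which is why the include file is left untouched at this stage.

```shell
# Sketch only: the default path here is an assumption for demonstration.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/tmp/hadoop-conf-demo}"
EXCLUDES="$HADOOP_CONF_DIR/excludes"
mkdir -p "$HADOOP_CONF_DIR"

# Nodes to decommission, one per line. Leave them in the include file for now:
# a node in both files connects and is gracefully decommissioned, whereas
# removing it from the include file as well would cut it off abruptly.
printf '%s\n' slave02 > "$EXCLUDES"
```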

2. Update the namenode with the new set of permitted datanodes, using this command:

% hadoop dfsadmin -refreshNodes

3. Update the jobtracker with the new set of permitted tasktrackers using this command:

% hadoop mradmin -refreshNodes

4. Go to the web UI and check whether the admin state has changed to “Decommission In Progress” for the datanodes being decommissioned. They will start copying their blocks to other datanodes in the cluster.

5. When all the datanodes report their state as “Decommissioned,” all the blocks have been replicated. Shut down the decommissioned nodes.
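Instead of watching the web UI, the admin state can also be checked by grepping the output of hadoop dfsadmin -report. The report text in this sketch is illustrative sample output, not captured from a real cluster.

```shell
# On a real cluster you would use:  report=$(hadoop dfsadmin -report)
# The sample below is illustrative output only.
report='Name: slave02:50010
Decommission Status : Decommissioned
Name: slave03:50010
Decommission Status : Normal'

# Count datanodes still mid-decommission; zero means all blocks have been
# re-replicated and the excluded nodes can be shut down.
remaining=$(printf '%s\n' "$report" | grep -c 'Decommission in progress') || true
echo "still decommissioning: $remaining"
```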

6. Remove the nodes from the include file, and run:

% hadoop dfsadmin -refreshNodes

% hadoop mradmin -refreshNodes

7. Remove the nodes from the slaves file.
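Steps 6 and 7 are plain file edits; the steps can be sketched as below. The conf path and file contents are assumptions for illustration.

```shell
# Sketch only: paths and contents are hypothetical.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/tmp/hadoop-conf-demo}"
mkdir -p "$HADOOP_CONF_DIR"
printf '%s\n' slave01 slave02 slave03 > "$HADOOP_CONF_DIR/includes"
printf '%s\n' slave01 slave02 slave03 > "$HADOOP_CONF_DIR/slaves"

# Drop the decommissioned node from the include file (step 6) and the
# slaves file (step 7).
for f in includes slaves; do
  grep -v '^slave02$' "$HADOOP_CONF_DIR/$f" > "$HADOOP_CONF_DIR/$f.tmp" &&
    mv "$HADOOP_CONF_DIR/$f.tmp" "$HADOOP_CONF_DIR/$f"
done
```

Run the two refreshNodes commands again after the edit so the namenode and jobtracker pick up the change.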
