Aim:
(i) Set up and install Hadoop in its three operating modes: Standalone, Pseudo-distributed, Fully distributed
(ii) Use web-based tools to monitor your Hadoop setup
Ans) Hadoop is an open-source framework written in Java that is used to store, analyze, and process huge amounts of data in a distributed environment across clusters of computers in an efficient manner. It provides the capability to process distributed data using a simplified programming model. It is used by Google, Facebook, Yahoo, YouTube, Twitter, etc. It was created by Doug Cutting, who joined Yahoo in 2006, and is inspired by the Google File System and Google's MapReduce algorithm. Hadoop provides its own distributed file system, HDFS (Hadoop Distributed File System), to store data across the cluster.
Operational modes of configuring a Hadoop cluster
Hadoop can be run in one of three supported modes:
1) Local (Standalone) mode - By default, Hadoop is configured to run in a single-node, non-distributed mode, as a single Java process. This is useful for debugging. The usage of this mode is very limited, and it is mainly used for experimentation.
2) Pseudo-distributed mode - Hadoop is run on a single node, where each Hadoop daemon (NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker) runs in a separate Java process, unlike local mode, where Hadoop runs as a single Java process.
3) Fully distributed mode - In this mode, all daemons are executed on separate nodes, forming a multi-node cluster. This setup offers true distributed computing capability with built-in reliability, scalability, and fault tolerance.
Standalone mode
1) Add the Java software information to the repository
$ sudo add-apt-repository ppa:webupd8team/java
2) Update the repository
$ sudo apt-get update
3) Install Java 8
$ sudo apt-get install oracle-java8-installer
4) Verify which Java version is installed
$ java -version
5) Install Hadoop-2.8.1
$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
(or) $ wget http://www-us.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
6) Extract the tar.gz file; a hadoop-2.8.1 folder is created
$ tar -zxvf hadoop-2.8.1.tar.gz
7) Ensure HADOOP_HOME is correctly set in the .bashrc file
export HADOOP_HOME=hadoop-2.8.1
export PATH=$PATH:$HADOOP_HOME/bin
8) Evaluate the .bashrc file
$ source ~/.bashrc
9) Verify that Hadoop is working by issuing the following command
$ hadoop version
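As a quick sanity check of standalone mode, one of the example MapReduce jobs bundled with Hadoop can be run on local files (a minimal sketch, assuming the hadoop-2.8.1 folder from step 6 and that the output directory does not already exist):
$ mkdir input
$ cp hadoop-2.8.1/etc/hadoop/*.xml input (use the bundled config files as sample input)
$ hadoop jar hadoop-2.8.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep input output 'dfs[a-z.]+' (run the bundled example grep job)
$ cat output/* (view the matched strings and their counts)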
Pseudo-distributed mode
1) Configure Hadoop
Hadoop configuration files can be found in $HADOOP_HOME/etc/hadoop. In order to develop Hadoop programs in Java, the location of Java must be set in the hadoop-env.sh file
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131
2) Several files needed to configure Hadoop, located in $HADOOP_HOME/etc/hadoop, are described below
a) core-site.xml (contains configuration settings that Hadoop uses when started, such as the port number used for the Hadoop instance, memory allocated for the file system, the memory limit for storing data, and the size of read/write buffers. It also specifies where the NameNode runs in the cluster. Note that fs.default.name is deprecated in Hadoop 2.x in favor of fs.defaultFS, though both are accepted.)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>Name of the default file system. The URI specifies the hostname and port number of the file system.</description>
  </property>
</configuration>
b) hdfs-site.xml (contains information such as the replication factor for data, the namenode path, and the datanode paths on the local file systems)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of block replications (copies) can be specified when the file is created.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>hadoop-2.8.1/namenodenew</value>
    <description>Directory for the namenode.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>hadoop-2.8.1/datanodenew</value>
    <description>Directory for the datanode.</description>
  </property>
</configuration>
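Before formatting, it is safest to ensure the directories named in dfs.namenode.name.dir and dfs.datanode.data.dir exist and are writable; a minimal sketch, assuming the relative paths above:
$ mkdir -p hadoop-2.8.1/namenodenew hadoop-2.8.1/datanodenew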
3) Format the namenode
$ hdfs namenode -format
4) Start the single-node cluster
$ start-dfs.sh
5) Check whether the Hadoop daemons are running by using jps (Java Virtual Machine Process Status tool)
$ jps
13136 DataNode
13427 SecondaryNameNode
12916 NameNode
13578 Jps
6) Access Hadoop in a browser at http://localhost:50070 (50070 is the default port number for the NameNode web UI)
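When finished, the HDFS daemons started in step 4 can be stopped with the companion script:
$ stop-dfs.sh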
Fully-distributed mode
Steps to be followed for configuring the master and slave nodes
1) Create the same account on all nodes to use for the Hadoop installation
$ sudo useradd cse (cse is the username)
$ sudo passwd cse (enter a password for cse)
2) Edit the /etc/hosts file on all nodes, which maps the IP address of each node to its hostname (example addresses assumed), and add the following lines
192.168.100.22 master
192.168.100.23 slave1
192.168.100.24 slave2
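To confirm the mappings, each node should be able to reach the others by hostname (a quick check, assuming the example addresses above):
$ ping -c 1 slave1
$ ping -c 1 slave2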
3) Install SSH on all nodes
$ sudo apt-get install openssh-server
4) Configure key-based login on all nodes so that they can communicate with each other without prompting for a password
$ su cse (switch to the cse account)
$ ssh-keygen -t rsa -P "" (a public key is generated)
$ ssh-copy-id -i /home/cse/.ssh/id_rsa.pub cse@slave1 (copy the public key from the master to the slave nodes)
$ ssh-copy-id -i /home/cse/.ssh/id_rsa.pub cse@slave2
$ exit
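The key-based login can be verified from the master as the cse user; if the setup succeeded, the commands below print each slave's hostname without prompting for a password:
$ ssh cse@slave1 hostname
$ ssh cse@slave2 hostname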
Installing Hadoop on the master node
5) Create a hadoop folder
$ sudo mkdir /usr/local/hadoop
6) Download Hadoop
$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
7) Extract the tar.gz file; a hadoop-2.8.1 folder is created
$ tar -zxvf hadoop-2.8.1.tar.gz
8) Move the Hadoop installation folder to the newly created directory
$ sudo mv hadoop-2.8.1 /usr/local/hadoop
9) Make cse the owner of the hadoop folder
$ sudo chown -R cse /usr/local/hadoop/hadoop-2.8.1
Configuring Hadoop on the master node
10) a) core-site.xml (contains configuration settings that Hadoop uses when started, such as the port number used for the Hadoop instance, memory allocated for the file system, the memory limit for storing data, and the size of read/write buffers. It also specifies where the NameNode runs in the cluster.)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
    <description>Name of the default file system. The URI specifies the hostname and port number of the file system.</description>
  </property>
</configuration>
b) hdfs-site.xml (contains information such as the replication factor for data, the namenode path, and the datanode paths on the local file systems)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>The actual number of block replications (copies) can be specified when the file is created.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hadoop-2.8.1/namenodenew</value>
    <description>Directory for the namenode.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hadoop-2.8.1/datanodenew</value>
    <description>Directory for the datanode.</description>
  </property>
</configuration>
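As in pseudo-distributed mode, it is safest to create the namenode and datanode directories before formatting; since cse owns the folder after step 9, this can be done without sudo (the scp in step 15 then copies them to the slaves):
$ mkdir -p /usr/local/hadoop/hadoop-2.8.1/namenodenew /usr/local/hadoop/hadoop-2.8.1/datanodenew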
11) Ensure HADOOP_HOME is correctly set in the .bashrc file
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
12) Evaluate the ~/.bashrc file
$ source ~/.bashrc
13) Hadoop configuration files can be found in $HADOOP_HOME/etc/hadoop. In order to develop Hadoop programs in Java, the location of Java must be set in the hadoop-env.sh file
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131
14) Update the remaining configuration files on the master node
$ sudo gedit slaves (in the $HADOOP_HOME/etc/hadoop folder)
slave1
slave2
$ sudo gedit masters
master
15) Transfer the hadoop folder from the master node to the slave nodes
$ scp -r /usr/local/hadoop cse@slave1:/home/cse
$ scp -r /usr/local/hadoop cse@slave2:/home/cse
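Assuming the scp above completed, the copy can be checked from the master; the hadoop-2.8.1 folder should be listed on each slave:
$ ssh cse@slave1 ls /home/cse/hadoop
$ ssh cse@slave2 ls /home/cse/hadoop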
16) Format the namenode on the master node
$ hdfs namenode -format
17) Start the Hadoop cluster on the master node
$ start-dfs.sh
18) Verify the Hadoop daemons on the master and slave nodes using jps
$ jps
19) Access Hadoop in a browser from any node using http://master:50070
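With the slaves and masters files above, jps output roughly like the following is expected (an illustrative sketch; process IDs will differ):
On the master: NameNode, SecondaryNameNode, Jps
On each slave: DataNode, Jps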