
Hadoop Modes of Operation

 Aim:

        (i) Perform setting up and installing Hadoop in its three operating modes:

Standalone,

Pseudo distributed,

Fully distributed

        (ii) Use web-based tools to monitor your Hadoop setup


Ans) Hadoop is an open-source framework written in Java that is used to store, analyze, and process huge amounts of data in a distributed environment across clusters of computers in an efficient manner. It provides the capability to process distributed data using a simplified programming model. It is used by companies such as Yahoo, Facebook, YouTube, and Twitter. It was developed by Doug Cutting at Yahoo in 2006, inspired by the Google File System and Google MapReduce papers. Its storage layer, HDFS (Hadoop Distributed File System), stores data across the nodes of the cluster.

Operational modes of configuring Hadoop cluster

Hadoop can be run in one of three supported modes:

 1) Local (Standalone) mode - By default, Hadoop is configured to run in a non-distributed mode on a single node, as a single Java process. This is useful for debugging. The usage of this mode is limited and it is mainly suited to experimentation.

 2) Pseudo-distributed mode - Hadoop runs on a single node, but each Hadoop daemon (NameNode, DataNode, Secondary NameNode, and, in Hadoop 2.x, the YARN ResourceManager and NodeManager, which take over the roles of the older JobTracker and TaskTracker) runs in a separate Java process, whereas in Local mode Hadoop runs as a single Java process.

  3) Fully distributed mode - In this mode, the daemons run on separate nodes, forming a multi-node cluster. This setup offers true distributed computing capability with built-in reliability, scalability, and fault tolerance.
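In practice, the mode is determined mainly by the configuration files and by where the daemons run. As a rough illustration (the values mirror the steps below and are assumptions, not defaults to copy blindly), the file system URI in core-site.xml changes per mode:

        file:///                  (standalone: local file system, no daemons)
        hdfs://localhost:54310    (pseudo-distributed: all daemons on one node)
        hdfs://master:54310       (fully distributed: daemons spread across the cluster)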

Standalone mode

           1) Add the Java repository (PPA)

               $ sudo add-apt-repository ppa:webupd8team/java

           2)  Update repository

                  $ sudo apt-get update

           3)  Install Java 8

                 $ sudo apt-get install oracle-java8-installer

         4)   Verify which java version is installed

                  $ java -version

        5)   Install Hadoop-2.8.1

                  $ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz

                 (or)    $ wget http://www-us.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz

            6)   Extract the tar.gz file; a hadoop-2.8.1 folder is created

        $ tar -zxvf hadoop-2.8.1.tar.gz

            7)    Ensure HADOOP_HOME is correctly set in the .bashrc file (it should point to the full path of the extracted folder; here it is assumed the archive was extracted in the home directory)

                   export HADOOP_HOME=$HOME/hadoop-2.8.1

                   export PATH=$PATH:$HADOOP_HOME/bin

            8)   Evaluate .bashrc file

                  $ source ~/.bashrc   

            9)   Verify that Hadoop is working by issuing the following command

                   $ hadoop version
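To confirm that standalone mode works end to end, one of the MapReduce examples bundled with the release can be run directly against local files. A minimal sketch, assuming the commands are run from inside the extracted hadoop-2.8.1 folder (the output directory must not exist beforehand):

                   $ mkdir input
                   $ cp etc/hadoop/*.xml input
                   $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep input output 'dfs[a-z.]+'
                   $ cat output/*

The job counts matches of the given regular expression in the input files and writes the result to the local output folder, with no HDFS or daemons involved.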

 

Pseudo-distributed mode

          

            1)    Configure Hadoop

                    Hadoop configuration files can be found in $HADOOP_HOME/etc/hadoop. In order to develop Hadoop programs in Java, the location of Java must be set in the hadoop-env.sh file

                    export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131

           2)     Several files located in $HADOOP_HOME/etc/hadoop need to be edited to configure Hadoop; they are described below

            a) core-site.xml (contains configuration settings that Hadoop uses when it starts, such as the port number used for the Hadoop instance, memory allocated for the file system, the memory limit for storing data, and the size of read/write buffers. It also specifies where the NameNode runs in the cluster.)

                      <configuration>

                            <property>

                               <name>fs.default.name</name>

                               <value>hdfs://localhost:54310</value>

                               <description>Name of the default file system. The URI specifies the hostname and port number of the file system.</description>

                            </property>

                      </configuration>

    b) hdfs-site.xml (contains settings such as the replication factor and the NameNode and DataNode directory paths on the local file system)

                               <configuration>

                                    <property>

                                       <name>dfs.replication</name>

                                       <value>1</value>

                                       <description>Default number of block replications (copies); the actual number can also be specified when a file is created.</description>

                                     </property>

                                     <property>

                                         <name>dfs.namenode.name.dir</name>

                                         <value>hadoop-2.8.1/namenodenew</value>

                                         <description> directory for namenode</description>

                                    </property>

                                    <property>

                                          <name>dfs.datanode.data.dir</name>

                                           <value>hadoop-2.8.1/datanodenew</value>

                                           <description>directory for data node</description>

                                     </property>

                               </configuration>
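A note on core-site.xml: in Hadoop 2.x the property name fs.default.name is deprecated in favor of fs.defaultFS; both are still accepted in 2.8.1, but the newer name can be used instead if preferred, e.g.

                               <name>fs.defaultFS</name>
                               <value>hdfs://localhost:54310</value>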

3) Format the namenode

                       $ hdfs namenode -format
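Note: start-dfs.sh launches the daemons over ssh even on a single node, so key-based (passwordless) ssh to localhost may need to be set up first. A minimal sketch, assuming no keys exist yet:

                       $ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
                       $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
                       $ chmod 0600 ~/.ssh/authorized_keys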

4)  Start the single-node cluster

  $ start-dfs.sh

5) Check whether the Hadoop daemons are running by using jps (the Java Virtual Machine Process Status tool)

 

  $ jps

 13136 DataNode
 13427 SecondaryNameNode
 12916 NameNode
 13578 Jps

 

6)  Access the Hadoop web UI in a browser at http://localhost:50070 (50070 is the default port of the NameNode web interface in Hadoop 2.x)
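As a quick check that HDFS itself works, a few file system commands can be run against the single-node cluster (the paths are illustrative only):

   $ hdfs dfs -mkdir -p /user/test
   $ hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/test
   $ hdfs dfs -ls /user/test

The uploaded file should also be visible under Utilities -> Browse the file system in the web UI.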

 

Fully-distributed mode

Steps to be followed to configure the master and slave nodes

1) Create the same user account on all nodes for the Hadoop installation

                            $ sudo useradd cse (cse is username)

                            $ sudo passwd cse (Enter password for cse)

2) Edit the /etc/hosts file on all nodes, which maps the IP address of each node to its hostname (the addresses below are assumptions), and add the following lines

           192.168.100.22            master

           192.168.100.23            slave1

           192.168.100.24            slave2

3) Install SSH on all nodes

     $ sudo apt-get install openssh-server

4) Configure key-based login on all nodes so that they can communicate with each other without prompting for a password

                      $ su cse (switch to the cse account)

                      $ ssh-keygen -t rsa -P "" (a public/private key pair is generated)

                      $ ssh-copy-id -i /home/cse/.ssh/id_rsa.pub cse@slave1 (copy public key from master to slave nodes)

           $ ssh-copy-id -i /home/cse/.ssh/id_rsa.pub cse@slave2

           $ exit
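Key-based login can then be verified from the master node, for example:

                      $ ssh cse@slave1   (should log in without prompting for a password)
                      $ exit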

Installing Hadoop on the master node

   5) $ sudo mkdir /usr/local/hadoop (create the hadoop folder)

   6) $ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
   7) Extract the tar.gz file; a hadoop-2.8.1 folder is created

            $ tar -zxvf hadoop-2.8.1.tar.gz

8)      $ sudo mv hadoop-2.8.1 /usr/local/hadoop (move the Hadoop installation folder into the newly created directory)

9)      $ sudo chown -R cse /usr/local/hadoop/hadoop-2.8.1 (make cse the owner of the Hadoop folder)

           Configuring Hadoop on master node

             10)  Edit the configuration files located in $HADOOP_HOME/etc/hadoop:

                            a)  core-site.xml (contains configuration settings that Hadoop uses when it starts, such as the port number used for the Hadoop instance, memory allocated for the file system, the memory limit for storing data, and the size of read/write buffers. It also specifies where the NameNode runs in the cluster.)

                      <configuration>

                            <property>

                               <name>fs.default.name</name>

                               <value>hdfs://master:54310</value>

                               <description>Name of the default file system. The URI specifies the hostname and port number of the file system.</description>

                            </property>

                      </configuration>

    b) hdfs-site.xml (contains settings such as the replication factor and the NameNode and DataNode directory paths on the local file system)

                               <configuration>

                                    <property>

                                       <name>dfs.replication</name>

                                       <value>3</value>

                                       <description>Default number of block replications (copies); the actual number can also be specified when a file is created.</description>

                                     </property>

                                     <property>

                                         <name>dfs.namenode.name.dir</name>

                                         <value>/usr/local/hadoop/hadoop-2.8.1/namenodenew</value>

                                         <description> directory for namenode</description>

                                    </property>

                                    <property>

                                          <name>dfs.datanode.data.dir</name>

                                           <value>/usr/local/hadoop/hadoop-2.8.1/datanodenew</value>

                                           <description>directory for data node</description>

                                     </property>

                                 </configuration>

         11) Ensure HADOOP_HOME is correctly set in .bashrc file              

                    export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.1

                    export PATH=$PATH:$HADOOP_HOME/bin

                    export PATH=$PATH:$HADOOP_HOME/sbin

         12)  Evaluate ~/.bashrc file

                $ source ~/.bashrc

      13)  Hadoop configuration files can be found in $HADOOP_HOME/etc/hadoop. In order to develop Hadoop programs in Java, the location of Java must be set in the hadoop-env.sh file

                  export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131

        14) Update the remaining configuration files on the master node (the slaves file lists the hostnames of the DataNodes, and the masters file lists the host that runs the Secondary NameNode)

              $ sudo gedit slaves  ($HADOOP_HOME/etc/hadoop folder)

                slave1

                slave2

              $ sudo gedit masters

                master

        15) Transfer the Hadoop folder from the master node to the slave nodes

                $ scp -r /usr/local/hadoop cse@slave1:/home/cse

                $ scp -r /usr/local/hadoop cse@slave2:/home/cse
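Note: the scp commands above place the folder in /home/cse on the slaves, while HADOOP_HOME points to /usr/local/hadoop/hadoop-2.8.1. The folder should end up at the same path on every node, and the same .bashrc and hadoop-env.sh settings should be applied on the slaves. A sketch of one way to do this, run on each slave (the paths are assumptions that mirror the master setup):

                $ sudo mv /home/cse/hadoop /usr/local/
                $ sudo chown -R cse /usr/local/hadoop
                (then add the HADOOP_HOME and PATH exports from step 11 to ~/.bashrc on the slave and source it)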

        16)  format namenode on master node

                  $ hdfs namenode -format

        17)  start hadoop cluster on master node

                $ start-dfs.sh

         18)  Verify the Hadoop daemons on the master and slave nodes using jps

                 $ jps
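With the configuration above (HDFS only, started with start-dfs.sh), the output would typically show the NameNode and Secondary NameNode running on the master and a DataNode on each slave, for example (process IDs will differ):

                 on master:  NameNode, SecondaryNameNode, Jps
                 on slaves:  DataNode, Jps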

         19)  Access the Hadoop web UI from any node in a browser using http://master:50070; the Datanodes tab should list slave1 and slave2 as live nodes
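The steps above start HDFS only. If MapReduce jobs are also to be run and monitored through a web UI, YARN can additionally be configured; this is a hedged sketch and not part of the original steps. On all nodes, set the following in mapred-site.xml and yarn-site.xml under $HADOOP_HOME/etc/hadoop:

        mapred-site.xml:
              <property>
                 <name>mapreduce.framework.name</name>
                 <value>yarn</value>
              </property>

        yarn-site.xml:
              <property>
                 <name>yarn.resourcemanager.hostname</name>
                 <value>master</value>
              </property>
              <property>
                 <name>yarn.nodemanager.aux-services</name>
                 <value>mapreduce_shuffle</value>
              </property>

Then run $ start-yarn.sh on the master node; the ResourceManager web UI becomes available at http://master:8088.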
