
Hadoop File Management Tasks

 Implement the following file management tasks in Hadoop:

a) Adding files and directories

b) Retrieving files

c) Deleting files

Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into HDFS using one of the above command line utilities.

Program:

 The most common file management tasks in Hadoop include:

  • Adding files and directories to HDFS
  • Retrieving files from HDFS to local filesystem
  • Deleting files from HDFS

Hadoop file commands take the following form:

hadoop fs -cmd <args>

Where cmd is the specific file command and <args> is a variable number of arguments. The command cmd is usually named after the corresponding Unix equivalent. For example, the command for listing files is ls as in Unix.
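As a minimal sketch of this form, the shell function below builds `hadoop fs -cmd <args>` from a command name and its arguments. The function name `hfs` and the `HADOOP_CMD` variable are assumptions made for this illustration and are not part of Hadoop; setting `HADOOP_CMD` to `echo hadoop` prints each command instead of running it, which is handy on a machine without a Hadoop client.

```shell
# HADOOP_CMD is an assumption of this sketch: it defaults to the real
# `hadoop` binary and can be set to "echo hadoop" for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# hfs CMD [ARGS...] runs: hadoop fs -CMD ARGS...
hfs() {
    cmd="$1"
    shift
    # $HADOOP_CMD is left unquoted on purpose so "echo hadoop" splits in two words.
    $HADOOP_CMD fs "-$cmd" "$@"
}

# Usage (needs a configured Hadoop client unless HADOOP_CMD is overridden):
#   hfs ls /          # same as: hadoop fs -ls /
#   hfs mkdir cse     # same as: hadoop fs -mkdir cse
```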

a) Adding Files and Directories to HDFS

Creating Directory in HDFS

 

 $ hadoop fs -mkdir foldername (syntax)

 $ hadoop fs -mkdir cse

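A relative path such as cse is resolved against your HDFS home directory (/user/<username>). In Hadoop 1.x, mkdir created missing parent directories automatically; in Hadoop 2 and later you need the -p flag (hadoop fs -mkdir -p) for that, as with Unix mkdir. Now that we have a working directory, we can put a file into it.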
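A short sketch of creating a nested directory on Hadoop 2 or later might look like this. The function name `make_hdfs_dir`, the `HADOOP_CMD` dry-run variable, and the path `cse/lab3/input` are hypothetical and used only for illustration.

```shell
# HADOOP_CMD is an assumption of this sketch; set it to "echo hadoop"
# to print the commands instead of running them.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# make_hdfs_dir PATH: create PATH with any missing parents, then show it.
make_hdfs_dir() {
    # -p creates missing parent directories along the path.
    $HADOOP_CMD fs -mkdir -p "$1" || return 1
    # -d lists the directory entry itself rather than its contents.
    $HADOOP_CMD fs -ls -d "$1"
}

# Usage (hypothetical path):
#   make_hdfs_dir cse/lab3/input
```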

 

Adding File in HDFS

Create a text file called example.txt on your local filesystem. The Hadoop command put copies files from the local filesystem into HDFS.

Syntax

$ hadoop fs -put source destination

For example, to copy example.txt into the cse directory:

$ hadoop fs -put example.txt cse
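The put step can be wrapped in a small function that checks the local file first, so a typo in the file name fails before Hadoop is invoked. This is only a sketch: `upload_to_hdfs` and `HADOOP_CMD` are names invented for this example.

```shell
# HADOOP_CMD is an assumption of this sketch; set it to "echo hadoop"
# to print the command instead of running it.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# upload_to_hdfs LOCAL_FILE HDFS_DEST
upload_to_hdfs() {
    src="$1"
    dest="$2"
    # Fail early if the local source is missing.
    if [ ! -f "$src" ]; then
        echo "local file not found: $src" >&2
        return 1
    fi
    $HADOOP_CMD fs -put "$src" "$dest"
}

# Usage:
#   upload_to_hdfs example.txt cse
```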

 

 b) Retrieving files from HDFS

 The Hadoop command get copies data from HDFS to the local filesystem.

 Syntax

 $ hadoop fs -get source destination

 $ hadoop fs -get cse/example.txt Desktop (copies cse/example.txt from HDFS to the Desktop directory on the local filesystem)
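A minimal sketch of the retrieval step, assuming you want the local destination directory created if it does not exist yet. The names `download_from_hdfs` and `HADOOP_CMD` are hypothetical. Note that get typically refuses to overwrite a local file that already exists, so remove or rename the old copy first.

```shell
# HADOOP_CMD is an assumption of this sketch; set it to "echo hadoop"
# to print the command instead of running it.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# download_from_hdfs HDFS_SRC LOCAL_DIR
download_from_hdfs() {
    src="$1"
    dest_dir="$2"
    # Make sure the local destination directory exists.
    mkdir -p "$dest_dir" || return 1
    $HADOOP_CMD fs -get "$src" "$dest_dir"
}

# Usage:
#   download_from_hdfs cse/example.txt "$HOME/Desktop"
```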

 c) Deleting Files

The Hadoop command rm removes files from HDFS.

$ hadoop fs -rm cse/example.txt (removes the file from HDFS)
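Plain rm only removes files; on Hadoop 2 and later, a directory and its contents are removed with rm -r. The sketch below chooses between the two. The function name `remove_from_hdfs`, its optional `dir` argument, and `HADOOP_CMD` are assumptions for this illustration.

```shell
# HADOOP_CMD is an assumption of this sketch; set it to "echo hadoop"
# to print the command instead of running it.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# remove_from_hdfs PATH [dir]
# Pass "dir" as the second argument to remove a directory recursively.
remove_from_hdfs() {
    target="$1"
    kind="${2:-file}"
    if [ "$kind" = "dir" ]; then
        $HADOOP_CMD fs -rm -r "$target"
    else
        $HADOOP_CMD fs -rm "$target"
    fi
}

# Usage:
#   remove_from_hdfs cse/example.txt
#   remove_from_hdfs cse dir
```

If trash is enabled on the cluster, removed files are moved to the user's trash directory rather than deleted immediately; adding -skipTrash to rm bypasses it.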

Viva Questions

  1. What is the command to list all the files in a directory of HDFS?

Ans: $ hadoop fs -ls cse (where cse is the directory)

  2. What is the command to copy data from the local filesystem to HDFS using copyFromLocal?

Ans: $ hadoop fs -copyFromLocal <source> <destination> (similar to put, except that the source is restricted to a local file reference)

  3. What is the command to copy data from HDFS to the local filesystem using copyToLocal?

Ans: $ hadoop fs -copyToLocal <source> <destination> (similar to get, except that the destination is restricted to a local file reference)

  4. What is the command to display the contents of a file in HDFS?

Ans: $ hadoop fs -cat cse/A.java

  5. What is the command to display the last 1 KB of a particular file in HDFS?

Ans: $ hadoop fs -tail <filename>

  6. What is the command used to change the replication factor of files or directories in HDFS?

Ans: $ hadoop fs -setrep -w <value> <filename or directory> (the -w flag makes the command wait until replication has completed)

  7. What is the command to show disk usage in bytes for all files and directories under a path in HDFS?

Ans: $ hadoop fs -du <path>

  8. What is the command used to display free space in HDFS?

Ans: $ hadoop fs -df -h (the -h flag prints sizes in human-readable units)

  9. What is the command used to create a new zero-length file at a path in HDFS, with the current time as its timestamp?

Ans: $ hadoop fs -touchz <path>

  10. What is the command that takes a source file from HDFS and outputs it in text format?

Ans: $ hadoop fs -text <source>

  11. What is the command used to display information about a path?

Ans: $ hadoop fs -stat <path>

  12. What is the command used to set permissions on a file in HDFS?

Ans: $ hadoop fs -chmod <value> <file or directory> (e.g. value=777 grants read, write, and execute to owner, group, and others; each octal digit is the sum of read=4, write=2, execute=1)
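The octal arithmetic behind the chmod value can be worked through in plain shell, without Hadoop. This sketch converts a symbolic permission string such as rwxr-xr-- into its octal value; the function names `perm_digit` and `octal_mode` are made up for this example.

```shell
# perm_digit TRIPLE: convert one rwx triple to its octal digit.
# read=4, write=2, execute=1; a "-" contributes nothing.
perm_digit() {
    triple="$1"
    digit=0
    case "$triple" in r??) digit=$((digit + 4)) ;; esac
    case "$triple" in ?w?) digit=$((digit + 2)) ;; esac
    case "$triple" in ??x) digit=$((digit + 1)) ;; esac
    echo "$digit"
}

# octal_mode PERMS: convert nine characters (owner, group, others) to octal.
octal_mode() {
    perms="$1"
    owner="$(perm_digit "$(echo "$perms" | cut -c1-3)")"
    group="$(perm_digit "$(echo "$perms" | cut -c4-6)")"
    others="$(perm_digit "$(echo "$perms" | cut -c7-9)")"
    echo "${owner}${group}${others}"
}

octal_mode rwxrwxrwx   # prints 777
octal_mode rwxr-xr--   # prints 754
```

The resulting value is what you would pass to chmod, for example hadoop fs -chmod 754 cse/example.txt.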

  13. What is the command used to count the number of directories, the number of files, and the bytes under a path?

Ans: $ hadoop fs -count <path>

Sample Output

1            3            1050120 cse (1 is the number of directories, 3 is the number of files, 1050120 is the number of bytes, and cse is the path)
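The four columns of the count output can be labelled with a short awk filter. The sketch below runs on the sample line shown above rather than on a live cluster; `summarize_count` is a hypothetical helper name, and the filter assumes the path contains no spaces.

```shell
# Sample line in the column order: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
sample="           1            3            1050120 cse"

# summarize_count: read `hadoop fs -count` style lines on stdin
# and print one labelled summary per line.
summarize_count() {
    awk '{ printf "%s: %d dir(s), %d file(s), %d bytes\n", $4, $1, $2, $3 }'
}

echo "$sample" | summarize_count
# prints: cse: 1 dir(s), 3 file(s), 1050120 bytes

# With a Hadoop client available, the same filter applies to real output:
#   hadoop fs -count cse | summarize_count
```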

  14. What is the command used to get the usage of a particular command?

Ans: $ hadoop fs -usage <commandname>

  15. What is the command used to empty the trash?

Ans: $ hadoop fs -expunge
