
HDFS Basic File System Commands

Overview:

The file system shell includes various shell-like commands that interact with HDFS and other file systems that Hadoop supports, such as the local FS, HFTP FS, S3 FS, and others.
Hadoop can be run in the following three modes:
  1. Local/Standalone mode
  2. Pseudo Distributed mode
  3. Fully Distributed mode


1. Local/Standalone mode:

Standalone mode is the default mode in which Hadoop runs. It is mainly used for debugging, where HDFS is not needed. There is no need for any custom configuration in the files mapred-site.xml, core-site.xml, and hdfs-site.xml.

2. Pseudo-Distributed mode:

This is also known as a single-node cluster, where both the namenode and datanode run on the same machine. All the Hadoop daemons run on a single node. This configuration is mainly used for testing, where there is no need to think about resources and other users sharing them. In this mode, a separate JVM is spawned for every Hadoop component, and the components communicate across network sockets, producing a fully functioning and optimized mini-cluster on a single host.

3. Fully Distributed mode:

This is the production mode of Hadoop, where multiple nodes will be running. Data will be distributed across several nodes and processing will be done on each node. Master and slave services will be running on separate nodes.

File system commands:

All the commands below are executed in an Ubuntu environment installed in a VirtualBox VM.

start-all.sh

To start all the daemons in Hadoop, the start-all.sh command is used.



jps

To see the daemons running in Hadoop, the jps command is used.
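For example, running jps after start-all.sh on a pseudo-distributed setup would typically list daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager along with their process IDs (the exact list depends on the Hadoop version and mode):

jps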


Hadoop version

To see the version of Hadoop currently running, the command hadoop version is used.


Hadoop fsck

fsck is the command used to check the health of the file system.
hadoop fsck /
This reports the health of the Hadoop distributed file system, starting from the root directory /.
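To also see details of the files, blocks, and block locations while checking, additional flags can be passed, for example:

hadoop fsck / -files -blocks -locations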

appendToFile

Appends a single source or multiple source files from the local file system to the destination file system.


Appending Mydata.txt present in the local directory to abc.txt present in HDFS:
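A sketch of the command (the HDFS destination path here is just an example):

hadoop fs -appendToFile Mydata.txt /user/hadoop/abc.txt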

Cat

Displays the content of a file on stdout.
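For example, to print the contents of abc.txt stored in HDFS (the path is just an example):

hadoop fs -cat /user/hadoop/abc.txt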


CopyFromLocal

Copies files from the local file system to HDFS.
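For example, to copy Mydata.txt from the local directory into an HDFS directory (paths are just examples):

hadoop fs -copyFromLocal Mydata.txt /user/hadoop/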


CopyToLocal

Copies files from HDFS to the local file system.
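For example, to copy abc.txt from HDFS into a local directory (paths are just examples):

hadoop fs -copyToLocal /user/hadoop/abc.txt /home/hadoop/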


Count

Displays the count of the number of directories, files, and bytes under the given path.


The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME. The -h option shows sizes in a human-readable format.
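For example, to show the quota columns in a human-readable format (the path is just an example):

hadoop fs -count -q -h /user/hadoop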


cp

Copy files from source to destination. It also allows multiple source files to get copied in which case the destination must be a directory.


The -f option will overwrite the destination if it already exists. The -p option will preserve the file attributes (timestamps, ownership, permissions, etc.).
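For example, to copy a file and overwrite the destination if it already exists (paths are just examples):

hadoop fs -cp -f /user/hadoop/abc.txt /user/hadoop/backup/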

du

Displays the sizes of files and directories contained in the given directory, or the length of a file in case it is just a file.


The -s option displays an aggregate summary of the file lengths, rather than the individual files.
The -h option displays sizes in a human-readable format (e.g., 64.0m instead of 67108864).
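For example, to show an aggregate, human-readable size for a directory (the path is just an example):

hadoop fs -du -s -h /user/hadoop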

dus

Displays a summary of file lengths. This command is deprecated; it is the same as using hdfs dfs -du -s.

expunge

Empties the trash.
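For example:

hadoop fs -expunge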


ls

Displays the list of files and directories along with their attributes.


The -R option can be used to list files and directories recursively through the directory structure.
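For example, to list a directory recursively (the path is just an example):

hadoop fs -ls -R /user/hadoop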


mkdir

Creates a directory in the HDFS path provided.

The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.
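For example, to create nested directories in one go (the path is just an example):

hadoop fs -mkdir -p /user/hadoop/dir1/dir2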

moveFromLocal

Similar to the put command, except that the source localsrc is deleted after it is copied.
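For example, to move a local file into HDFS (paths are just examples):

hadoop fs -moveFromLocal Mydata.txt /user/hadoop/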


mv

Moves files from source to destination. This command allows multiple sources as well in which case the destination needs to be a directory. Moving files across file systems is not permitted.
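For example, to move a file into another HDFS directory (paths are just examples):

hadoop fs -mv /user/hadoop/abc.txt /user/hadoop/archive/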

put

Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.
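For example, to copy a local file into HDFS, or to write from stdin into a new HDFS file (paths and the sample text are just examples):

hadoop fs -put Mydata.txt /user/hadoop/
echo "hello" | hadoop fs -put - /user/hadoop/fromstdin.txt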


rm

Delete files specified as args.


The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
The -R option deletes the directory and any content under it recursively.
The -r option is equivalent to -R.
The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This can be useful when it is necessary to delete files from an over-quota directory.
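For example, to recursively delete a directory, bypassing the trash (the path is just an example):

hadoop fs -rm -r -skipTrash /user/hadoop/olddata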

setrep

Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.


The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.
The -R flag is accepted for backwards compatibility. It has no effect.
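For example, to set the replication factor of a file to 3 and wait for the replication to complete (the path is just an example):

hadoop fs -setrep -w 3 /user/hadoop/abc.txt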

stat

Returns the stat information on the path.
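For example (the path is just an example):

hadoop fs -stat /user/hadoop/abc.txt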


tail

Displays last kilobyte of the file to stdout.
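For example (the path is just an example):

hadoop fs -tail /user/hadoop/abc.txt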


touchz

Creates a file of zero length.
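For example, to create an empty file in HDFS (the path is just an example):

hadoop fs -touchz /user/hadoop/empty.txt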




