
BigData Apache Hadoop HDFS Interview QnA - II

This is the second post, in continuation of my previous post on Apache Hadoop HDFS interview questions. I have covered more than 30 questions in this post.

Q: How is data or a file written into Hadoop HDFS?
A: To write a file in HDFS:
- The client first obtains a handle to the master, i.e. the NameNode.
- The NameNode returns the addresses of the DataNodes (slaves) to which the client should write the data.
- The client then writes the data directly to those DataNodes through a write pipeline, as sketched below.
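
For illustration, here is a minimal Java sketch of this flow using the HDFS FileSystem API. The NameNode address and the file path are hypothetical, and the hadoop-client library is assumed to be on the classpath; the point is simply that the client gets metadata from the NameNode and streams the bytes to the DataNode pipeline.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // FileSystem.get() and create() interact with the NameNode for metadata;
        // the bytes written to the stream flow directly to the DataNode write pipeline.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/user/demo/sample.txt"))) {
            out.writeBytes("hello hdfs\n");
        }
    }
}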

Q: What should the block size in Hadoop ideally be?
A: There is no hard rule for choosing the block size in Hadoop; it largely depends on the size of the input data. If the input data is huge, a block size of 128 or 256 MB is good for optimized performance. When dealing with small files, smaller block sizes are recommended. The default can be overridden with the dfs.blocksize parameter (dfs.block.size in older releases); see the configuration sketch after the list below.
A few points to remember:

- If a failure occurs while a larger block is being processed, more work has to be redone.
- A larger block size means fewer blocks for the same amount of data.
- Fewer blocks mean the data is spread across fewer nodes, hence reduced throughput for parallel access.
- Larger blocks make it possible for the client to read or write more data without interacting with the NameNode, saving time.
- Larger blocks reduce the metadata size on the NameNode, reducing NameNode load.
- Having fewer, larger blocks also means longer-running tasks, which in turn may not gain maximum parallelism.

Q: What is a Heartbeat in Hadoop?
A:

Q: How often does a DataNode send a heartbeat to the NameNode in Hadoop?
A:

Q: What should you check if the DataNode service is not running after starting the Hadoop services?
A:

Q: How does HDFS help the NameNode scale in Hadoop?
A:

Q: What is the Secondary NameNode in Hadoop HDFS?
A:

Q: Ideally, what should the replication factor be in Hadoop?
A:

Q: How can one change the replication factor for data that is already stored in HDFS?
A:

Q: Why does HDFS perform replication, even though it results in data redundancy?
A:

Q: What is Safemode in Apache Hadoop?
A:

Q: What happens when the NameNode enters safemode in Hadoop?
A:

Q: How do you force the NameNode out of safemode in HDFS?
A:

Q: How do you create a directory when the NameNode is in safemode?
A:

Q: Why can we not create the directory /user/dataflair/inpdata001 when the NameNode is in safemode?
A:

Q: What is the difference between a MapReduce InputSplit and an HDFS block?
A:

Q: Explain the small files problem in Hadoop.
A:

Q: What is the difference between HDFS and NAS?
A:

Q: How do you create users in Hadoop HDFS?
A:

Q: What happens when the NameNode goes down during a file read operation in Hadoop?
A:

Q: Explain the HDFS "write once, read many" pattern.
A:

Q: Can multiple clients write to an HDFS file concurrently in Hadoop?
A:

Q: Does HDFS allow a client to read a file that is already open for writing?
A:

Q: What should the HDFS block size be to get maximum performance from a Hadoop cluster?
A:

Q: Why does HDFS store data on commodity hardware despite the higher chance of failures?
A:

Q: Who divides a file into blocks when it is stored in HDFS?
A:

Q: What are the active and passive NameNodes in HDFS?
A:

Q: How is indexing done in Hadoop HDFS?
A:

Q: What is rack awareness in Hadoop?
A:

Q: What is Erasure Coding in Hadoop?
A:

Q: When and how do you create a Hadoop archive?
A:

Q: What is "Non DFS Used" in the HDFS web console?
A:

Q: How does HDFS ensure the data integrity of blocks stored in Hadoop HDFS?
A:

Q: Why were slaves limited to 4,000 in Hadoop version 1?
A:

