What is Hadoop?
Hadoop is a distributed computing platform written in Java. Its core pieces are modeled on the Google File System (as HDFS) and on the MapReduce programming model.
Why use Hadoop?
- It is flexible, reliable and scalable
- It can process multi-petabyte datasets
- It tolerates node failures
- It can handle data that does not have a strict schema
What is InputSplit in Hadoop?
When a Hadoop job is run, the framework splits the input files into chunks and assigns each chunk to a mapper to process. Each such chunk is called an InputSplit.
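As a minimal sketch (assuming the newer org.apache.hadoop.mapreduce API and a hypothetical input directory /data/input), the splits that the default TextInputFormat would create can be listed before any mappers run; by default one split is created per HDFS block of each input file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitInspector {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration());
            FileInputFormat.addInputPath(job, new Path("/data/input")); // hypothetical path
            // The InputFormat decides how the input is split; the framework then
            // launches one map task per InputSplit.
            for (InputSplit split : new TextInputFormat().getSplits(job)) {
                System.out.println(split);
            }
        }
    }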
How many InputSplits are made by the Hadoop framework?
Assuming the default 64 MB block size and input files of 64 KB, 65 MB and 127 MB, Hadoop will make 5 splits:
- One split for the 64 KB file (it fits within a single block)
- Two splits for the 65 MB file (64 MB + 1 MB)
- Two splits for the 127 MB file (64 MB + 63 MB)
What is Hadoop MapReduce?
Hadoop MapReduce is the framework used for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step process: a map phase followed by a reduce phase.
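A minimal driver sketch, assuming the org.apache.hadoop.mapreduce API and the hypothetical WordCountMapper / WordCountReducer classes sketched under "How does Hadoop MapReduce work?" below, shows how the two steps are wired together:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);    // step 1: map
            job.setReducerClass(WordCountReducer.class);  // step 2: reduce
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }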
What is Hadoop Streaming?
Hadoop Streaming is a utility that allows you to create and run MapReduce jobs with any executable or script. It is a generic API that allows programs written in any language to be used as the Hadoop mapper or reducer.
What kind of Hardware is best for Hadoop?
Hadoop can run on dual-processor/dual-core machines with 4-8 GB of RAM using ECC memory. The exact hardware depends on the workflow's needs.
How does Hadoop MapReduce work?
Taking word counting as an example: during the map phase, each map task counts the words in its portion of the input, while in the reduce phase the counts are aggregated per word across the entire collection. During the map phase, the input data is divided into splits that are analysed by map tasks running in parallel across the Hadoop cluster.
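The word-count example above can be sketched as follows (a minimal illustration using the org.apache.hadoop.mapreduce API; the class names are the hypothetical ones referenced in the earlier driver sketch):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map phase: emit (word, 1) for every word in this task's input split.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: aggregate the per-word counts across the entire collection.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }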
What is NameNode in Hadoop?
NameNode in Hadoop is the node where Hadoop stores all the file location information for HDFS (the Hadoop Distributed File System). In other words, the NameNode is the centerpiece of an HDFS file system. It keeps a record of all the files in the file system and tracks the file data across the machines of the cluster.
What is JobTracker in Hadoop?
In Hadoop, the JobTracker is used for submitting and tracking MapReduce jobs. The JobTracker runs in its own JVM process.
How many daemon processes run on a Hadoop cluster?
Hadoop is comprised of five separate daemons. Each of these daemons runs in its own JVM.
The following 3 daemons run on master nodes:
NameNode – This daemon stores and maintains the metadata for HDFS. The NameNode is the master server in Hadoop and manages the file system namespace and access to the files stored in the cluster.
Secondary NameNode – The Secondary NameNode is not a redundant standby for the NameNode; instead it performs periodic checkpointing and housekeeping tasks.
JobTracker – Each cluster has a single JobTracker that manages MapReduce jobs and distributes individual tasks to the machines running a TaskTracker.
The following 2 daemons run on each slave node:
DataNode – Stores the actual HDFS data blocks. The DataNode manages the storage attached to its node; a cluster can contain many such nodes. Each node that stores data runs a DataNode daemon.
TaskTracker – Responsible for instantiating and monitoring individual map and reduce tasks; the TaskTracker on each slave node performs the actual work.
What are the functionalities of the JobTracker?
These are the main tasks of the JobTracker:
- To accept jobs from clients.
- To communicate with the NameNode to determine the location of the data.
- To locate TaskTracker nodes with available slots.
- To submit the work to the chosen TaskTracker nodes and monitor the progress of each task.
What is a NameNode?
The NameNode sits at the center of the Hadoop Distributed File System cluster. It manages the file system metadata and the DataNodes, but does not store the data itself.
What is a datanode?
Unlike the NameNode, a DataNode actually stores data within the Hadoop Distributed File System. DataNodes run in their own Java virtual machine process.
What is TaskTracker?
A TaskTracker is a node in the cluster that accepts tasks (map, reduce and shuffle operations) from a JobTracker.
How does the JobTracker assign tasks to the TaskTracker?
The TaskTracker periodically sends heartbeat messages to the JobTracker to assure it that it is alive. These messages also inform the JobTracker about the number of available slots, so the JobTracker knows where work can be scheduled.
What happens when a datanode fails?
When a datanode fails:
- The JobTracker and NameNode detect the failure
- All tasks that were running on the failed node are re-scheduled
- The NameNode replicates the user's data to another node
What are the most common input formats defined in Hadoop?
These are the most common input formats defined in Hadoop:
- TextInputFormat
- KeyValueTextInputFormat
- SequenceFileInputFormat
What are the actions followed by Hadoop when a job is submitted?
Hadoop performs the following actions:
- Client applications submit jobs to the JobTracker
- The JobTracker communicates with the NameNode to determine the data location
- The JobTracker locates TaskTracker nodes near the data or with available slots
- It submits the work to the chosen TaskTracker nodes
- When a task fails, the JobTracker is notified and decides how to proceed
- The JobTracker monitors the TaskTracker nodes
What is heartbeat in HDFS?
A heartbeat is a signal sent from a DataNode to the NameNode, and from a TaskTracker to the JobTracker. If the NameNode or the JobTracker does not receive the signal, it assumes there is some issue with the DataNode or the TaskTracker.
What is a Combiner?
The Combiner is a ‘mini-reduce’ process which operates only on data generated by a mapper. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers.
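In the word-count sketch above, the reduce function (summing) is associative and commutative, so the same hypothetical WordCountReducer class can simply be registered as the combiner in the driver:

    job.setCombinerClass(WordCountReducer.class);  // 'mini-reduce' run on each mapper's node
    job.setReducerClass(WordCountReducer.class);   // final aggregation across all map output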
What is Speculative Execution?
In Hadoop, speculative execution launches a certain number of duplicate tasks: multiple copies of the same map or reduce task can be executed on different slave nodes. In simple terms, if a particular node is taking a long time to complete a task, Hadoop creates a duplicate of that task on another node. The copy that finishes first is kept, and the remaining copies are killed.
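Speculative execution is enabled by default; as a sketch (using the Hadoop 1.x / 0.20 property names), it can be toggled separately for map and reduce tasks in the job configuration:

    // org.apache.hadoop.conf.Configuration
    Configuration conf = new Configuration();
    conf.setBoolean("mapred.map.tasks.speculative.execution", true);     // allow speculative map tasks
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false); // disable for reduce tasks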
What are the basic parameters of a Mapper?
The basic parameters of a Mapper are its input and output key/value types, typically:
- LongWritable and Text (input key and value)
- Text and IntWritable (output key and value)
What is the function of the MapReduce partitioner?
The function of the MapReduce partitioner is to make sure that all the values of a single key go to the same reducer, which eventually helps distribute the map output evenly over the reducers.
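A sketch of a custom partitioner (a hypothetical WordPartitioner that mirrors what the default HashPartitioner does) makes the guarantee concrete: every value for a given key is routed to the same reduce partition.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WordPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            // Same key -> same hash -> same reducer; masking keeps the result non-negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }
    // Registered on the job with: job.setPartitionerClass(WordPartitioner.class);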
What is difference between an Input Split and HDFS Block?
The logical division of the data is known as an input split, while the physical division of the data is known as an HDFS block.
What is WebDAV in Hadoop?
WebDAV is a set of extensions to HTTP that supports editing and updating files. On most operating systems, WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.
What is sqoop in Hadoop?
Sqoop is a tool used to transfer data between relational database management systems (RDBMS) and Hadoop HDFS. Using Sqoop, data can be imported from an RDBMS such as MySQL or Oracle into HDFS, and data can be exported from HDFS back into an RDBMS.
What is SequenceFileInputFormat?
SequenceFileInputFormat is used for reading sequence files. Sequence files are a specific compressed binary file format optimized for passing data from the output of one MapReduce job to the input of another MapReduce job.
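A sketch of chaining two hypothetical jobs (jobOne and jobTwo) through sequence files, assuming the org.apache.hadoop.mapreduce.lib.input and lib.output format classes:

    // Both jobs would point at the same intermediate HDFS directory.
    jobOne.setOutputFormatClass(SequenceFileOutputFormat.class); // jobOne writes sequence files
    jobTwo.setInputFormatClass(SequenceFileInputFormat.class);   // jobTwo reads them back as input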
What does conf.setMapperClass do?
conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating key-value pairs out of the mapper.
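A sketch using the older org.apache.hadoop.mapred API (which is what the JobConf-style "conf" refers to), with a hypothetical MyDriver class and a hypothetical MyMapper implementing that API's Mapper interface:

    JobConf conf = new JobConf(MyDriver.class); // org.apache.hadoop.mapred.JobConf; MyDriver is hypothetical
    conf.setMapperClass(MyMapper.class);        // MyMapper: the class that turns input records into key-value pairs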
What is the purpose of RecordReader in Hadoop?
The InputSplit defines a slice of work but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat.
What is Distributed Cache in Hadoop?
Distributed Cache is a facility provided by the MapReduce framework to cache files (text, archives, jars and so on) needed by applications during execution of the job. The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.
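A sketch (Hadoop 1.x style, with a hypothetical HDFS path) of registering a file in the distributed cache from the job configuration:

    // org.apache.hadoop.filecache.DistributedCache, java.net.URI
    Configuration conf = new Configuration();
    DistributedCache.addCacheFile(new URI("/cache/lookup.txt"), conf); // hypothetical file
    // The framework copies /cache/lookup.txt to every slave node before the
    // job's tasks start there; tasks can then read it as a local file.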
How did you debug your Hadoop code?
There can be several ways of doing this, but the most common ones are:
- By using counters (see the sketch below)
- By using the web interface provided by the Hadoop framework
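A minimal sketch of the counter approach, assuming a hypothetical "Debug"/"BadRecords" counter incremented from inside a mapper's map() method; counter totals are aggregated across all tasks and shown in the JobTracker web interface and the job's summary:

    // Inside map() or reduce(): count suspicious records rather than printing them.
    if (line.toString().isEmpty()) {                           // 'line' is the mapper's input value
        context.getCounter("Debug", "BadRecords").increment(1);
    }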
Which directory does Hadoop install to?
Hadoop is typically installed in /usr/lib/hadoop-0.20/ (the exact path depends on the distribution and version).
Where are Hadoop's configuration files located? List them.
Hadoop’s configuration files can be found inside the conf sub-directory.
- hdfs-site.xml
- core-site.xml
- mapred-site.xml