This guide helps you install a single-node Apache Hadoop cluster on your machine.
System Requirements
- Ubuntu 16.04
- Java 8 Installed
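Before downloading anything, it can help to confirm that Java is actually on the PATH; a quick sketch (package names vary by distribution):

```shell
# Check that a Java runtime is installed before proceeding.
if command -v java >/dev/null 2>&1; then
  java_ver=$(java -version 2>&1 | head -n 1)
  echo "found: $java_ver"
else
  java_ver=""
  echo "Java not found - install a JDK 8 (e.g. openjdk-8-jdk) first"
fi
```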
1. Download Hadoop
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.0/hadoop-2.7.0.tar.gz
2. Prepare for Installation
tar xfz hadoop-2.7.0.tar.gz
sudo mv hadoop-2.7.0 /usr/local/hadoop
3. Create Dedicated Group and User
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
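To confirm the account was created with the right memberships, list its groups ("hadoop" and "sudo" should appear for hduser; the sketch falls back to the current user where hduser does not exist):

```shell
# List group memberships of the new account.
member_of=$(groups hduser 2>/dev/null || groups "$(id -un)")
echo "$member_of"
```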
4. Switch to Newly Created User Account
su - hduser
5. Add Variables to ~/.bashrc
#Begin Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#End Hadoop Variables
6. Source ~/.bashrc
source ~/.bashrc
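A quick way to confirm the variables took effect in the current shell (each line prints "unset" if ~/.bashrc was not sourced):

```shell
# Both variables should print the install prefix after sourcing ~/.bashrc.
hh="${HADOOP_HOME:-unset}"
hinst="${HADOOP_INSTALL:-unset}"
echo "HADOOP_HOME=$hh"
echo "HADOOP_INSTALL=$hinst"
```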
7. Set Java Home for Hadoop
- Open the file: /usr/local/hadoop/etc/hadoop/hadoop-env.sh
- Find the line that sets JAVA_HOME and change it to:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
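If you are unsure which path to use (for example with OpenJDK instead of Oracle Java), one common way is to derive it from the `javac` binary; a sketch:

```shell
# Resolve the JDK root from javac's real location (follows symlinks).
javac_bin=$(command -v javac || true)
if [ -n "$javac_bin" ]; then
  java_home=$(dirname "$(dirname "$(readlink -f "$javac_bin")")")
  echo "JAVA_HOME candidate: $java_home"
else
  java_home=""
  echo "javac not on PATH; look under /usr/lib/jvm for installed JDKs"
fi
```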
8. Edit core-site.xml
- Open the file: /usr/local/hadoop/etc/hadoop/core-site.xml
- Add the following lines between the <configuration> and </configuration> tags:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
9. Edit yarn-site.xml
- Open the file: /usr/local/hadoop/etc/hadoop/yarn-site.xml
- Add the following lines between the <configuration> and </configuration> tags:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
10. Edit mapred-site.xml
- Copy the mapred-site.xml template first using:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
- Open the file: /usr/local/hadoop/etc/hadoop/mapred-site.xml
- Add the following lines between the <configuration> and </configuration> tags, which tell MapReduce to run on YARN:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
11. Edit hdfs-site.xml
First, create the following directories and hand them over to hduser:
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown hduser:hadoop -R /usr/local/hadoop_store
sudo chmod 777 -R /usr/local/hadoop_store
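The layout those commands produce can be previewed against a temporary prefix, no sudo needed (the real commands above target /usr/local/hadoop_store):

```shell
# Demo of the layout: mkdir -p creates the intermediate directories too.
tmp=$(mktemp -d)
mkdir -p "$tmp/hdfs/namenode" "$tmp/hdfs/datanode"
dirs_created=$(find "$tmp" -mindepth 1 -type d | wc -l)  # hdfs + 2 leaves = 3
find "$tmp" -mindepth 1 -type d | sort
rm -rf "$tmp"
```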
Now open /usr/local/hadoop/etc/hadoop/hdfs-site.xml and enter the following content between the
<configuration> and </configuration> tags:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
12. Format NameNode
cd /usr/local/hadoop/
bin/hdfs namenode -format
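Note: the start scripts in the next step connect over SSH even on a single node, so passwordless SSH to localhost must work for hduser. If starting the daemons keeps prompting for passwords, a common setup sketch (run as hduser, assumes openssh-server is installed):

```shell
# Sketch: create a key pair if none exists, then authorize it for localhost.
if command -v ssh-keygen >/dev/null 2>&1; then
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  [ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
  grep -qxF "$(cat ~/.ssh/id_rsa.pub)" ~/.ssh/authorized_keys 2>/dev/null \
    || cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  echo "key authorized; 'ssh localhost' should no longer prompt"
else
  echo "ssh-keygen not found; install openssh-client and openssh-server"
fi
```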
13. Start Hadoop Daemons
cd /usr/local/hadoop/
sbin/start-dfs.sh
sbin/start-yarn.sh
14. Check Service Status
jps
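On a healthy single-node setup, `jps` should list five Hadoop daemons plus itself; a hedged check (PIDs will differ):

```shell
# Expected processes: NameNode, DataNode, SecondaryNameNode,
# ResourceManager, NodeManager (plus the Jps tool itself).
jps_out=$(command -v jps >/dev/null 2>&1 && jps || echo "jps not on PATH - is the JDK installed?")
echo "$jps_out"
```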
15. Check Running Jobs
Open the YARN ResourceManager web UI in your browser's address bar:
http://localhost:8088
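The same check can be scripted with curl (assumes curl is installed): a status of 200 means the ResourceManager UI is answering, while 000 means nothing is listening yet.

```shell
# Probe the YARN ResourceManager web UI without a browser.
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 http://localhost:8088 || true)
echo "HTTP status: ${code:-000}"
```

In Hadoop 2.x the HDFS NameNode web UI can be checked the same way at http://localhost:50070.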