Identification of Spiders and Crawlers

Spiders are small programs that harvest information from the web for search engines. They crawl websites by following links from page to page and gathering information, which helps sites show up quickly in search results. A site owner can also explicitly instruct a robot not to follow any of the links on a page. Alongside these good spiders there are bad ones, known as spam spiders, which try to harvest email addresses. Some spiders also work inefficiently and get caught in endless loops created by dynamically generated web pages. In this project we try to identify the bad spam spiders that visit a website and eliminate them, which also minimizes bot traffic. The idea was first proposed by Google, in Google Analytics.
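To make the crawling behaviour described above concrete, here is a minimal Java sketch of a spider walking a tiny in-memory "web". It is not the project's code: the `PoliteSpider` class, the `Page` type, the `noFollow` flag and the `maxPages` limit are all hypothetical names chosen for illustration. The visited set keeps the spider out of loops between pages that link to each other, and the page limit is one simple guard against endless dynamically generated pages.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class PoliteSpider {

    /** Hypothetical in-memory page: its outgoing links and whether robots are asked not to follow them. */
    static final class Page {
        final List<String> links;
        final boolean noFollow;

        Page(List<String> links, boolean noFollow) {
            this.links = links;
            this.noFollow = noFollow;
        }
    }

    /**
     * Breadth-first crawl starting at {@code start}.
     * Returns the pages harvested, in the order they were visited.
     */
    static Set<String> crawl(Map<String, Page> web, String start, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);

        // maxPages bounds the crawl even if a site keeps generating new URLs.
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            if (!visited.add(url)) {
                continue; // already harvested, so skip it and avoid looping
            }
            Page page = web.get(url);
            if (page == null || page.noFollow) {
                continue; // unknown page, or the page asked robots not to follow its links
            }
            for (String link : page.links) {
                if (!visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Page> web = Map.of(
            "/home", new Page(List.of("/a", "/b"), false),
            "/a", new Page(List.of("/home", "/b"), false),   // links back to /home
            "/b", new Page(List.of("/secret"), true),        // asks robots not to follow
            "/secret", new Page(List.of(), false));

        System.out.println(crawl(web, "/home", 100)); // prints [/home, /a, /b]
    }
}
```

A well-behaved spider stops at `/b` and never reaches `/secret`. A bad spider would simply ignore the flag, which is one reason such requests are not enough on their own to keep spam spiders away.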

Implementation steps:

Software used: Java, Hadoop, and Hive (Hive, a data warehouse layer on Hadoop queried with SQL-like HiveQL, serves as the database).

The given dataset (Google bot-spider) is analyzed to identify bots.

The data is uploaded to the Hadoop Distributed File System (HDFS).

The file is stored under hdfs/app/hadoop.

The file is named web_log.

The Hadoop server must be started first.

Then we check whether Hadoop is running.

The web_log file is loaded into the Hive database.

We created a server_log partition in Hive, where the data is stored.

The analysis is then started, in which the dataset in Hive is analyzed for bot detection.

Finally, the bot results for each browser are collected and plotted as a graph.

This shows how many bot URLs were detected in the web log.
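The analysis step above can be sketched in plain Java. This is only a stand-in for the Hive analysis, which the post does not show: it works on a few in-memory log lines rather than on HDFS, and it assumes the web log uses the common "combined" format, where the user agent is the last quoted field. The class name `BotLogAnalysis`, the marker list and the browser grouping are illustrative assumptions, not the project's actual rules.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BotLogAnalysis {

    // Assumption: combined log format, so the user agent is the last quoted field on the line.
    private static final Pattern USER_AGENT = Pattern.compile("\"([^\"]*)\"\\s*$");

    // Illustrative markers only; a real bot list would be far longer.
    private static final List<String> BOT_MARKERS = List.of("bot", "spider", "crawl", "slurp");

    static String userAgentOf(String logLine) {
        Matcher m = USER_AGENT.matcher(logLine);
        return m.find() ? m.group(1) : "";
    }

    static boolean isBot(String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        return BOT_MARKERS.stream().anyMatch(ua::contains);
    }

    /** Rough browser family; Chrome is checked before Safari because Chrome's agent string contains both. */
    static String browserOf(String userAgent) {
        if (userAgent.contains("Firefox")) return "Firefox";
        if (userAgent.contains("Chrome")) return "Chrome";
        if (userAgent.contains("Safari")) return "Safari";
        return "Other";
    }

    /** Counts bot requests per browser family, sorted by browser name. */
    static Map<String, Integer> botHitsByBrowser(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            String ua = userAgentOf(line);
            if (isBot(ua)) {
                counts.merge(browserOf(ua), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the project's dataset.
        List<String> lines = List.of(
            "66.249.66.1 - - [01/Jan/2016:00:00:01 +0000] \"GET / HTTP/1.1\" 200 512 \"-\" "
                + "\"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"",
            "66.249.66.2 - - [01/Jan/2016:00:00:02 +0000] \"GET /a HTTP/1.1\" 200 128 \"-\" "
                + "\"Mozilla/5.0 (Linux; Android 6.0.1) AppleWebKit/537.36 (KHTML, like Gecko) "
                + "Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1)\"",
            "10.0.0.5 - - [01/Jan/2016:00:00:03 +0000] \"GET /b HTTP/1.1\" 200 256 \"-\" "
                + "\"Mozilla/5.0 (Windows NT 10.0; rv:45.0) Gecko/20100101 Firefox/45.0\"");

        System.out.println(botHitsByBrowser(lines)); // prints {Chrome=1, Other=1}
    }
}
```

The resulting counts per browser are the kind of figures that could then be plotted as a graph. Matching on the user agent only catches bots that identify themselves, so spam spiders that fake a browser string would need additional signals.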

Tools used: Hive, Hadoop, Java

The post Identification of Spiders and Crawlers appeared first on Final Year Project / NS2 Project / NS3 Project / Hadoop / Mapreduce / Bigdata Project / IOT Internet of things project.
