
Parsing With Nokogiri

I was reading an article on our blog about extracting all the links from a webpage with Python. Have a look; it's a well-written article. So I decided to write an article about extracting links and image links with Ruby using Nokogiri.

What's Nokogiri?

Nokogiri is a library that acts as an HTML/XML parser. In simple terms, suppose you want to extract a piece of information from a website to use in your program. What would you do? You could copy the source of the page into a text file manually and then search through the whole document, or you could use a library that helps you extract the information directly from the website. Nokogiri is one such library.

Using Nokogiri

 Step 1. Install the 'nokogiri' gem by typing "gem install nokogiri".

 Step 2. Include the library in your program by typing "require 'nokogiri'". Also include the 'open-uri' library by typing "require 'open-uri'", as we will be fetching the page from a website.

 Step 3. Now we will open the page and, with the help of a CSS selector, look for the 'a' tags and pick out what's inside 'href'; that will be the link. We will do the same for 'img' tags and their 'src' attribute to obtain the image links.
Have a look at the complete code (Explanation in comments):

Run it by typing $ ruby nokogiri.rb in your terminal.

Thank you!     

This post first appeared on TECKGUIDE.
