
Crawling with "npm crawler"

Problem

For example, I want to crawl the descriptions of Node.js modules from npmjs.org, but this code doesn't work. Also, how can I do this with jQuery rather than with the jsdom module?

var Crawler = require("crawler").Crawler;
var crawler = new Crawler({
    "maxConnections": 10
});

crawler.queue([{
    "uri": "https://npmjs.org/package/crawler",
    "callback": function(error, result) {
        console.log("description:", window.$("p.description").text());
    }
}]);
Problem courtesy of: khex

Solution

Your code exits too early. Add a setTimeout on the last line to give your code enough time to complete.

Then call process.exit() from your callback function.

The Crawler callback takes 3 parameters, the third one being jQuery, so you probably want something like this:

"callback":function(error,result,$) {
  console.log("description:",$("p.description").text());
}
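
To try the three-argument callback without making a network request, here is a minimal sketch. It assumes the callback shape described above, (error, result, $). The names extractDescription and fakeJQuery and the sample description text are hypothetical, made up for illustration and not part of the crawler module.

```javascript
// Hypothetical helper: the same logic as the callback above, written as a
// named function so it can be exercised on its own.
function extractDescription(error, result, $) {
  if (error) {
    // Report the failure instead of touching $, which may be unusable here.
    console.error("crawl failed:", error.message);
    return null;
  }
  // $ is the jQuery-like object passed as the third callback argument.
  return $("p.description").text().trim();
}

// Hypothetical stand-in for $, so the helper runs without fetching a page.
// It only knows about the one selector used in this example.
function fakeJQuery(selector) {
  return {
    text: function () {
      return selector === "p.description" ? "  Sample module description  " : "";
    }
  };
}

console.log("description:", extractDescription(null, {}, fakeJQuery));
// prints: description: Sample module description
```

With the real module, the same function body would go in the "callback" entry passed to crawler.queue, followed by the process.exit() call suggested above once the description has been logged.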
Solution courtesy of: Pascal Belloncle

This post first appeared on Node.js Recipes.
