Pull data from html javascript nodejs?

Problem

I'm building a CLI for Google searches for my own use. I've been using Node.js for a while on a chatbot, so I'd like to build this in Node.js too. Fetching the page works fine, and I end up with a string containing all of the page's HTML. It's even easy to spot the results I want in that HTML:

League of Legends - Free Online Game | LoL - League of Legends
3 days ago … Official website for League of Legends. Join millions of players in an award winning Multiplayer Online Battle Arena.
Community - PVP.NET - General Discussion - Champions
leagueoflegends.com/ -
Cached
Similar
Mobile formatted
Options
LOL - Wikipedia, the free encyclopedia
LOL, an abbreviation for laughing out loud, or laugh out loud, is a common element of Internet slang. It was used historically on Usenet …
en.wikipedia.org/wiki/LOL -
Cached
Similar
Mobile formatted
Options

Anything that is .jd is a result, so I first need to separate those out, then work on separating out the URLs and the descriptions as well. I've never done string manipulation this extensive, so I have no idea where to start.

Keep in mind that what I actually have is the HTML as one long string, not neatly formatted markup.
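
As a rough illustration of the string handling asked about here, the sketch below pulls result blocks out of one long HTML string and then separates the URL, title, and description from each block. The actual markup is not preserved in this post, so the sample string, the `result` and `desc` class names, and the helper names `stripTags` and `extractResults` are all hypothetical. Regular expressions are brittle against real-world HTML (nested lists would break this, for example), so treat it as a starting point rather than a robust parser.

```javascript
// Hypothetical markup: each result is an <li class="result"> holding a link
// and a description. The real page uses different class names and nesting,
// so the patterns below would need to be adjusted to match it.
const sampleHtml =
  '<ol>' +
  '<li class="result"><a href="http://leagueoflegends.com/">League of Legends</a>' +
  '<span class="desc">Official website for League of Legends.</span></li>' +
  '<li class="result"><a href="http://en.wikipedia.org/wiki/LOL">LOL - Wikipedia</a>' +
  '<span class="desc">LOL is a common element of Internet slang.</span></li>' +
  '</ol>';

// Remove any nested tags and collapse runs of whitespace.
function stripTags(fragment) {
  return fragment.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Step 1: find every result block. Step 2: split each block into its parts.
function extractResults(html) {
  const results = [];
  const itemPattern = /<li class="result">([\s\S]*?)<\/li>/g;
  let item;
  while ((item = itemPattern.exec(html)) !== null) {
    const body = item[1];
    const link = /<a href="([^"]*)">([\s\S]*?)<\/a>/.exec(body);
    const desc = /<span class="desc">([\s\S]*?)<\/span>/.exec(body);
    if (!link) continue; // skip blocks without a link
    results.push({
      url: link[1],
      title: stripTags(link[2]),
      description: desc ? stripTags(desc[1]) : ''
    });
  }
  return results;
}

console.log(extractResults(sampleHtml));
```

The two-step shape (isolate each result first, then pick fields out of the smaller fragment) keeps each pattern simple. A dedicated HTML parsing library would likely hold up better on a real page.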
Problem courtesy of: Rob

Solution

Luxun led me back to the Google API, and I found out how to search the full web instead of only the included sites here: http://support.google.com/customsearch/bin/answer.py?hl=en&answer=1210656

To create a search engine that searches the entire web:

1. From the Google Custom Search homepage, click Create a Custom Search Engine.
2. Type a name and description for your search engine.
3. Under Define your search engine, in the Sites to Search box, enter at least one valid URL (e.g. www.google.com).
4. Select the CSE edition you want and accept the Terms of Service, then click Next. Select the layout option you want, and then click Next.
5. Click any of the links under the Next steps section to navigate to your Control panel.
6. In the left-hand menu, under Control Panel, click Basics.
7. In the Search Preferences section, select Search the entire web but emphasize included sites.
8. Click Save Changes.
9. In the left-hand menu, under Control Panel, click Sites.
10. Delete the site you entered during the initial setup process.
Solution courtesy of: Rob


This post first appeared on Node.js Recipes; please read the original post: here
