Extracting Important words from a sentence using Node

Problem

I admit that I haven't searched the SO database extensively. I tried reading about the natural npm package, but it doesn't seem to provide this feature. I would like to know whether the requirement below is possible.

I have a database that lists all the cities of a country. I also have ratings for these cities (best place to live, worst place to live, best rated city, worst rated city, etc.). From the user interface, I would like to let the user enter free text and use that text to search my database.

For example: 'Best place to live in California', 'places near California', or 'places in California'.

From the above sentence, I want to extract only the nouns (maybe), as these will be the name of the city or country that I can search for.

Then extracting 'best' would tell me to sort in a particular order, and so on.

Any suggestions or directions to look for?

I realize the question risks being marked as 'debatable', but I posted it to get some direction on how to proceed.

Problem courtesy of: Vaya

Solution

[I came across this question whilst looking for some use cases to test a module I'm working on. Obviously the question is a little old, but since my module addresses the question I thought I might as well add some information here for future searchers.]

You should be able to do what you want with a POS chunker. I've recently released one for Node that is modelled on the chunkers provided by the NLTK (Python) and Stanford NLP (Java) libraries (the chunk() and TokensRegex() methods, respectively).

The module processes strings that already contain part-of-speech tags, so you first need to run your text through a part-of-speech tagger, such as pos:

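var pos = require('pos');

// Split the sentence into tokens. The trailing period is included
// because the tagged output shown below ends with './.'.
var words = new pos.Lexer().lex('Best place to live in California.');

// Tag each token, then join the [word, tag] pairs into a 'word/TAG' string.
var tags = new pos.Tagger()
  .tag(words)
  .map(function(tag){return tag[0] + '/' + tag[1];})
  .join(' ');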

This will give you:

Best/JJS place/NN to/TO live/VB in/IN California/NNP ./.

Now you can use pos-chunker to find all proper nouns:

var chunker = require('pos-chunker');

var places = chunker.chunk(tags, '[{ tag: NNP }]');

This will give you:

Best/JJS place/NN to/TO live/VB in/IN {California/NNP} ./.
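The chunked result is still a string, so the place names have to be pulled out of the braces before they can be used in a database query. The following is a minimal sketch in plain JavaScript, assuming chunks are always delimited as {word/TAG ...} exactly as in the output above. The helper name extractChunks is hypothetical and is not part of pos-chunker.

```javascript
// Hypothetical helper (not part of pos-chunker): collect the words inside
// each {...} chunk of a chunked string, dropping the '/TAG' suffixes.
function extractChunks(chunked) {
  var chunks = [];
  var pattern = /\{([^{}]+)\}/g;
  var match;
  while ((match = pattern.exec(chunked)) !== null) {
    var words = match[1].trim().split(/\s+/).map(function (token) {
      var idx = token.lastIndexOf('/');
      // Keep the token unchanged if it carries no tag.
      return idx === -1 ? token : token.slice(0, idx);
    });
    chunks.push(words.join(' '));
  }
  return chunks;
}

// Input copied from the chunker output shown above.
var chunked = 'Best/JJS place/NN to/TO live/VB in/IN {California/NNP} ./.';
var placeNames = extractChunks(chunked);
console.log(placeNames); // ['California']
```

The resulting array could then be passed to whatever city or state lookup the database exposes.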

Similarly you could extract verbs to understand what people want to do ('live', 'swim', 'eat', etc.):

var verbs = chunker.chunk(tags, '[{ tag: VB }]');

Which would yield:

Best/JJS place/NN to/TO {live/VB} in/IN California/NNP ./.

You can also match words, sequences of words and tags, use lookahead, group sequences together to create chunks (and then match on those), and other such things.
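The question also asks about using a word like 'best' to decide the sort order. The tagger output already marks 'Best' as a superlative adjective (JJS), so one possible approach is to look for JJS tokens and map them to a direction. This is only a sketch that works on the tagged string directly, without pos-chunker. The SORT_BY_SUPERLATIVE table, the function names, and the 'asc'/'desc' values are all assumptions made for illustration.

```javascript
// Hypothetical mapping from superlatives to a sort direction on a rating column.
var SORT_BY_SUPERLATIVE = { best: 'desc', worst: 'asc' };

// Split a 'word/TAG word/TAG ...' string into { word, tag } objects.
function parseTagged(tagged) {
  return tagged.trim().split(/\s+/).map(function (token) {
    var idx = token.lastIndexOf('/');
    return { word: token.slice(0, idx), tag: token.slice(idx + 1) };
  });
}

// Return the direction for the first known superlative, or null if none is found.
function sortDirection(tagged) {
  var tokens = parseTagged(tagged);
  for (var i = 0; i < tokens.length; i++) {
    var key = tokens[i].word.toLowerCase();
    if (tokens[i].tag === 'JJS' &&
        Object.prototype.hasOwnProperty.call(SORT_BY_SUPERLATIVE, key)) {
      return SORT_BY_SUPERLATIVE[key];
    }
  }
  return null;
}

// Input copied from the tagger output shown earlier.
var tagged = 'Best/JJS place/NN to/TO live/VB in/IN California/NNP ./.';
console.log(sortDirection(tagged)); // 'desc'
```

Returning null when no superlative is present lets the caller fall back to a default order for queries such as 'places in California'.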

Solution courtesy of: Mark Birbeck

This post first appeared on Node.js Recipes.
