Extending MongoDB findOne() functionality to findAnother()

Given a new MongoDB instance to work with, it's often difficult to understand the structure since, unlike SQL, the database is inherently schema-less. A popular initial tool is Variety.js, which finds all possible fields and reports their types, occurrences and percentages in the db.

However, it is often useful to see the contents of the fields as well. For this, you can use findOne(), which returns a single record from the collection (with no query, the first one in the collection's natural order) and pretty-prints it to the terminal. This is great, but sometimes you want to see more than one record to get a better feel for field contents. In this post I’ll show you how I extended Mongo to do this.

(Assuming you’ve already installed mongo and git. I use brew install mongo from the Homebrew package manager, and git came with Xcode on my Mac.)

First off, let’s get a copy of mongo’s code:

mruttley$ cd ~/Documents
mruttley$ git clone https://github.com/mongodb/mongo.git

This will have created a folder in your Documents, so let’s inspect it:

mruttley$ cd mongo
mruttley$ ls
APACHE-2.0.txt   GNU-AGPL-3.0.txt SConstruct       buildscripts     distsrc          etc              rpm              src
CONTRIBUTING.rst README           build            debian           docs             jstests          site_scons

There are quite a lot of files there, so it will take a while to find where the findOne() function is located. GitHub’s repository search returns far too many results, so let’s use a Unix file search instead. The MongoDB server itself is written in C++, but the shell’s helper functions are written in JavaScript (and compiled into the binary), so we need to look for a function definition like findOne = function( or function findOne( . Try this search:

mruttley$ grep -lR "findOne = function(" .
./build/opt/mongo/mongo
./build/opt/mongo/shell/mongo.cpp
./build/opt/mongo/shell/mongo.o
./src/mongo/shell/collection.js

The two flags here are -l (lowercase L), which prints only the names of matching files, and -R, which searches all subdirectories. The full stop at the end means to search the current directory. You can see it found a JavaScript file there, “collection.js”. If you open it up in a text editor, the findOne() function is listed:

DBCollection.prototype.findOne = function( query , fields, options, readConcern ){
    var cursor = this.find(query, fields, -1 /* limit */, 0 /* skip*/,
        0 /* batchSize */, options);

    if ( readConcern ) {
        cursor = cursor.readConcern(readConcern);
    }

    if ( ! cursor.hasNext() )
        return null;
    var ret = cursor.next();
    if ( cursor.hasNext() ) throw Error( "findOne has more than 1 result!" );
    if ( ret.$err )
        throw _getErrorWithCode(ret, "error " + tojson(ret));
    return ret;
};

This code can also be found here on the mongodb github page: https://github.com/mongodb/mongo/blob/master/src/mongo/shell/collection.js#L207

Our function is going to work like findOne, but return a different record each time it is called. This can be implemented by doing a “find” using exactly the same code, but then skipping a random number of records ahead. The skip amount has to be less than the number of matching records, which unfortunately means we have to run the query twice: first to count the results, and second to actually skip some amount.

Copy the findOne function and rename it to findAnother, with these lines at the top instead:

DBCollection.prototype.findAnother = function( query , fields, options, readConcern ){
    // Similar to findOne but returns a different record each time

    // Have to make the query twice as we need to know the count before being able to
    // apply a random skip
    var total_records = this.find(query, fields, -1, 0, 0, options).count();

    // random number generation adapted from: http://stackoverflow.com/a/7228322
    // The skip must lie between 0 and total_records - 1, otherwise we would
    // skip past the last record and get nothing back
    var randomNumber = Math.floor(Math.random() * total_records);
    var cursor = this.find(query, fields, -1, randomNumber, 0, options);

    if ( readConcern ) {
        cursor = cursor.readConcern(readConcern);
    }

    if ( ! cursor.hasNext() )
        return null;
    var ret = cursor.next();
    if ( cursor.hasNext() ) throw Error( "findAnother has more than 1 result!" );
    if ( ret.$err )
        throw _getErrorWithCode(ret, "error " + tojson(ret));
    return ret;
};

The new lines at the top do three things:

  1. Gets a count of the records the query returns (stored in total_records)
  2. Generates a random number in a range from 0 to one less than the count (stored in randomNumber)
  3. Queries again using that number as a skip (stored in cursor)

Generating a random integer in a range is a little obscure in JavaScript, but I found a helpful one-line tip on StackOverflow: http://stackoverflow.com/a/7228322. I’m used to Python’s ultra-simple random.randrange().
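
For comparison, a small helper with the same half-open behaviour as Python’s random.randrange(start, stop) might look like the sketch below. The name randRange is hypothetical and not part of the mongo shell; the code relies only on Math.random() and Math.floor().

```javascript
// Hypothetical helper (not part of the mongo shell): returns a random
// integer n with start <= n < stop, like Python's random.randrange().
function randRange(start, stop) {
    if (stop <= start) {
        throw new Error("randRange needs stop > start");
    }
    // Math.random() is in [0, 1), so the product is in [0, stop - start)
    return start + Math.floor(Math.random() * (stop - start));
}

// A skip amount for a result set of 5 records: always one of 0, 1, 2, 3, 4
var skip = randRange(0, 5);
```

Because the upper bound is excluded, passing the record count as stop always gives a skip that still leaves at least one record to return.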

Let’s test this out first. You’ll notice that the source of any shell function can be printed in mongo’s JavaScript shell by typing its name without the parentheses:

mruttley$ mongo 127.0.0.1
MongoDB shell version: 3.3.1-217-gdfaa843
connecting to: localhost
> 
> use my_db
switched to db my_db
> db.coll.findOne
function ( query , fields, options, readConcern ){
    var cursor = this.find(query, fields, -1 /* limit */, 0 /* skip*/,
        0 /* batchSize */, options);
...etc

You can actually replace this code live in the terminal, though it won’t be saved once you close it. Try first replacing findOne with a Hello World:

> db.coll.findOne = function(){return 'Hello World'}
function (){return 'Hello World'}
> db.coll.findOne()
Hello World
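
The override works through ordinary JavaScript property lookup: assigning to findOne on one object shadows the method that lives on the prototype, for that object only. A minimal sketch in plain JavaScript, where FakeCollection is a made-up stand-in for the shell’s DBCollection:

```javascript
// FakeCollection is a hypothetical stand-in for the shell's DBCollection
function FakeCollection(name) {
    this.name = name;
}
FakeCollection.prototype.findOne = function () {
    return "first record of " + this.name;
};

var coll = new FakeCollection("coll");
var other = new FakeCollection("other");

// Assigning on the instance shadows the prototype method for this object only
coll.findOne = function () { return "Hello World"; };

coll.findOne();   // "Hello World"
other.findOne();  // "first record of other"

// Deleting the instance property restores the original behaviour
delete coll.findOne;
coll.findOne();   // "first record of coll"
```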

You can test out our new function first in the terminal by copying everything after the first equals sign. Open a MongoDB shell to your favourite db and type db.my_collection.findAnother = then paste in the function. Try calling it and it should return different results each time.

Let’s patch mongodb now with our function. We have to compile mongo from the source we just downloaded.

  1. Save the collection.js file you just edited with the new findAnother() function
  2. Make sure you have a C++ compiler installed. I’m using gcc, which was installed using brew install gcc
  3. Check how many cores your processor has. According to Google, my 2.7 GHz Intel i5 has 4 cores.
  4. In the same terminal window (still in the mongo folder), type:
         scons -j4 mongo
     and press enter. The 4 here is the number of cores I have, and seriously speeds things up. scons is a program that handles compilation, and putting mongo at the end specifies that we only want to build mongo’s shell client.

You’ll see a huge amount of green text as scons checks everything, and then:

mruttley$ scons -j4 mongo
scons: Reading SConscript files ...
scons version: 2.4.1
(blah blah...)
Checking for C header file x86intrin.h... (cached) yes
scons: done reading SConscript files.
scons: Building targets ...
Install file: "build/opt/mongo/mongo" as "mongo"
scons: done building targets.
mruttley$

It’s compiled! We now have to replace the existing mongo application with our modified version. Let’s do a Unix find:

mruttley$ sudo find / -name "mongo"
Password:
find: /dev/fd/3: Not a directory
find: /dev/fd/4: Not a directory
/usr/local/bin/mongo
/usr/local/Cellar/mongodb/3.2.0/bin/mongo
/usr/local/Library/Aliases/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/build/opt/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/build/opt/mongo/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/src/mongo

We want to replace the mongo application in /usr/local/Cellar/ (where brew installed it to). Let’s back it up then copy across:

mruttley$ cp /usr/local/Cellar/mongodb/3.2.0/bin/mongo /usr/local/Cellar/mongodb/3.2.0/bin/mongo_backup
mruttley$ cp mongo /usr/local/Cellar/mongodb/3.2.0/bin/

Now, open up a MongoDB shell to your favourite DB:

mruttley$ mongo 127.0.0.1
MongoDB shell version: 3.3.1-217-gdfaa843
connecting to: localhost
> use my_db
> db.coll.findAnother()
{
	"_id" : ObjectId("...
etc...

Caveats

  • This could be really slow since it has to run the query twice. However, typically I don’t add super taxing queries to findOne().
  • This function may be replaced if you reinstall/upgrade mongo with brew – they release new versions fairly frequently.
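
If the double query becomes a problem, one alternative worth considering is the $sample aggregation stage, available since MongoDB 3.2, which asks the server for randomly chosen documents in a single request. The sketch below only builds the pipeline. The helper name samplePipeline, the status field and the collection name coll are assumptions for illustration.

```javascript
// Hypothetical helper: builds an aggregation pipeline that filters with
// `query` and then asks the server for `size` randomly chosen documents.
function samplePipeline(query, size) {
    return [
        { $match: query || {} },
        { $sample: { size: size || 1 } }
    ];
}

// "status" is a made-up field name used only for this example
var pipeline = samplePipeline({ status: "active" }, 1);
// In the mongo shell this could then be run as: db.coll.aggregate(pipeline)
```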

Improvements

  • A function that acts like a python generator, keeping state and cycling forward until it reaches the end of the record set. This would fix any slowness above, but be less random.
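
One possible shape for this, sketched under assumptions: a closure that holds on to a cursor and hands out the next record on every call. Here it starts over once the cursor is exhausted, which is a design choice rather than a requirement. makeCycler and the in-memory fakeCollection are hypothetical names for illustration; the only cursor methods relied on are hasNext() and next(), which shell cursors also provide.

```javascript
// Hypothetical sketch: returns a function that yields the next record on each
// call and starts again from the first record once the end is reached.
function makeCycler(collection, query) {
    var cursor = collection.find(query);
    return function () {
        if (!cursor.hasNext()) {
            cursor = collection.find(query);  // end reached: re-run the query
            if (!cursor.hasNext()) {
                return null;                  // the query matches nothing
            }
        }
        return cursor.next();
    };
}

// In-memory stand-in for a collection, so the sketch runs outside the shell
var fakeCollection = {
    docs: [{ _id: 1 }, { _id: 2 }, { _id: 3 }],
    find: function () {
        var docs = this.docs, i = 0;
        return {
            hasNext: function () { return i < docs.length; },
            next: function () { return docs[i++]; }
        };
    }
};

var nextRecord = makeCycler(fakeCollection, {});
nextRecord();  // { _id: 1 }
nextRecord();  // { _id: 2 }
nextRecord();  // { _id: 3 }
nextRecord();  // { _id: 1 } again
```

In the shell, the query runs only once per pass through the results, at the cost of the records always coming back in the same order.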


This post first appeared on Ikigomu, a data science, NLP and personal blog.
