Given a new MongoDB instance to work with, it’s often difficult to understand its structure since, unlike a SQL database, MongoDB is inherently schema-less. A popular first tool is Variety.js, which finds all the fields present and tells you their types, occurrences and percentages.
However, it is often useful to see the contents of the fields as well. For this, you can use findOne(), which returns the first record in the collection’s natural order (often, though not guaranteed to be, the earliest inserted one) and pretty-prints it to the terminal. This is great, but sometimes you want to see more than one record to get a better feel for the field contents. In this post I’ll show you how I extended the mongo shell to do this.
(This assumes you’ve already installed mongo and git. I used brew install mongo from the Homebrew package manager, and git came with Xcode on my Mac.)
First off, let’s get a copy of mongo’s code:
mruttley$ cd ~/Documents
mruttley$ git clone https://github.com/mongodb/mongo.git
This creates a mongo folder inside your Documents folder, so let’s inspect it:
mruttley$ cd mongo
mruttley$ ls
APACHE-2.0.txt GNU-AGPL-3.0.txt SConstruct buildscripts distsrc etc rpm src
CONTRIBUTING.rst README build debian docs jstests site_scons
There are quite a lot of files there, and it would take a while to find where the findOne() function is defined by hand. GitHub’s repository search returns far too many results, so let’s use a Unix file search instead. The mongo shell itself is a compiled C++ program, but its helper functions such as findOne() are written in JavaScript (and embedded into the binary at build time), so we need to look for a function definition like "findOne = function(" or "function findOne(". Try this search:
mruttley$ grep -lR "findOne = function(" .
./build/opt/mongo/mongo
./build/opt/mongo/shell/mongo.cpp
./build/opt/mongo/shell/mongo.o
./src/mongo/shell/collection.js
The two flags here are -l (lowercase L), which prints only the names of matching files, and -R, which searches all subdirectories recursively. The full stop at the end means to search the current directory. You can see it found a JavaScript file, “collection.js” (the other hits are build output that embeds the same code). If you open it up in a text editor, you’ll find the findOne() function:
DBCollection.prototype.findOne = function( query , fields, options, readConcern ){
    var cursor = this.find(query, fields, -1 /* limit */, 0 /* skip*/, 0 /* batchSize */, options);
    if ( readConcern ) {
        cursor = cursor.readConcern(readConcern);
    }
    if ( ! cursor.hasNext() )
        return null;
    var ret = cursor.next();
    if ( cursor.hasNext() ) throw Error( "findOne has more than 1 result!" );
    if ( ret.$err )
        throw _getErrorWithCode(ret, "error " + tojson(ret));
    return ret;
};
This code can also be found on MongoDB’s GitHub page: https://github.com/mongodb/mongo/blob/master/src/mongo/shell/collection.js#L207
Our function will extend findOne so that it returns a different record each time. We can implement this by running a “find” with exactly the same code, but skipping a random number of records ahead. The skip has to be less than the number of matching records, which unfortunately means we have to run the query twice: first to count the results, and second to actually skip ahead and fetch one.
Copy the findOne function, rename it to findAnother, and change its opening lines so it reads like this:
DBCollection.prototype.findAnother = function( query , fields, options, readConcern ){
    // Similar to findOne but returns a different record each time
    // Have to make the query twice as we need to know the count before being able to
    // apply a random skip
    // count() ignores the limit of -1 by default, so this is the full number of matches
    var total_records = this.find(query, fields, -1, 0, 0, options).count();
    // random skip from 0 to total_records - 1, adapted from: http://stackoverflow.com/a/7228322
    var randomNumber = Math.floor(Math.random() * total_records);
    var cursor = this.find(query, fields, -1 /* limit */, randomNumber /* skip */, 0 /* batchSize */, options);
    if ( readConcern ) {
        cursor = cursor.readConcern(readConcern);
    }
    if ( ! cursor.hasNext() )
        return null;
    var ret = cursor.next();
    if ( cursor.hasNext() ) throw Error( "findAnother has more than 1 result!" );
    if ( ret.$err )
        throw _getErrorWithCode(ret, "error " + tojson(ret));
    return ret;
};
In short, findAnother:
- Gets a count of the records the query matches (stored in total_records)
- Generates a random number from 0 to one less than the count (stored in randomNumber), so the first record can still come back and the skip never runs past the last record
- Queries again using that number as the skip (stored in cursor)
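To see the count-then-skip idea outside the shell, here is a minimal plain-JavaScript sketch that runs against an in-memory array instead of a real collection. The function findAnotherIn and its predicate-style query are hypothetical stand-ins for this illustration, not mongo shell APIs. The point is that the skip is drawn from 0 to count - 1, so every matching record can be returned and no skip lands past the end.

```javascript
// Hypothetical in-memory stand-in for a collection query.
// findAnotherIn mirrors the two-pass approach: count the matches,
// then skip a random number of them and return the next one.
function findAnotherIn(docs, predicate, random = Math.random) {
  // Pass 1 (like .count()): how many documents match?
  const total = docs.filter(predicate).length;

  // Random skip in [0, total - 1]; Math.random() is in [0, 1).
  const skip = Math.floor(random() * total);

  // Pass 2 (like a find with a skip and a limit of one): walk the matches
  // again, stepping past `skip` of them and returning the next one.
  let seen = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    if (seen === skip) return doc;
    seen++;
  }
  return null; // no matches at all, like findOne on an empty result
}

const docs = [
  { _id: 1, kind: "a" },
  { _id: 2, kind: "b" },
  { _id: 3, kind: "a" },
  { _id: 4, kind: "a" },
];

// A few calls; each prints the _id of a random record with kind "a".
for (let i = 0; i < 5; i++) {
  console.log(findAnotherIn(docs, (d) => d.kind === "a")._id);
}
```

Note that the second pass has to step over every skipped record before it reaches the one it returns, which is also roughly why large skips get slower.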
Generating random integers in a range is a little obscure in JavaScript, but I found a helpful Stack Overflow answer that does it in one line: http://stackoverflow.com/a/7228322. I’m used to Python’s ultra-simple random.randrange().
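For comparison with Python, here is a small sketch of a randrange-style helper in JavaScript. The name randrange is borrowed from Python for illustration only; JavaScript has no built-in function by that name. Like Python’s version, it returns an integer from start up to, but not including, stop, which is exactly the range a skip needs (0 to count - 1).

```javascript
// A randrange-style helper: returns an integer n with start <= n < stop,
// mimicking Python's random.randrange(start, stop).
// With one argument, randrange(stop) returns 0 <= n < stop.
function randrange(start, stop) {
  if (stop === undefined) {
    stop = start;
    start = 0;
  }
  if (!Number.isInteger(start) || !Number.isInteger(stop) || stop <= start) {
    throw new RangeError("randrange needs integers with start < stop");
  }
  // Math.random() is in [0, 1), so the result is in [start, stop).
  return start + Math.floor(Math.random() * (stop - start));
}

// A valid skip for a query that matched 10 records: 0 through 9.
const skip = randrange(10);
console.log(skip);
```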
Let’s test this out first. Notice that typing a function’s name without the parentheses in mongo’s JavaScript shell prints its source code:
mruttley$ mongo 127.0.0.1
MongoDB shell version: 3.3.1-217-gdfaa843
connecting to: localhost
>
> use my_db
switched to db my_db
> db.coll.findOne
function ( query , fields, options, readConcern ){
    var cursor = this.find(query, fields, -1 /* limit */, 0 /* skip*/, 0 /* batchSize */, options);
...etc
You can actually replace this code live in the shell, though the change is lost once you close it. Try replacing findOne with a Hello World first:
> db.coll.findOne = function(){return 'Hello World'}
function (){return 'Hello World'}
> db.coll.findOne()
Hello World
You can try out our new function in the shell the same way, by copying everything after the first equals sign. Open a MongoDB shell to your favourite db, type db.my_collection.findOne = and then paste in the function. Try calling it, and it should usually return a different record each time.
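Why does assigning to db.my_collection.findOne work, while collection.js attaches its functions to DBCollection.prototype? This is ordinary JavaScript property lookup: an own property on one object shadows the prototype’s version, while a function added to the prototype is visible on every instance. The sketch below uses a made-up Collection constructor as a stand-in for DBCollection; it runs in plain Node.js, not the mongo shell.

```javascript
// Made-up stand-in for the shell's DBCollection, for illustration only.
function Collection(name) {
  this.name = name;
}
Collection.prototype.findOne = function () {
  return "real findOne on " + this.name;
};

const coll = new Collection("coll");
const other = new Collection("other");

// Like typing db.coll.findOne = function(){...} in the shell:
// this creates an own property on one object, shadowing the prototype.
coll.findOne = function () {
  return "Hello World";
};
console.log(coll.findOne());  // Hello World
console.log(other.findOne()); // real findOne on other

// Like adding DBCollection.prototype.findAnother in collection.js:
// every instance, existing or new, now sees the method.
Collection.prototype.findAnother = function () {
  return "findAnother on " + this.name;
};
console.log(other.findAnother()); // findAnother on other
```

So the live assignment only patches that one collection object, whereas editing the prototype in collection.js gives findAnother() to every collection.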
Now let’s patch mongo with our function. To do that, we have to compile the mongo shell from the source we just downloaded.
- Save the collection.js file you just edited with the new findAnother() function
- Make sure you have a C++ compiler installed. I’m using gcc, which I installed with
  brew install gcc
- Check how many cores your processor has. According to Google, my 2.7 GHz Intel i5 has 4 cores. (On a Mac, sysctl -n hw.ncpu prints the number of logical cores, which counts hyperthreads.)
- In the same terminal window (still in the mongo folder), type the following and press Enter:
  scons -j4 mongo
  The -j4 flag tells scons to run 4 build jobs in parallel, one per core, which seriously speeds things up. scons is the build tool that handles compilation, and putting mongo at the end specifies that we only want to build mongo’s client (the shell), not the database server.
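If you’d rather not look up the core count, a small Node.js sketch can build the scons command line from os.cpus().length, which reports logical cores (hyperthreads count separately, so this can be higher than the physical core count). Using Node.js here, and the sconsCommand helper, are assumptions for illustration; the steps above simply run scons directly.

```javascript
// Build the scons command line from the machine's logical CPU count.
const os = require("os");

// Hypothetical helper: one parallel build job per core, building only
// the "mongo" target (the shell/client).
function sconsCommand(cores) {
  const jobs = Math.max(1, cores); // never ask for fewer than one job
  return `scons -j${jobs} mongo`;
}

console.log(sconsCommand(os.cpus().length));
```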
You’ll see a huge amount of green text as scons checks everything, and then:
mruttley$ scons -j4 mongo
scons: Reading SConscript files ...
scons version: 2.4.1
(blah blah...)
Checking for C header file x86intrin.h... (cached) yes
scons: done reading SConscript files.
scons: Building targets ...
Install file: "build/opt/mongo/mongo" as "mongo"
scons: done building targets.
mruttley$
It’s compiled! Now we have to replace the existing mongo application with our modified version. Let’s locate it with a Unix find:
mruttley$ sudo find / -name "mongo"
Password:
find: /dev/fd/3: Not a directory
find: /dev/fd/4: Not a directory
/usr/local/bin/mongo
/usr/local/Cellar/mongodb/3.2.0/bin/mongo
/usr/local/Library/Aliases/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/build/opt/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/build/opt/mongo/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/mongo
/Volumes/Data/Users/mruttley/Documents/code/mongo/src/mongo
We want to replace the mongo application in /usr/local/Cellar/, where brew installed it (/usr/local/bin/mongo is Homebrew’s symlink to that file). Let’s back it up first, then copy ours across:
mruttley$ cp /usr/local/Cellar/mongodb/3.2.0/bin/mongo /usr/local/Cellar/mongodb/3.2.0/bin/mongo_backup
mruttley$ cp mongo /usr/local/Cellar/mongodb/3.2.0/bin/
Now, open up a MongoDB shell to your favourite DB:
mruttley$ mongo 127.0.0.1
MongoDB shell version: 3.3.1-217-gdfaa843
connecting to: localhost
> use my_db
> db.coll.findAnother()
{ "_id" : ObjectId("... etc...
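Caveats:
- This could be really slow, since it has to run the query twice, and the skip itself has to step past every skipped record, so it gets slower as the result set grows. However, I don’t typically pass very taxing queries to findOne() anyway.
- Your patched mongo binary will be overwritten if you reinstall or upgrade mongo with brew, and new versions are released fairly frequently.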
Improvements:
- A function that acts like a Python generator, keeping state and cycling forward until it reaches the end of the record set. This would fix the slowness above, but would be less random.
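The generator idea might look something like the sketch below. It is plain JavaScript over an in-memory array rather than a real cursor, and the name recordCycler is hypothetical. It starts at a random offset (so you don’t always begin with the same record), then hands back one record per call, wrapping around to the beginning, and finishes once every record has been seen exactly once. That avoids re-counting and re-skipping on every call, at the cost of a predictable order after the first record.

```javascript
// Hypothetical sketch of the "generator" improvement: keep state between
// calls instead of re-counting and re-skipping every time.
// Starts at a random offset, then yields each record exactly once,
// wrapping around to the beginning, and stops after one full lap.
function* recordCycler(records, random = Math.random) {
  const total = records.length;
  if (total === 0) return; // empty record set: nothing to yield
  const start = Math.floor(random() * total); // random offset in [0, total - 1]
  for (let i = 0; i < total; i++) {
    yield records[(start + i) % total];
  }
}

const records = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }];
const it = recordCycler(records);

// Each next() call plays the role of one findAnother() call.
console.log(it.next().value);
console.log(it.next().value);
```

In the real shell, the cursor returned by find() already keeps this kind of state: the hasNext() and next() calls in the findOne code step through results one at a time.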
This post first appeared on Ikigomu | A Data Science, NLP And Personal Blog By, please read the original post: here