Using MongoDB and Python

MongoDB provides a Python driver, PyMongo, for interacting with a MongoDB server (the mongod daemon). It can be installed with

pip install pymongo

Mongo storage paradigm

Mongo stores everything as collections of documents. Documents are represented as JSON-like objects and stored as BSON, a binary encoding of JSON. MongoDB supports all of the usual CRUD operations.
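To make the document model concrete, here is a minimal sketch of what a single document might look like as a Python dict. The field names and values are made up for illustration, and the standard-library json module stands in for BSON here; BSON itself additionally carries types that JSON lacks, such as ObjectId and dates.

```python
import json

# A hypothetical song document, written as a plain Python dict.
song = {
    "artist": "Yellow Moon",
    "song": "Loving Cold Child",
    "duration": 215,
    "tags": ["demo", "test"],               # arrays are allowed
    "label": {"name": "Example Records"},   # so are nested documents
}

# Round-trip through JSON text to show the structure survives serialization.
encoded = json.dumps(song)
decoded = json.loads(encoded)
print(decoded == song)
```

A collection is then simply a set of such documents, which do not all need to share the same fields.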

Issuing commands to the server

A simple script that runs the serverStatus command through an instance of MongoClient

from pymongo import MongoClient

client = MongoClient("mongodb-url", port=27017)
db = client.admin
serverStatusResult = db.command("serverStatus")

Generating sample data

Here’s a small script for generating test data for a music database, assuming db already refers to a database on the client

import random
adj = ["Infernal", "Bloody", "Freak", "Yellow", "Catastrophic", "Sad", "Happy", "Black", "Warm", "Cold"]
noun = ["Moon", "Child", "Graves", "Shake", "Corpse", "Cannibal", "Pig", "Green", "You", "God", "Satan"]
verb = ["Killing", "Saving", "Hating", "Loving", "Marry", "Creating", "Lifting"]

for i in range(500):
	song = {
		"artist": random.choice(adj) + " " + random.choice(noun),
		"song": random.choice(verb) + " " + random.choice(adj) + " " + random.choice(noun),
		"duration": random.randint(0, 360)
	}

	res = db.songs.insert_one(song)
	print("Created {} of 500 as {}".format(i+1, res.inserted_id))

which will print lines like

# ...
Created 497 of 500 as 5e9d6d641f08210aec1a8790
# ...
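The loop above sends one insert per song. As an alternative sketch, the documents can be built locally first and then handed to insert_many in a single call. The helper name make_songs, the shortened word lists, and the seed parameter are all assumptions made for this example; the database call is left commented out so the block runs without a server.

```python
import random

# Shortened, hypothetical word lists; any lists of strings would do.
ADJ = ["Yellow", "Sad", "Happy", "Warm", "Cold"]
NOUN = ["Moon", "Child", "Pig", "You"]
VERB = ["Saving", "Loving", "Creating", "Lifting"]


def make_songs(n, seed=None):
    """Build n song documents locally, without touching the database."""
    rng = random.Random(seed)  # private generator, so a seed gives repeatable data
    return [
        {
            "artist": rng.choice(ADJ) + " " + rng.choice(NOUN),
            "song": rng.choice(VERB) + " " + rng.choice(ADJ) + " " + rng.choice(NOUN),
            "duration": rng.randint(0, 360),  # inclusive on both ends
        }
        for _ in range(n)
    ]


songs = make_songs(500, seed=42)
print(len(songs), songs[0])

# With a live connection, the whole batch could then go in one call:
# res = db.songs.insert_many(songs)
# print(len(res.inserted_ids))
```

Seeding the generator is optional, but it makes test data reproducible between runs.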

CRUD Operations

The CRUD operations in the Python driver are described below.

Create

Documents are inserted into a collection using

db.some_collection.insert_one({ ... })
db.some_collection.insert_many([{ ... }, ... ])

These return a result object (InsertOneResult and InsertManyResult respectively), described in the PyMongo documentation.

Read

Mongo queries are written as filter documents; a stored document matches when its fields equal the values given

res = db.some_collection.find_one({"query":"value"})
ress = db.some_collection.find({"query":"value"})

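# Cursor.count() was removed in PyMongo 4; count on the collection instead
num = db.some_collection.count_documents({"query":"value"})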

The result ress is an instance of the Cursor class. Cursors are iterable and support next() calls (raising StopIteration when exhausted). The documents returned behave like Python dictionaries.
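The iteration behaviour can be tried without a server. In this sketch a plain Python iterator over made-up dictionaries stands in for a Cursor; it is not a PyMongo object, but the next() and StopIteration protocol it follows is the same one described above.

```python
from itertools import islice

# Stand-in for a Cursor: an iterator over hypothetical documents.
docs = iter([
    {"artist": "Sad God", "duration": 10},
    {"artist": "Warm Pig", "duration": 20},
    {"artist": "Cold Moon", "duration": 30},
])

first = next(docs)              # pull a single document
rest = list(islice(docs, 10))   # take up to 10 more; stops early if fewer remain

try:
    next(docs)                  # nothing left to read
    exhausted = False
except StopIteration:
    exhausted = True

print(first["artist"], len(rest), exhausted)
```

Like this iterator, a cursor is consumed as it is read, so results that are needed more than once should be collected into a list.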

Update

Updating data in the database is also straightforward, using methods such as update_one, update_many and replace_one, which act on the documents their filter matches. For example, we could increment the duration of an arbitrary song

song = db.songs.find_one({})	# match all
print(song)

result = db.songs.update_one({'_id' : song.get('_id')}, {'$inc': {'duration':1}})

print("Number of items modified is {}".format(result.modified_count))

print(db.songs.find_one({'_id' : song.get('_id')}))

Delete

Deleting documents is as easy as the other operations, using delete_one and delete_many with the same filter syntax. For example

result = db.songs.delete_many({'duration':300})
print(result.deleted_count)

A note on Mongo predicates

Regular expressions can be inserted into most matching parameters as a predicate

res = db.songs.find({
		"artist":
			{"$regex": "God", "$options": "i"}	# the "i" option makes the match case-insensitive
	})

Note that IDs are stored as BSON ObjectId values by default, so to query by an ID you hold as a string, you must first convert it

from bson.objectid import ObjectId

_id = ObjectId(stringid)
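Before converting, it can help to check that a string has the right shape. An ObjectId is 12 bytes, written as 24 hexadecimal characters, and its first 4 bytes hold a creation timestamp in seconds since the Unix epoch. The sketch below uses only the standard library; the helper names looks_like_object_id and object_id_timestamp are made up for this example and are not part of the bson package.

```python
import re
from datetime import datetime, timezone

_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Return True if value is a 24-character hexadecimal string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


def object_id_timestamp(value):
    """Read the creation time (seconds since the epoch) from the first 4 bytes."""
    if not looks_like_object_id(value):
        raise ValueError("not a 24-character hex string: {!r}".format(value))
    return int(value[:8], 16)  # 8 hex characters = 4 bytes


stringid = "5e9d6d641f08210aec1a8790"  # the ID from the sample output earlier
seconds = object_id_timestamp(stringid)
created = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(looks_like_object_id(stringid), created.isoformat())
```

A string that fails this check would also be rejected by ObjectId(stringid), so testing first lets a caller handle bad input without catching an exception.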

Aggregation pipelines

MongoDB can fold what would otherwise be many database requests into one. Consider our test song database, where we want to know how many songs there are for each duration. This could take 361 individual database requests (one per possible duration from 0 to 360), or, more conveniently, a single aggregation pipeline

durationgroup = db.songs.aggregate([
	# define group data
	{ '$group':
		{
			'_id':'$duration',
			'count': {'$sum' :1 }
		}
	},
	# then sort data
	{ '$sort':
		{'_id':1}
	}
])

for group in durationgroup:
	print(group)
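To see what the pipeline computes, here is a rough pure-Python equivalent of the $group and $sort stages, run over a small made-up list of documents instead of the database. It only models the result; on a real collection the server does this work and returns the groups through a cursor.

```python
from collections import Counter

# Hypothetical stand-in for the documents in db.songs.
songs = [
    {"duration": 120},
    {"duration": 45},
    {"duration": 120},
    {"duration": 300},
    {"duration": 45},
    {"duration": 120},
]

# $group: one bucket per distinct duration, adding 1 for each document seen.
counts = Counter(song["duration"] for song in songs)

# $sort: order the buckets by their key ('_id': 1 means ascending).
groups = [{"_id": duration, "count": count} for duration, count in sorted(counts.items())]

for group in groups:
    print(group)
```

Each printed dictionary has the same shape as the documents yielded by the aggregate call above: the grouping key under _id and the tally under count.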

Settings

TODO

Authentication

TODO