RethinkDB
RethinkDB is a JSON document store and real-time database, optimized for pushing updated query results to applications as new data becomes available. This makes it well suited to monitoring, interactive applications, marketplaces, streaming, and web delivery. As described in their FAQ:
instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results to applications in realtime
Its query language, ReQL, is a NoSQL-style query language rather than SQL.
Overview
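Languages supported: official client drivers exist for JavaScript, Python, Ruby and Java, and community-maintained drivers cover many other languages.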
Client drivers connect on port 28015 by default, the web UI is served on port 8080, and port 29015 is used for intracluster traffic between servers.
Data storage format
By default, RethinkDB uses id as the primary key attribute. If a document is inserted without an id, the server generates a random UUID for it; keys are not auto-incremented.
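As a rough illustration of this default, here is a toy in-memory sketch in plain Python. The insert helper and the dict-as-table are hypothetical and no server is involved. With the real driver, the keys the server generated are returned in the generated_keys field of the insert result.

```python
import uuid


def insert(table, doc):
    """Toy, in-memory stand-in for an insert (hypothetical helper, no server).

    Mirrors RethinkDB's default: a document without an 'id' gets a random
    UUID as its primary key; an explicit 'id' is kept unchanged.
    """
    doc = dict(doc)  # copy so the caller's dict is not modified
    if "id" not in doc:
        doc["id"] = str(uuid.uuid4())
    if doc["id"] in table:
        # primary keys are unique within a table
        raise KeyError(f"duplicate primary key: {doc['id']!r}")
    table[doc["id"]] = doc
    return doc["id"]


songs = {}
generated = insert(songs, {"title": "Song A"})            # random UUID string
explicit = insert(songs, {"id": 42, "title": "Song B"})   # explicit key kept
print(generated, explicit)
```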
RethinkDB with Docker
Pull the latest image from Docker Hub with
docker pull rethinkdb
and start the database in the background, with a named volume for its data and the web UI (8080), client driver (28015) and intracluster (29015) ports published:
docker run \
--name rtdb \
-p 8080:8080 \
-p 28015:28015 \
-p 29015:29015 \
-v data:/data \
-d \
rethinkdb
You can then open http://localhost:8080/ in a browser to access the web UI and admin features.
ReQL Query Language
ReQL, the RethinkDB Query Language, differs from other NoSQL query languages in that it is embedded into the host programming language: queries are built from ordinary function calls, are chainable, and execute on the server.
Examples
In the following sections, I use the Python driver:
from rethinkdb import RethinkDB # import the RethinkDB package
r = RethinkDB() # create a RethinkDB object
conn = r.connect() # connect to the server on localhost and default port
Queries are evaluated lazily where possible, and are automatically parallelized on the server for quicker execution. Laziness means that a query such as
r.table('songs').has_fields('album').limit(5).run(conn)
stops reading once it has found five songs with an album field, rather than evaluating the query over the whole table and then returning five items.
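The same idea can be sketched with Python generators. This is an analogy, not a description of RethinkDB's internals: the has_fields-style filter and the limit are both lazy, so the scan stops as soon as five matching documents have been produced. The table and the read counter are made up for illustration.

```python
from itertools import islice


def scan(table, stats):
    """Yield documents one by one, counting how many have been read."""
    for doc in table:
        stats["read"] += 1
        yield doc


# Hypothetical table of 1,000 songs where every even-numbered song has an album.
songs = [{"id": i, "album": f"Album {i}"} if i % 2 == 0 else {"id": i} for i in range(1000)]

stats = {"read": 0}
with_album = (doc for doc in scan(songs, stats) if "album" in doc)  # like has_fields('album')
first_five = list(islice(with_album, 5))                             # like limit(5)

print(len(first_five), stats["read"])  # 5 9
```

Only nine of the thousand documents are read, because the generator pipeline is pulled one item at a time and islice stops pulling after the fifth match.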
Because a query is an ordinary object built by the client and only sent to the server when run() is called, it can be stored in a variable and reused:
# build a query for the distinct artists (nothing is sent yet)
distinct_artists_query = r.table('songs').pluck('artist').distinct()
# send it to the server and execute it
distinct_artists_query.run(conn)
Functional
Queries can also take functions (in Python, lambdas) as arguments, for example:
r.table('songs').filter(lambda song: song['duration'] > 60).run(conn)
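There are limitations on the functions that may be passed: the lambda is called only once, on the client, to build the query, so a print call would not run for each document, and if and for statements must be replaced with ReQL commands such as r.branch (in place of if) and map or for_each (in place of for):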
r.table('songs').filter(lambda song:
r.branch(song['duration'] > 60, True, False)).run(conn)
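The reason for these restrictions is that the driver calls the lambda once, with a placeholder for the document, and records what it does as a query to send to the server. The toy model below (hypothetical Expr, Row and build names, not the driver's real classes) imitates that with operator overloading, and shows why an if expression cannot work: it needs a concrete True or False on the client, before any document exists.

```python
# Toy model (hypothetical classes, not the rethinkdb driver's code) of how a
# lambda becomes a query: it is called once with a placeholder row, and
# overloaded operators record an expression tree instead of computing values.


class Expr:
    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __getitem__(self, field):
        # song['duration'] records a field lookup instead of reading a value
        return Expr("get_field", self, field)

    def __gt__(self, other):
        # song['duration'] > 60 records a comparison instead of evaluating it
        return Expr("gt", self, other)

    def __bool__(self):
        # `if`/`and`/`or` need a real True/False now, but the row's values
        # only exist on the server, so the toy refuses instead of guessing.
        raise TypeError("expression has no truth value on the client; use a branch command")

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.args))})"


class Row(Expr):
    """Placeholder passed to the lambda in place of a real document."""

    def __init__(self):
        super().__init__("row")

    def __repr__(self):
        return "row"


def build(func):
    """Call func exactly once with a placeholder and return the recorded tree."""
    return func(Row())


tree = build(lambda song: song["duration"] > 60)
print(tree)  # gt(get_field(row, 'duration'), 60)

try:
    build(lambda song: "long" if song["duration"] > 60 else "short")
except TypeError as err:
    print("rejected:", err)
```

A branch command avoids the problem because it is itself recorded in the tree and only decided per document on the server.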
Composable
ReQL queries are composable: queries can be combined with one another, and can even include JavaScript code. RethinkDB evaluates JavaScript server-side using an embedded V8 engine, sandboxed in separate processes:
For example,
r.js('1 + 1').run(conn)
evaluates the JavaScript expression 1 + 1 on the server and returns 2. The functional syntax can therefore be extended with JavaScript functions:
r.table('songs').filter(r.js('(function (song) { return song.duration > 60; })')).run(conn)
Subqueries
Subqueries are also supported. For example, this query selects all songs by artists who appear in the history table:
r.table('songs').filter(lambda song:
r.table('history')['artist'].contains(song['artist'])
).run(conn)
This allows complex queries to be composed from simple parts without leaving the query language.
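For clarity, here is what the subquery selects, expressed in plain Python over small made-up in-memory tables rather than ReQL; songs_by_artists_in_history is a hypothetical helper name.

```python
# Plain-Python illustration (not ReQL) of the subquery's meaning,
# using small hypothetical in-memory tables.
songs = [
    {"title": "Song A", "artist": "Artist 1"},
    {"title": "Song B", "artist": "Artist 2"},
    {"title": "Song C", "artist": "Artist 1"},
    {"title": "Song D", "artist": "Artist 3"},
]
history = [
    {"artist": "Artist 1"},
    {"artist": "Artist 3"},
]


def songs_by_artists_in_history(songs, history):
    """Keep each song whose artist appears somewhere in history."""
    # the inner query: every artist that appears in the history table
    history_artists = [entry["artist"] for entry in history]
    # the outer filter: keep songs whose artist is contained in that list
    return [song for song in songs if song["artist"] in history_artists]


result = songs_by_artists_in_history(songs, history)
print([song["title"] for song in result])  # ['Song A', 'Song C', 'Song D']
```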
Expressions
Arithmetic and comparison expressions are evaluated on the server too. The following query finds every song in the songs table with more upvotes than downvotes and increases its rating by 1:
(r.table('songs')
.filter(lambda song: song['upvotes'] - song['downvotes'] > 0)
.update(lambda song: {'rating': song['rating'] + 1})
.run(conn))
For more, see the full Python API reference.
Regex
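You can use regular expressions in queries by calling .match() inside the lambda. ReQL uses RE2 syntax, and match() returns null when the string does not match, which filter treats as false: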
r.table('songs').filter(lambda song:
song['name'].match(r"ing$")
).run(conn)
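For simple patterns such as this one, Python's re.search behaves like match(): the pattern may match anywhere in the string, so ing$ selects names ending in "ing". This is only a local stand-in with made-up song names; ReQL evaluates the pattern on the server with RE2, whose syntax differs from Python's re in some advanced features (for example, RE2 has no backreferences).

```python
import re

names = ["Singing in the Rain", "Something", "Yesterday", "Bring"]

# re.search is a local stand-in for ReQL's match(): it returns None when the
# pattern is not found, much as match() returns null, which filter drops.
matching = [name for name in names if re.search(r"ing$", name)]
print(matching)  # ['Something', 'Bring']
```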
Connection
For exploratory and interactive use, you can call .repl() on a new connection to make it the default connection, so queries can be executed with .run() instead of .run(conn). See the documentation for repl().
Geospatial use and geometry
Adapted mainly from the JavaScript API documentation on geospatial queries, so the examples in this section use JavaScript and omit the final .run(conn) call.
RethinkDB has native support for geometric data. Geometry objects (points, lines and polygons) are defined by coordinates on the surface of the Earth, and distances are computed as geodesics. Points are given as longitude, then latitude (the reverse of the latitude, longitude order used by e.g. Google Maps).
By default RethinkDB uses the WGS84 World Geodetic System's reference ellipsoid and geographic coordinate system (GCS); a unit sphere can be selected instead with the geoSystem option ('unit_sphere').
ReQL geometry objects can be converted to GeoJSON with toGeojson(), and created from GeoJSON with r.geojson().
Geospatial queries such as getNearest need a geospatial index on the field that holds the geometry:
r.table('map').indexCreate('location', {geo: true})
Example documents for such a table could then be inserted as follows:
r.table('map').insert([
{
id: 1,
name: 'Taunton',
location: r.point(-3.1349583, 51.021485)
},
{
id: 2,
name: 'Bristol',
location: r.point(-3.2307308, 51.0237906)
}
])
Calculating the distance between these points (in metres, by default):
r.table('map').get(1)('location').distance(r.table('map').get(2)('location'))
or finding the documents nearest to an arbitrary point, which getNearest returns sorted by distance as objects with dist and doc fields:
var point = r.point(-3.651051, 51.0234467); // Stonehenge
r.table('map').getNearest(point, {index: 'location'})
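To get a feel for what distance() and getNearest compute, here is a plain-Python sketch that uses the haversine formula on a sphere of mean Earth radius, with points in the same longitude, latitude order as r.point and the coordinates from the example above. It is an approximation under that assumption: RethinkDB uses the WGS84 ellipsoid by default, so its results will differ slightly, and getNearest uses the geospatial index rather than comparing every point as this brute-force version does.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical assumption)


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates taken from the example documents, as (longitude, latitude).
places = {
    "Taunton": (-3.1349583, 51.021485),
    "Bristol": (-3.2307308, 51.0237906),
}
query = (-3.651051, 51.0234467)

# Brute-force nearest neighbour: measure every point, keep the smallest.
distances = {name: haversine(*query, *pt) for name, pt in places.items()}
nearest = min(distances, key=distances.get)
print(nearest, round(distances[nearest]))
```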
Data types
The geometry types supported for geospatial queries are:
points
lines (at least two points)
polygons (at least three points)