Wrapping C and C++ into Python
One of Python's many strengths is how readily it integrates with other languages. Since the reference interpreter, CPython, is itself written in C, Python and the C family of languages go almost hand in hand; indeed, many libraries such as numpy and scipy have a largely C backend.
The most primitive way of wrapping C into Python is using setuptools and the Python/C API. Other libraries exist to make this task less verbose and more seamless, such as Cython, but here are my notes from reading a section on package-free C wrapping.
Creating the build environment¶
Fortunately, very little is required to compile the C code, as the Python library setuptools can take care of a lot of the heavy lifting for us. Using the directory structure
root/
    setup.py
    test_module.c
we can define our C extension and its source files in setup.py as follows
import setuptools

test_module = setuptools.Extension('test_module', ['test_module.c'])

setuptools.setup(
    name='Some ID name',
    version='1.0.0',
    ext_modules=[test_module],
)
To then build / compile our code, we run
python setup.py build
and to install it into our interpreter / environment (I would highly recommend using a virtual environment)
python setup.py build install
We can then open a Python shell, or write a Python script, and import our module as
import test_module
Writing Pythonic C code¶
The setuptools build adds the directory containing the Python.h header to the compiler's include path; that header declares the whole Python/C API.
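To check where that header lives on a given installation, the standard-library sysconfig module can be queried. This is only a small sketch; the variable names are illustrative, and the header will be missing if the Python development files are not installed.

```python
import os
import sysconfig

# Directory holding Python.h for the running interpreter; the build passes
# this directory to the C compiler as an include path.
include_dir = sysconfig.get_paths()["include"]
header = os.path.join(include_dir, "Python.h")

print(include_dir)
# False here usually means the Python development headers are not installed.
print("Python.h found:", os.path.exists(header))
```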
sum_of_squares example¶
For a simple example, we will write a sum of squares function. A python implementation of this would be
def sum_of_squares(n):
    sum = 0
    for i in range(n):
        sq = i * i
        if sq < n:
            sum += sq
        else:
            break
    return sum
To recreate this in C requires a little overhead. We will use pointers to PyObject for the majority of our arguments; indeed, the general practice of writing a Python function in C is to use the definition
static PyObject* function(PyObject* self, PyObject* args, PyObject* kwargs);
Note that the kwargs parameter is only present for functions registered with METH_KEYWORDS; a function that takes positional arguments only (METH_VARARGS) is declared with just self and args.
We could then write our sum of squares function as
#include <Python.h>

static PyObject* sum_of_squares(PyObject* self, PyObject* args) {
    int n;
    int sum = 0; // default value
    // parse python arguments
    if (!PyArg_ParseTuple(args, "i", &n)) { // "i" says we expect an integer
        return NULL; // throw error
    }
    for (int sq, i = 0; (sq = i * i) < n; i++) {
        sum += sq;
    }
    return PyLong_FromLong(sum); // return a python object
}
Making the extension accessible¶
Our function alone won't be of much use unless it is included in a module. To define the API of our module, a little extra work in C is required
static PyMethodDef test_methods[] = { // define the available methods
    {
        "sum_of_squares", // python name
        sum_of_squares, // pointer to C function
        METH_VARARGS, // calling convention (positional arguments only)
        "Sum of the perfect squares below some n." // doc string
    },
    {NULL, NULL, 0, NULL}, // list terminator
};
Above we have defined the functions we want to make available in our module as an array of PyMethodDef structs, terminated by an all-NULL entry. We will then define a module
static struct PyModuleDef some_test_module = { // define the module
    PyModuleDef_HEAD_INIT,
    "test_module", // module name
    NULL, // documentation
    -1, // state (-1 is global); used by sub-interpreters
    test_methods // method array pointer
};
which will have these methods available as member functions. Finally, we initialize the module (called upon import test_module)
PyMODINIT_FUNC PyInit_test_module(void) {
    return PyModule_Create(&some_test_module);
}
And we’re done! We can now create a little python script to try it out
import test_module
print(test_module.sum_of_squares(100))
# 285
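As a sanity check on that value, the result can be compared against a pure-Python reference. The sketch below is an assumption-laden helper rather than part of the extension: sum_of_squares_reference is a name made up here, and the comparison with test_module only runs if the extension has actually been built and installed.

```python
def sum_of_squares_reference(n):
    """Pure-Python reference: sum of the perfect squares below n."""
    total = 0
    i = 0
    while i * i < n:
        total += i * i
        i += 1
    return total


expected = sum_of_squares_reference(100)
print(expected)  # 285

try:
    import test_module  # only importable once the extension is installed
except ImportError:
    test_module = None

if test_module is not None:
    # The C implementation should agree with the reference.
    assert test_module.sum_of_squares(100) == expected
```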
Some API notes¶
I’ve encountered a few additional pieces of information whilst learning this API which I thought I would document here.
Using **kwargs¶
Using keyword arguments in our C code is fairly intuitive. We can implement a function that uses keywords as
static PyObject* function(PyObject *self, PyObject *args, PyObject *kwargs) {
    int some_var = 0;
    int some_prop = 0;
    static char* keywords[] = {"", "var", NULL}; // empty denotes positional only
    if (!PyArg_ParseTupleAndKeywords(
            args, kwargs, "i|i", keywords, &some_prop, &some_var)) { // the | separates optional args
        return NULL;
    }
    // use variables, e.g.
    return PyLong_FromLong(some_var + (2 * some_prop));
}
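We also need to change the PyMethodDef entry to use METH_VARARGS | METH_KEYWORDS instead of just METH_VARARGS and, since the function now takes three parameters, cast the function pointer to PyCFunction in that entry.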
The arguments, if used without a keyword, are read in from left to right; e.g.
function(2, 1) # 5 -> 2 = some_prop, 1 = some_var
function(1, 2) # 4 -> 1 = some_prop, 2 = some_var
function(1) # 2 -> 1 = some_prop, 0 = some_var i.e. default
function(1, var=2) # 4 -> 1 = some_prop, 2 = some_var
A few things to note: empty strings in keywords[] denote positional-only arguments, and the | in the argument type specifier separates required from optional. The default values of optional arguments are the values the variables were initialised with, i.e.
int some_var = 0;
int some_prop = 0;
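For intuition, the call signature this parser produces can be modelled in pure Python. The following is only an analogue, assuming Python 3.8+ for the positional-only `/` marker; it does not reproduce the C int range checks that the real parser performs.

```python
def function(prop, /, var=0):
    # prop is positional-only (the "" entry in keywords[]);
    # var is optional and may be passed by position or by keyword.
    return var + 2 * prop


print(function(2, 1))      # 5
print(function(1))         # 2
print(function(1, var=2))  # 4
```

Passing prop by keyword, e.g. function(prop=1), is rejected with a TypeError in this model, mirroring the empty keyword name in the C version.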
For full reference on the argument parsing capabilities of the Python/C API, see here.
A small technicality arises in specifying the argument types: the API also provides a $ specifier which, to quote the documentation,
"PyArg_ParseTupleAndKeywords() only: Indicates that the remaining arguments in the Python argument list are keyword-only. Currently, all keyword-only arguments must also be optional arguments, so | must always be specified before $ in the format string."
So, to make var keyword-only, we could have written our parser as
PyArg_ParseTupleAndKeywords(
    args, kwargs, "i|$i", keywords, &some_prop, &some_var)
This ensures the optional argument cannot be passed positionally by accident: function(1, var=2) still works, but function(2, 1) now raises a TypeError. Note that the parser also implicitly does type conversion and checks for overflows! It is therefore important that the specifier matches the variable type exactly.
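The effect of `$` can again be modelled in pure Python, where `*` plays the same role in a signature. This is a sketch of the behaviour under the "i|$i" format, assuming Python 3.8+, not the C code itself.

```python
def function(prop, /, *, var=0):
    # Everything after * is keyword-only, like arguments after $ in the
    # format string; prop stays positional-only.
    return var + 2 * prop


print(function(1, var=2))  # 4

try:
    function(2, 1)  # var can no longer be given by position
except TypeError as exc:
    print("rejected:", exc)
```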
Raising exceptions¶
The Python/C API allows for all sorts of different exceptions to be raised in the Python interpreter. This is preferred over trying to handle errors in C, since an unhandled error in C code will crash the entire interpreter, and the resulting error messages are brief, if not cryptic.
The general idiom is to set an exception and then return NULL. For example, a function which raises a runtime error with a custom message could be
static PyObject* throw_error(PyObject *self, PyObject *args) {
    PyErr_SetString(PyExc_RuntimeError, "Custom error text.");
    return NULL;
}
Calling this function in Python results in a pleasant
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Custom error text.
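From the Python side, an exception set in C behaves like any other exception and can be caught as usual. The snippet below uses a pure-Python stand-in for throw_error (an assumption, so that it runs without the compiled module) to show the calling pattern.

```python
def throw_error():
    # Stand-in for the C function: PyErr_SetString(...) followed by
    # return NULL surfaces in Python as a raised RuntimeError.
    raise RuntimeError("Custom error text.")


try:
    throw_error()
except RuntimeError as exc:
    message = str(exc)
    print("caught:", message)  # caught: Custom error text.
```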
There are numerous exception-manipulating functions in the Python/C API (see here); some of the more general use cases are
PyErr_Clear(): clears the current exception state so that e.g. a new state may be defined
Common error types to pass to PyErr_SetString are
PyExc_TypeError
PyExc_RuntimeError
The static prefix¶
In these notes, all functions have been declared static, since each is only used within a single translation unit (i.e. a single C file). However, when extending these recipes to multiple header and source files, any function that is declared in a header and called from another source file must drop the static prefix: static gives a function internal linkage, so the other files would fail to compile or link against it.
Using Python objects in C¶
The malleability of Python can be a little difficult to translate into a statically typed language like C or C++; however, the Python/C API helps to smooth out many difficulties.
Using callbacks¶
Using a callback in C is surprisingly straightforward. Consider a function that just calls the callback on an integer argument; the implementation is
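static PyObject* act_callback(PyObject* self, PyObject* args) { // nb: no kwargs
    int value = 0;
    long out;
    PyObject* callback = NULL;
    PyObject* result = NULL;
    if (!PyArg_ParseTuple(args, "O|i", &callback, &value)) {
        return NULL;
    }
    // check if callback is okay
    if (!PyCallable_Check(callback)) {
        PyErr_SetString(PyExc_TypeError,
                        "Callback is not callable.");
        return NULL;
    }
    // callback is a borrowed reference, so we must not Py_DECREF it
    result = PyObject_CallFunction(callback, "i", value);
    if (result == NULL) {
        return NULL; // the callback raised; propagate its exception
    }
    out = PyLong_AsLong(result);
    Py_DECREF(result); // release the object returned by the callback
    if (out == -1 && PyErr_Occurred()) {
        return NULL; // conversion failed
    }
    return PyLong_FromLong(out);
}
We check that the callback is okay, pass our argument to the callback, and then, since the callback executes Python code, convert the returned object back into a C value. Since we expect a single integer to be returned, we can use PyLong_AsLong to facilitate this conversion (a bad conversion raises a TypeError). The object returned by the callback is a new reference that we own, so we release it with Py_DECREF once converted; the callback itself is only borrowed from args and must not be released.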
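The intended behaviour of act_callback can be summarised with a pure-Python model. This is an approximation under stated assumptions: operator.index stands in for PyLong_AsLong (both reject non-integer results with a TypeError on recent Python versions), and C integer overflow is not modelled.

```python
import operator


def act_callback(callback, value=0):
    if not callable(callback):        # PyCallable_Check
        raise TypeError("Callback is not callable.")
    result = callback(value)          # PyObject_CallFunction
    return operator.index(result)     # roughly PyLong_AsLong


print(act_callback(lambda x: x * 2, 21))  # 42
print(act_callback(abs, -3))              # 3
```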
Using iterators¶
A common idiom in Python is to pass a list or iterator to a function. We can use these in C too; consider an iterator sum accumulator
static PyObject* sum_itt(PyObject* self, PyObject* args) {
    long sum = 0;
    PyObject* iterable;
    PyObject* iterator;
    PyObject* item;
    if (!PyArg_ParseTuple(args, "O", &iterable)) {
        return NULL;
    }
    // returns a new reference; an iterator simply returns itself
    iterator = PyObject_GetIter(iterable);
    if (iterator == NULL) {
        PyErr_SetString(PyExc_TypeError,
                        "Argument is not iterable!");
        return NULL;
    }
    while ((item = PyIter_Next(iterator))) {
        sum += PyLong_AsLong(item);
        Py_DECREF(item);
        if (PyErr_Occurred()) {
            break; // item was not an integer
        }
    }
    Py_DECREF(iterator);
    if (PyErr_Occurred()) {
        return NULL; // iteration or conversion failed
    }
    return PyLong_FromLong(sum);
}
Here we ask PyObject_GetIter() for an iterator over the argument; if the argument is already an iterator, it is returned as is. Because this always gives us a new reference, the final Py_DECREF(iterator) is correct for lists and iterators alike. We can then cycle calls to PyIter_Next(), which is the equivalent of calling next() in Python, until the iterator is depleted. We have to release each item with Py_DECREF once we are done with it, so that Python's garbage collection can clean up properly.
The above may be used as
sum_itt([1, 2, 3, 4, 5]) # 15
sum_itt(iter([1, 2, 3, 4, 5])) # 15
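The loop structure maps closely onto explicit iter() and next() calls in Python. The model below is a sketch of that correspondence, not a replacement for the extension; operator.index is used as an approximate stand-in for PyLong_AsLong.

```python
import operator


def sum_itt(iterable):
    iterator = iter(iterable)          # PyObject_GetIter; TypeError if not iterable
    total = 0
    while True:
        try:
            item = next(iterator)      # PyIter_Next
        except StopIteration:          # PyIter_Next returning NULL with no error set
            break
        total += operator.index(item)  # roughly PyLong_AsLong
    return total


print(sum_itt([1, 2, 3, 4, 5]))        # 15
print(sum_itt(iter([1, 2, 3, 4, 5])))  # 15
```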