Custom Python Part 1: Extensions

21 Nov 2002 | CC (ASA 2.5) | 8 min read
How to build C/C++ custom extension libraries for Python.

Introduction

This is the first part of a two-part series on Python. This article will cover the basic techniques for extending Python using C/C++. The second article will cover the techniques for embedding the Python interpreter in your C/C++ application. While I don't often use Python extensions directly, the information covered here is an absolute requirement for embedded use.

This article is not meant to be an introduction to Python. It assumes a working knowledge of the language.

You will need a Python distribution to build and use the sample. There are two major distributors of Python: Python.org and ActiveState.com. Either one will suffice, but my preference these days is the ActiveState one. They've compiled the help files into a Windows HTML help file that I find easier to navigate than the one in the basic distribution. Plus, it comes with all of the Windows extension libraries.

They both come with include and libs directories, so you will not need to download the source code. The source will be necessary, however, in the second part of this series.

Extensions - What are they?

Python's motto is "Batteries included". By this they mean that, out of the package, Python does a lot. It comes with many extra modules that give the user access to such features as sockets, CGI, URL parsing and HTTP support, XML processing, MIME, threads, and even XML-based RPC. Despite all of these extras, we will always need or want to customize.

Extension modules come in two forms: native Python, and C/C++ dynamically linked libraries (DLLs). Native Python modules are simply Python scripts that are available to be imported by user scripts. Creating native Python modules is as simple as writing a Python script.

C/C++ extension modules are compiled DLLs with a standard exported function that handles module initialization and registration. This article covers these DLL-based extension modules.

The API

Python is written in C, and the authors have been kind enough to expose and document most of the interpreter internals. It is through this API that we gain the access we need to extend the language.

Python objects

Every object, nay every value, in Python is represented internally as a PyObject. PyObject is a structure that holds the object's reference count and a pointer to its type object; the type object in turn defines all of the handler entry points. One of the fundamentals of Python extension programming is that, whenever you manipulate any Python object in C/C++, you will be manipulating a PyObject.

That being said, you will rarely use the PyObject API routines. Instead, you will use the API routines that apply to the specific Python type being manipulated. Please see the Python C/C++ API documentation for specifics.

Reference counting

Python handles basic memory management with a reference counting mechanism. Each object has a reference count that gets incremented when a new reference to the object is created, and decremented when a reference is dropped. When the count reaches zero, the object is deleted.

class doodle:
    pass    # etc.

d = doodle()    # reference count = 1
e = d           # reference count = 2
del(d)          # reference count = 1
del(e)          # reference count = 0 object deleted
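The same bookkeeping can be observed from Python itself with sys.getrefcount(). This is a minimal sketch, assuming CPython; the absolute numbers may include a temporary reference held by the call itself, so only the differences are meaningful. The class name Doodle is just for illustration.

```python
import sys

class Doodle:
    pass

d = Doodle()
base = sys.getrefcount(d)         # baseline; may include the call's own reference

e = d                             # a second name bound to the same object
with_alias = sys.getrefcount(d)   # one more than the baseline

del e                             # drop the extra reference again
after_del = sys.getrefcount(d)    # back to the baseline
```

Comparing with_alias and after_del against base shows the count rising by one when the alias is created and falling back when it is deleted.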

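When manipulating Python objects in C/C++, one must be conscious of the reference counts. Certain functions (marked in the Python C API documents) return new references, which you own and must eventually release; others return borrowed references, which you must not release. A few functions, such as PyTuple_SetItem() and PyList_SetItem(), steal the reference to the item you pass in; most others, such as PyList_Append(), do not. Use the Py_INCREF() and Py_DECREF() macros to assist you. Also, I suggest that you mark each questionable API call with a comment noting the return type.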

PyObject *pList = PyList_New(5); // new reference
...
PyObject *pItem = PyList_GetItem(pList, 2); // borrowed reference
...
Py_DECREF(pList);

Reference counts are important! I have spent many hours trying to track down memory leaks only to find that I didn't Py_DECREF() an object when I should have.

Python types

There are six major native data types in Python: integers, floats, strings, tuples, lists and dictionaries. Python supports a variety of other native types (complex, long integers, etc), the use of which will be left as the proverbial exercise for the reader.

Integers, floats and strings

These are just what you would expect. The only thing you need to know is how to build and manipulate them.

// build an integer
PyObject *pInt = Py_BuildValue("i", 147); // new reference
assert(PyInt_Check(pInt));

long i = PyInt_AsLong(pInt);
Py_DECREF(pInt);

// build a float
PyObject *pFloat =
        Py_BuildValue("f", 3.14159f); // new reference
assert(PyFloat_Check(pFloat));

double f = PyFloat_AsDouble(pFloat);
Py_DECREF(pFloat);

// build a string
PyObject *pString =
        Py_BuildValue("s", "yabbadabbadoo"); // new reference
assert(PyString_Check(pString));

int nLen = PyString_Size(pString);
char *s = PyString_AsString(pString); // points into pString's buffer
// use s here; it is no longer valid once pString is released
Py_DECREF(pString);

Tuples

Tuples are fixed-length immutable arrays. When a Python script calls a C/C++ extension method, all non-keyword arguments are passed in a tuple. Needless to say, parsing this tuple tends to be the first thing done in your methods.

Here is a mish-mash of tuple use:

// create the tuple
PyObject *pTuple = PyTuple_New(3); // new reference
assert(PyTuple_Check(pTuple));
assert(PyTuple_Size(pTuple) == 3);

// set the item
PyTuple_SetItem(pTuple, 0, Py_BuildValue("i", 1));
PyTuple_SetItem(pTuple, 1, Py_BuildValue("f", 2.0f));
PyTuple_SetItem(pTuple, 2, Py_BuildValue("s", "three"));

// parse tuple items
int i;
float f;
char *s;
if(!PyArg_ParseTuple(pTuple, "ifs", &i, &f, &s))
    PyErr_SetString(PyExc_TypeError, "invalid parameter");

// cleanup
Py_DECREF(pTuple);

PyArg_ParseTuple() is probably one of the most commonly used API functions. The second parameter is a string that dictates the types of objects expected in the tuple. ifs means: integer, float, string. Please see the API documentation for a detailed explanation, and a list of the other type characters.
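To make the idea of a format string concrete, here is a toy model in Python. It is not how PyArg_ParseTuple() is implemented, and it covers only the three type characters used above; the function name parse_tuple and its error messages are invented for this sketch.

```python
def parse_tuple(args, fmt):
    """Toy model of a format-string check; not the real PyArg_ParseTuple()."""
    # Each type character maps to (accepted Python types, conversion).
    checks = {
        "i": (int, int),
        "f": ((int, float), float),
        "s": (str, str),
    }
    if len(args) != len(fmt):
        raise TypeError("expected %d arguments, got %d" % (len(fmt), len(args)))
    result = []
    for position, (code, value) in enumerate(zip(fmt, args), start=1):
        accepted, convert = checks[code]
        if not isinstance(value, accepted):
            raise TypeError("argument %d does not match '%s'" % (position, code))
        result.append(convert(value))
    return tuple(result)

parsed = parse_tuple((1, 2.0, "three"), "ifs")
```

Each character in the format string is matched against the argument in the same position, and a mismatch raises TypeError, much as the C function reports failure and sets an exception.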

Lists

Lists are like STL vectors. They allow random access and iteration over stored objects. Here is an example of typical list use:

// create the list
PyObject *pList = PyList_New(5); // new reference
assert(PyList_Check(pList));

// set some initial values
for(int i = 0; i < 5; ++i)
    PyList_SetItem(pList, i, Py_BuildValue("i", i));

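// insert an item (PyList_Insert() does not steal the reference)
PyObject *pInserted = Py_BuildValue("s", "inserted"); // new reference
PyList_Insert(pList, 3, pInserted);
Py_DECREF(pInserted);

// append an item (neither does PyList_Append())
PyObject *pAppended = Py_BuildValue("s", "appended"); // new reference
PyList_Append(pList, pAppended);
Py_DECREF(pAppended);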

// sort the list
PyList_Sort(pList);

// reverse the list
PyList_Reverse(pList);

// fetch and manipulate a list slice
PyObject *pSlice =
        PyList_GetSlice(pList, 2, 4); // new reference
for(int j = 0; j < PyList_Size(pSlice); ++j) {
    PyObject *pValue = PyList_GetItem(pSlice, j); // borrowed reference
    assert(pValue);
}
Py_DECREF(pSlice);

// cleanup
Py_DECREF(pList);
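For orientation, the C calls above correspond roughly to the following Python operations. This sketch assumes Python 3, where sorting a list that mixes integers and strings raises TypeError, so it inserts and appends integers (99 and -1, chosen arbitrarily) instead of the strings used in the C example.

```python
# PyList_New(5) followed by PyList_SetItem() for each slot
items = [None] * 5
for i in range(5):
    items[i] = i

items.insert(3, 99)      # PyList_Insert(pList, 3, ...)
items.append(-1)         # PyList_Append(pList, ...)
items.sort()             # PyList_Sort(pList)
items.reverse()          # PyList_Reverse(pList)

piece = items[2:4]       # PyList_GetSlice(pList, 2, 4) returns a new list
for value in piece:      # PyList_GetItem(pSlice, j) for each index j
    assert value is not None
```

The slice is a separate list object, which is why the C code has to release it with its own Py_DECREF().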

Dictionaries

Dictionaries are the equivalent of STL maps. They map keys to values. Here is an example of typical dictionary use:

// create the dictionary
PyObject *pDict = PyDict_New(); // new reference
assert(PyDict_Check(pDict));

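// add a few named values (PyDict_SetItemString() does not steal
// the reference, so release ours afterwards)
PyObject *pFirst = Py_BuildValue("i", 1); // new reference
PyDict_SetItemString(pDict, "first", pFirst);
Py_DECREF(pFirst);
PyObject *pSecond = Py_BuildValue("f", 2.0f); // new reference
PyDict_SetItemString(pDict, "second", pSecond);
Py_DECREF(pSecond);

// enumerate all named values
PyObject *pKeys = PyDict_Keys(pDict); // new reference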
for(int i = 0; i < PyList_Size(pKeys); ++i) {
    PyObject *pKey =
            PyList_GetItem(pKeys, i); // borrowed reference
    PyObject *pValue =
            PyDict_GetItem(pDict, pKey); // borrowed reference
    assert(pValue);
}
Py_DECREF(pKeys);

// remove a named value
PyDict_DelItemString(pDict, "second");

// cleanup
Py_DECREF(pDict);
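Seen from the Python side, the dictionary calls above amount to roughly the following. The variable names are only for illustration.

```python
named = {}                      # PyDict_New()
named["first"] = 1              # PyDict_SetItemString(pDict, "first", ...)
named["second"] = 2.0           # PyDict_SetItemString(pDict, "second", ...)

keys = list(named.keys())       # PyDict_Keys(pDict) returns a new list
seen = [(key, named[key]) for key in keys]   # PyDict_GetItem() for each key

del named["second"]             # PyDict_DelItemString(pDict, "second")
```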

Extension concepts

An extension module typically consists of three parts: the actual exported functions, the method table and the initialization function.

First, we will look at a simple example of an extension module. Then we will examine each part individually.

A typical extension module will look like this:

#include <Python.h>
#include <string.h>

// exported yinkle() function
static PyObject *wanklib_yinkle(PyObject *pSelf,
                                PyObject *pArgs)
{

    char *szString;
    int nInt;
    float fFloat;
    PyObject *pList;

    // "O" (upper case) accepts any Python object without conversion
    if(!PyArg_ParseTuple(pArgs, "sifO", &szString, &nInt,
                         &fFloat, &pList))
    {
        PyErr_SetString(PyExc_TypeError,
                "yinkle() invalid parameter");
        return NULL;
    }

    if(!PyList_Check(pList)) {
        PyErr_SetString(PyExc_TypeError,
                "yinkle() fourth parameter must be a list");
        return NULL;
    }

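    PyObject *pResult = Py_BuildValue("f",
            strlen(szString) * nInt / fFloat); // new reference
    PyList_Append(pList, pResult); // the list takes its own reference
    Py_DECREF(pResult);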

    Py_INCREF(Py_None);
    return Py_None;
}

// wanklib method table
static PyMethodDef WankLibMethods[] = {
    {"yinkle", wanklib_yinkle,
        METH_VARARGS, "Do a bit of stuff."},
    {NULL, NULL, 0, NULL}
};

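// wanklib initialization function; it must be exported from the DLL
// with C linkage so that Python can find it by name
extern "C" __declspec(dllexport) void initwanklib(void) {

    // initialize module
    PyObject *pModule =
            Py_InitModule("wanklib", WankLibMethods);

    // fetch module dictionary and add non-function symbols
    PyObject *pDict = PyModule_GetDict(pModule); // borrowed reference
    PyObject *pEleven = Py_BuildValue("i", 147); // new reference
    PyDict_SetItemString(pDict, "eleven", pEleven);
    Py_DECREF(pEleven); // the dictionary holds its own reference
    PyObject *pKay = Py_BuildValue("s", "kay"); // new reference
    PyDict_SetItemString(pDict, "doubleyew", pKay);
    Py_DECREF(pKay);
}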
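To show what a script would observe when calling this module, here is a pure-Python stand-in for yinkle(). It is only an illustration of the intended behaviour, not the compiled extension; with the real DLL in place you would import wanklib and call wanklib.yinkle() instead. The parameter names are invented for the sketch.

```python
def yinkle(text, count, divisor, target):
    """Pure-Python stand-in for wanklib.yinkle(); for illustration only."""
    if not isinstance(text, str) or not isinstance(count, int):
        raise TypeError("yinkle() invalid parameter")
    if not isinstance(divisor, (int, float)):
        raise TypeError("yinkle() invalid parameter")
    if not isinstance(target, list):
        raise TypeError("yinkle() fourth parameter must be a list")
    # same arithmetic as the C version: strlen(s) * n / f
    target.append(len(text) * count / float(divisor))
    return None  # the C version returns Py_None

results = []
returned = yinkle("abcd", 3, 2.0, results)
```

The function returns nothing useful; its effect is the value appended to the list that the caller passed in.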

Initialization function

Every extension module must export a function called init followed by the module name; for a module named wanklib, that is initwanklib. When a Python script requests that the module be imported, Python queries the library for that exact named function and calls it. The initialization function is responsible for telling Python about the functions, variables and classes that it exports.

Method table

The initialization function will call the Python routine Py_InitModule() to register the module methods. It will pass the name by which the new module will be known, and a table describing the exported methods. Each table entry consists of four parts: the callable name string, the function itself, a parameter describing how parameters will be passed, and a documentation string. The last entry in the table needs to be a sentinel with NULL entries.

The name string is the name by which the method will be callable from Python. The parameter type marker is usually METH_VARARGS, optionally combined with METH_KEYWORDS. METH_VARARGS is the standard way of passing parameters; they arrive packaged in a tuple. Adding METH_KEYWORDS (METH_VARARGS | METH_KEYWORDS) requests that named parameters also be passed in a dictionary, which the function receives as a third argument.
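A rough Python analogy for the two calling conventions, assuming METH_KEYWORDS is combined with METH_VARARGS as is usual; the function names are made up for the sketch.

```python
def varargs_style(*args):
    # METH_VARARGS: positional arguments arrive packed in one tuple
    return args

def keywords_style(*args, **kwargs):
    # METH_VARARGS | METH_KEYWORDS: a tuple plus a dictionary of named arguments
    return args, kwargs

positional = varargs_style("abc", 2)
mixed = keywords_style("abc", 2, scale=4.0)
```

In the C function, args corresponds to the tuple parameter and kwargs to the extra dictionary parameter.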

Methods

All extension methods have the same prototype (given that they are marked as METH_VARARGS):

PyObject *method(PyObject *pSelf, PyObject *pArgs);

All extension methods must return a PyObject pointer. If the function has no real return value, you must return a pointer to the global "None" object, after incrementing its reference:

PyObject *method(PyObject *pSelf, PyObject *pArgs) {

    Py_INCREF(Py_None);
    return Py_None;
}

To signify that an error has occurred and to throw a Python exception, you must return NULL and set the error string:

PyObject *method(PyObject *pSelf, PyObject *pArgs) {

    PyErr_SetString(PyExc_StandardError,
            "something bad happened");
    return NULL;
}
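Both conventions can be seen from the Python side using functions that CPython itself implements in C. This is a small sketch, assuming CPython; list.append() and math.sqrt() are standard functions, chosen here only as convenient examples.

```python
import math

# list.append() is implemented in C and has nothing to return,
# so it hands back the shared None object.
numbers = []
returned = numbers.append(1)

# math.sqrt() is implemented in C; given a string it sets TypeError
# and returns NULL, which the interpreter turns into a raised exception.
try:
    math.sqrt("not a number")
    caught = None
except TypeError as error:
    caught = error
```

A script calling your own extension methods would see exactly the same two outcomes: None for a method with no result, and an ordinary Python exception for a NULL return.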

The first argument to extension methods is a "self" pointer and is really only valid when you are building custom classes. These will be detailed in the next article.

The second argument is a tuple containing each parameter in order. As mentioned above, parsing this tuple is usually the first thing that happens.

Variables

Each Python module has a dictionary of local objects. In order to export variables from your module, all you need to do is add them to this dictionary. Py_InitModule() returns a pointer to the initialized module. PyModule_GetDict() retrieves the local object dictionary.

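PyObject *pModule =
        Py_InitModule("wanklib", WankLibMethods);
PyObject *pDict = PyModule_GetDict(pModule); // borrowed reference
PyObject *pValue = Py_BuildValue("i", 147); // new reference
PyDict_SetItemString(pDict, "someVar", pValue);
Py_DECREF(pValue); // the dictionary keeps its own reference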
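The relationship between a module and its dictionary can be tried out in Python without building anything. This sketch uses types.ModuleType to create an empty module object as a stand-in for the one Py_InitModule() returns; the module name demo is arbitrary.

```python
import types

demo = types.ModuleType("demo")    # stand-in for the module Py_InitModule() returns
namespace = vars(demo)             # the dictionary PyModule_GetDict() would expose
namespace["someVar"] = 147         # PyDict_SetItemString(pDict, "someVar", ...)

value = demo.someVar               # the entry is now visible as a module attribute
```

Anything placed in the dictionary becomes an attribute of the module, which is all that exporting a variable amounts to.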

Implementation

In Windows, Python extensions are simply DLL files with a known exported symbol (the initialization function). In order to build an extension, you must create a Win32 dynamic-link library project in Visual Studio. Choose "A DLL that exports some symbols", so you have a bit of a template from which to work. I'm sure you could build an extension using the MFC AppWizard, but I've never tried it and don't intend to.

Simple extensions can be built in a single file, and will follow the layout shown above in the example.

All Python API declarations are accessed by including one file: Python.h. It is located in the include subdirectory below your Python installation. Rather than hard coding the path, it's much more desirable to add the directory to your Tools/Options/Directories list. While you're at it, add the libs subdirectory to the list of Library Files search paths as well.

There is no need to explicitly link the Python library. The Python.h include file uses a pragma to force the proper linkage.

NOTE: In a debug build, the pragma in Python.h forces linkage against the debug build of Python: Python22_d.lib (the version number depends on the version you have installed). If you haven't downloaded the Python source code, you likely don't have this library. Your choices are to download the source and build the debug libraries, or to build your extensions in release mode.

In order to remove the C++ name mangling, you need to define your initialization function as extern "C".

And lastly, once compiled, place your DLL file in the DLLs subdirectory off of your Python installation. It will get picked up automatically when you try to import it.

Beyond that, you should be ready to go.

Example

The example that I have included simply wraps the Mersenne Twister pseudo random number generator. I have used what appears to be original code by the "inventors", Makoto Matsumoto and Takuji Nishimura. The mtprng module provides two methods: sgenrand() to seed the generator, and genrand() to generate a number on [0,1]. The code compiles with VS6 and SP5, with Python 2.2 installed, although any Python version beyond 1.5.3 should be fine.
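The compiled mtprng module is not reproduced here. As a stand-in, Python's standard random module, which in CPython is also built on the Mersenne Twister, shows the same seed-then-generate pattern. The comments mapping calls to sgenrand() and genrand() are an analogy only; the seed 4357 is arbitrary, and random.random() returns values in [0.0, 1.0).

```python
import random

first = random.Random(4357)     # comparable to seeding with sgenrand()
second = random.Random(4357)    # same seed, independent generator

a = [first.random() for _ in range(3)]    # comparable to three genrand() calls
b = [second.random() for _ in range(3)]
```

Two generators seeded with the same value produce the same sequence, which is the property that makes an explicit seeding function useful.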

Good luck!

History

  • November 21 2002 - Created.

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License

