6 Types and Typemaps

Caution: This chapter is under repair!

Introduction

In Chapter 3, SWIG's treatment of basic datatypes and pointers was described. In particular, primitive types such as int and double are mapped to corresponding types in the target language. For everything else, pointers are used to refer to structures, classes, arrays, and other user-defined datatypes. However, in certain applications it is desirable to change SWIG's handling of a specific datatype. For example, you may want a char ** to act like a list of strings instead of a bare pointer. In another case, you may want to tell SWIG that a parameter of double *result is the output value of a function. Similarly, you might want to map a datatype of float[4] into a 4 element tuple. This chapter describes some of the techniques that can be used to customize SWIG's type handling. In particular, details of the underlying type system and typemaps, an advanced customization feature, are presented.

The Problem

Suppose that you wanted to provide a scripting language wrapper around a function with the following prototype:
int foo(int argc, char *argv[]);
If you do nothing at all, SWIG produces a wrapper that expects to receive a pointer of type char ** as the second argument. For example, if you try to use the function you might get an error like this:
>>> foo(3,["ale","lager","stout"])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: Type error. Expected _p_p_char
>>> 
One way to fix this problem is to write a few helper functions to manufacture an object of the appropriate type. For example:
%{
#include <stdlib.h>
#include <string.h>
%}

%inline %{
char **new_args(int maxarg) {
   return (char **) malloc(maxarg*sizeof(char *));
}
void del_args(char **args, int narg) {
   while (--narg >= 0) {        /* free every string, including args[0] */
      free(args[narg]);
   }
   free(args);
}
void set_arg(char **args, int n, char *value) {
   args[n] = (char *) malloc(strlen(value)+1);
   strcpy(args[n],value);
}
%}
Now in the scripting language:
>>> args = new_args(3)
>>> args
_000f4248_p_p_char
>>> set_arg(args,0,"ale")
>>> set_arg(args,1,"lager")
>>> set_arg(args,2,"stout")
>>> foo(3,args)
>>> del_args(args,3)
Needless to say, even though this works, it isn't the most user-friendly interface. It would be much nicer if you could simply make a list of strings work like a char **. Similar problems arise when creating wrappers for small arrays, output values, and certain kinds of data structures.
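What a friendlier interface has to do internally is easy to sketch in plain C. The function below, strings_to_argv(), is a hypothetical helper (not part of SWIG) that copies an array of strings, standing in for a scripting-language list, into a freshly allocated, NULL-terminated char ** of the kind foo() expects. A real typemap would read the strings out of the interpreter's list object instead of a C array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n strings into a NULL-terminated char **.
   Returns NULL if any allocation fails. */
char **strings_to_argv(const char *const *items, int n) {
    char **argv = malloc((size_t)(n + 1) * sizeof *argv);
    int i;
    if (!argv) return NULL;
    for (i = 0; i < n; i++) {
        argv[i] = malloc(strlen(items[i]) + 1);
        if (!argv[i]) {                 /* undo partial work on failure */
            while (--i >= 0) free(argv[i]);
            free(argv);
            return NULL;
        }
        strcpy(argv[i], items[i]);
    }
    argv[n] = NULL;                     /* sentinel, as with main()'s argv */
    return argv;
}

/* Release everything allocated by strings_to_argv(). */
void free_argv(char **argv) {
    char **p;
    if (!argv) return;
    for (p = argv; *p; p++) free(*p);
    free(argv);
}
```

Unlike the new_args()/set_arg() approach, the caller never sees the intermediate pointer; the conversion and cleanup happen in one place.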

One of the reasons why SWIG does not provide automatic support for mapping scripting language objects such as lists and associative arrays into C is that it is often difficult to know how and when this should be done. For example, if you have a function like this,

void foo(double *x, double *y, double *r);
it's not at all clear what the arguments are supposed to represent. Are they single values? Are they arrays? Is a result stored in one of the arguments? The only thing that SWIG really knows is that they are pointers. Any further interpretation requires a little more information.

Typemaps

One way to provide more information about a particular C datatype is to attach a special code generation rule, known as a typemap, to the type. Typemaps are the primary customization mechanism used to modify SWIG's default type-handling behavior. For example, suppose you have a C function like this:

void add(double a, double b, double *result) {
	*result = a + b;
}

From reading the code, it is clear that the function returns a value in the result parameter. However, SWIG does not read the function body, so it has no way to know that result is any different from other pointers. By using a typemap, you can transform double *result into an output value and generate wrapper code that handles it as such.

The gentlest introduction to typemaps is to start with some of the predefined typemaps contained in the file typemaps.i (part of the SWIG library). For example:

// Simple example using typemaps
%module example
%include "typemaps.i"

%apply double *OUTPUT { double *result };
extern void add(double a, double b, double *result);
The %apply directive tells SWIG that you are going to apply a set of typemap rules to a new datatype. The "double *OUTPUT" is the name of a rule describing how to return an output value from a "double *" (this rule is defined in the file typemaps.i). The rule gets applied to all of the datatypes listed in curly braces, in this case "double *result".

When the resulting module is created, you can now use the function like this (shown for Python):

>>> a = add(3,4)
>>> print a
7
>>>
In this case, the output value normally stored in the third argument has magically been transformed into a function return value. Clearly this makes the function much easier to use since it is no longer necessary to manufacture a special double * object and pass it to the function somehow.
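Conceptually, the double *OUTPUT rule makes the wrapper supply the storage for the result itself. The following plain-C sketch shows that idea; add_wrapper() is a hypothetical name, and a real SWIG wrapper would also convert the result into a scripting-language object rather than return a C double.

```c
/* The original C function from the example above. */
void add(double a, double b, double *result) {
    *result = a + b;
}

/* Sketch of what an OUTPUT typemap arranges: the caller no longer passes
   a pointer; the wrapper owns a local and returns its value. */
double add_wrapper(double a, double b) {
    double result;            /* local storage replaces the pointer argument */
    add(a, b, &result);
    return result;            /* becomes the function's return value */
}
```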

Such transformations can even be extended to multiple return values. For example, consider this code:

%include "typemaps.i"
%apply int *OUTPUT { int *width, int *height };

// Returns a pair (width,height)
void getwinsize(int winid, int *width, int *height);
In this case, the function returns multiple values, allowing it to be used like this:
>>> w,h = getwinsize(wid)
>>> print w
400
>>> print h
300
>>>
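With more than one OUTPUT argument, the wrapper collects all of the values into a single result, which Python presents as a tuple. In C the closest analogue is a small struct. In the sketch below, WinSize and getwinsize_wrapper() are hypothetical, and the stub getwinsize() simply returns the values from the session above.

```c
/* Stub standing in for the real library function. */
void getwinsize(int winid, int *width, int *height) {
    (void) winid;
    *width = 400;
    *height = 300;
}

/* Hypothetical "tuple" holding both output values. */
typedef struct {
    int width;
    int height;
} WinSize;

/* Each int *OUTPUT argument becomes one field of the returned value. */
WinSize getwinsize_wrapper(int winid) {
    WinSize s;
    getwinsize(winid, &s.width, &s.height);
    return s;
}
```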

Once defined, a typemap rule applies to all future occurrences of a matching type and name. For instance, in the earlier example, all occurrences of double *result will be handled as an output parameter. Thus, if an interface file uses a consistent naming scheme, a typemap can be used to consistently apply special handling rules across an entire set of declarations.

It should also be noted that although the %apply directive is used to associate typemap rules with certain datatypes, you can also use the rule name directly as an argument name. For example, you could write this:

// Simple example using typemaps
%module example
%include "typemaps.i"

extern void add(double a, double b, double *OUTPUT);
Finally, if it is ever necessary to clear a typemap rule, use the %clear directive. For example:
%clear double *result;      // Remove all typemaps for double *result

Managing input and output parameters

One of the most common applications of typemaps is to handle pointers that correspond to simple input, output, or mutable function parameters. Typically this problem arises when working with functions that return more than one value such as a function that returns both a result and a status code to indicate success. The typemaps.i file contains a variety of rules for managing such pointers to the primitive C datatypes.

Input Methods

The following methods instruct SWIG that a pointer really only holds a single input value:

int *INPUT		
short *INPUT
long *INPUT
unsigned int *INPUT
unsigned short *INPUT
unsigned long *INPUT
double *INPUT
float *INPUT
When used, it allows values to be passed instead of pointers. For example, consider this function:
double add(double *a, double *b) {
	return *a+*b;
}
Now, consider this SWIG interface:

%module example
%include "typemaps.i"
...
extern double add(double *INPUT, double *INPUT);

When the function is used in the scripting language interpreter, it will work like this:

result = add(3,4)
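The INPUT rule works in the opposite direction: the wrapper accepts plain values, stores each one in a local variable, and passes the addresses on. A minimal C sketch, where add_input_wrapper() is a hypothetical name:

```c
/* The original function takes its inputs through pointers. */
double add(double *a, double *b) {
    return *a + *b;
}

/* Sketch of a wrapper using double *INPUT for both arguments. */
double add_input_wrapper(double a, double b) {
    double temp_a = a;        /* each INPUT value gets its own local */
    double temp_b = b;
    return add(&temp_a, &temp_b);
}
```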

Output Methods

The following typemap rules tell SWIG that a pointer is the output value of a function. When used, you do not need to supply the argument when calling the function. Instead, one or more output values will be returned.

int *OUTPUT
short *OUTPUT
long *OUTPUT
unsigned int *OUTPUT
unsigned short *OUTPUT
unsigned long *OUTPUT
double *OUTPUT
float *OUTPUT

These rules can be used as shown in the earlier example. Suppose you have this C function:

void add(double a, double b, double *c) {
	*c = a+b;
}

A SWIG interface file might look like this:

%module example
%include "typemaps.i"
...
extern void add(double a, double b, double *OUTPUT);

In this case, only a single output value is returned, but this is not a restriction. An arbitrary number of output values can be returned by applying the output rules to more than one argument (as shown previously).

Input/Output Methods

When a pointer serves as both an input and an output value, you can use the following methods:

int *BOTH
short *BOTH
long *BOTH
unsigned int *BOTH
unsigned short *BOTH
unsigned long *BOTH
double *BOTH
float *BOTH

A C function that uses this might be something like this:

void negate(double *x) {
	*x = -(*x);
}

To make x function as both an input and an output value, declare the function like this in an interface file:

%module example
%include "typemaps.i"
...
extern void negate(double *BOTH);

Now within a script, you can simply call the function normally:

$a = negate(3); 				# a = -3 after calling this
Compatibility note: The name of the BOTH rule will probably be changed to a more descriptive name in a future release.
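For a BOTH argument, the wrapper combines the two previous patterns: the incoming value initializes a local, the C function modifies it in place, and the updated value is returned. A sketch with a hypothetical wrapper name, negate_wrapper():

```c
void negate(double *x) {
    *x = -(*x);
}

/* Sketch of a wrapper using double *BOTH. */
double negate_wrapper(double x) {
    double temp = x;          /* input: copy the argument into a local */
    negate(&temp);            /* the C function updates it in place */
    return temp;              /* output: return the modified value */
}
```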

Using different names

As previously shown, the %apply directive can be used to apply the INPUT, OUTPUT, and BOTH rules to different argument names. For example:

// Make double *result an output value
%apply double *OUTPUT { double *result };

// Make Int32 *in an input value
%apply int *INPUT { Int32 *in };

// Make long *x both
%apply long *BOTH {long *x};

To clear a rule, the %clear directive is used:

%clear double *result;
%clear Int32 *in, long *x;

Applying constraints to input values

In addition to changing the handling of various input values, it is also possible to use typemaps to apply constraints. For example, maybe you want to ensure that a value is positive, or that a pointer is non-NULL. This can be accomplished by including the constraints.i library file.

Simple constraint example

The constraints library is best illustrated by the following interface file:

// Interface file with constraints
%module example
%include "constraints.i"

double exp(double x);
double log(double POSITIVE);         // Allow only positive values
double sqrt(double NONNEGATIVE);     // Non-negative values only
double inv(double NONZERO);          // Non-zero values
void   free(void *NONNULL);          // Non-NULL pointers only

The behavior of this file is exactly as you would expect. If any of the arguments violate the constraint condition, a scripting language exception will be raised. As a result, it is possible to catch bad values, prevent mysterious program crashes and so on.

Constraint methods

The following constraints are currently available:

POSITIVE                     Any number > 0 (not zero)
NEGATIVE                     Any number < 0 (not zero)
NONNEGATIVE                  Any number >= 0
NONPOSITIVE                  Any number <= 0
NONZERO                      Nonzero number
NONNULL                      Non-NULL pointer (pointers only).
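Each constraint amounts to a test that runs on an already converted argument before the C function is called. The sketch below expresses the numeric rows of the table as C predicates and uses one of them to guard inv(). The function names and the error-code convention are assumptions made for illustration; a real wrapper raises a scripting-language exception instead of returning -1.

```c
/* One predicate per numeric row of the constraint table. */
int is_positive(double v)    { return v > 0; }
int is_negative(double v)    { return v < 0; }
int is_nonnegative(double v) { return v >= 0; }
int is_nonpositive(double v) { return v <= 0; }
int is_nonzero(double v)     { return v != 0; }

/* Sketch of double inv(double NONZERO): stores the result and returns 0
   on success, or returns -1 (instead of raising an exception) if the
   constraint is violated. */
int inv_wrapper(double x, double *out) {
    if (!is_nonzero(x))
        return -1;
    *out = 1.0 / x;
    return 0;
}
```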

Applying constraints to new datatypes

The constraints library only supports the built-in C datatypes, but it is easy to apply it to new datatypes using %apply. For example:

// Apply a constraint to a Real variable
%apply Number POSITIVE { Real in };

// Apply a constraint to a pointer type
%apply Pointer NONNULL { Vector * };

The special types "Number" and "Pointer" can be applied to any numeric and pointer variable type, respectively. To later remove a constraint, the %clear directive can be used:

%clear Real in;
%clear Vector *;

Writing new typemaps

So far, only a few examples of using typemaps have been presented. However, if you're willing to get your hands dirty and dig into the internals of your favorite scripting language (and SWIG), it is possible to do a lot more.

Before diving in, it needs to be stressed that under normal conditions, SWIG does NOT require users to write typemaps (and even when they are used, it is probably better to use them sparingly). A common misconception among new SWIG users is that they need to write typemaps to handle new type names when, in fact, a typedef declaration is all that is required. For example, if you have a declaration like this,

void blah(size_t len);
you really only need to supply an appropriate typedef to make it work. For example:
typedef unsigned long size_t;
void blah(size_t len);
Typemaps are only used if you want to change the way that SWIG actually generates its wrapper code (e.g., if you wanted to express size_t as a string of Roman numerals or something). A more practical application would be to convert common scripting language objects such as lists and associative arrays into C datatypes, for example, converting a list of strings into a char *[] as shown in the first part of this chapter.

Before proceeding, you should first ask yourself if it is really necessary to change SWIG's default behavior. Next, you need to be aware that writing a typemap from scratch usually requires a detailed knowledge of the internal C API of the target language. Finally, it should also be stressed that by writing typemaps, it is easy to break all of the output code generated by SWIG. With these risks in mind, this section describes the basics of the SWIG type system and typemap construction. Language-specific information (which is often quite technical) is contained in later chapters.

The SWIG type system

Typemaps are tightly integrated with the internal operation of the SWIG type system. Internal to SWIG, all C++ datatypes are managed as a pair of types (type, ltype). type is a representation of the actual C++ datatype as it was specified in the interface file. ltype is a modified version of the datatype that can be used as an assignable local variable (a type that can be used on the left-hand side of a C assignment operator). The relationship between these two types pertains to how wrapper code is actually generated. Specifically, ltype is used to declare local variables used during argument conversion whereas type is used to make sure the actual C/C++ function is called without any type-errors. For example, if you have a C declaration like this:
void func(..., type, ...);
The corresponding wrapper code will look approximately like this:
wrap_func(args) {
   ...
   ltype argn;
   ...
   argn = ConvertValue(args[n]);
   ...
   func(..., (type) argn, ...);
   ...
}
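The relationship between the real C++ datatype and its ltype value follows a few simple rules: qualifiers such as const are dropped, references become pointers, and the first dimension of an array is converted into a pointer. For example: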
type                    ltype
----------------------  --------------------
object                  object
object *                object *
const object *          object *
const object * const    object *
object &                object *
object [10]             object *
object [10][20]         object (*)[20]
In certain cases, names defined with typedef are also expanded. For example, if you have a type defined by a typedef as follows:
typedef double Matrix[4][4];
the ltype of Matrix is set to double (*)[4].
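The wrapper pattern shown earlier can be made concrete for the Matrix typedef. In this sketch, trace() is a hypothetical C function taking a Matrix, and wrap_trace() receives a generic pointer standing in for the value extracted from the scripting language. The local is declared with the ltype double (*)[4], which is assignable, whereas a variable of type Matrix could not appear on the left of an assignment.

```c
typedef double Matrix[4][4];

/* Hypothetical C function; a Matrix parameter is really a double (*)[4]. */
double trace(Matrix m) {
    double t = 0.0;
    int i;
    for (i = 0; i < 4; i++) t += m[i][i];
    return t;
}

/* Sketch of the generated wrapper for double trace(Matrix m). */
double wrap_trace(void *ptr) {
    double (*arg1)[4];              /* local declared with the ltype of Matrix */
    arg1 = (double (*)[4]) ptr;     /* stands in for ConvertValue(args[n]) */
    return trace(arg1);             /* no cast needed: the types are compatible */
}
```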

It should probably be stressed that these rules define the precise behavior of the SWIG run-time type checker. Specifically, all of the type checking described in Chapter 3 is actually performed using ltype values and not the actual C datatype. This explains why, for instance, there is no difference between pointers, references, and one-dimensional arrays when they are used in the corresponding scripting language module. It also explains why qualifiers never appear in the mangled type-names used for type checking.

What is a typemap?

A typemap is a code generation rule that is attached to a datatype that appears in the interface file. It is specified using the %typemap directive. For example, a simple typemap might look like this:

%module example
%typemap(python,in) int {
   $target = (int) PyInt_AsLong($source);
   if (PyErr_Occurred()) return NULL;
   printf("received %d\n", $target);
}

int add(int a, int b);

In this case, the typemap is defining a rule for handling input arguments in Python. When used in a Python script, you would get the following debugging information:

>>> a = add(7,13)
received 7
received 13
In the typemap specification, the symbols $source and $target are placeholders for C variable names that SWIG uses when generating wrapper code. In this example, $source is the Python object containing the input value and $target is the C integer value that is going to be passed into the "add" function.
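To see what the substitution amounts to, the sketch below shows roughly how the body of an int "in" typemap ends up inside a generated wrapper, once per argument, with $source and $target replaced by the wrapper's own variable names. It is plain C rather than Python C API code: the input is a string instead of a Python object, strtol() stands in for PyInt_AsLong(), a -1 return stands in for returning NULL, and every name (wrap_add, obj0, arg1 and so on) is illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int add(int a, int b) { return a + b; }

/* Returns 0 on success, -1 if either argument is not an integer. */
int wrap_add(const char *obj0, const char *obj1, int *result) {
    int arg1, arg2;
    char *end;

    /* typemap(in) int, with $source -> obj0 and $target -> arg1 */
    {
        errno = 0;
        arg1 = (int) strtol(obj0, &end, 10);
        if (errno || end == obj0 || *end != '\0') return -1;
        printf("received %d\n", arg1);
    }
    /* the same typemap again, with $source -> obj1 and $target -> arg2 */
    {
        errno = 0;
        arg2 = (int) strtol(obj1, &end, 10);
        if (errno || end == obj1 || *end != '\0') return -1;
        printf("received %d\n", arg2);
    }
    *result = add(arg1, arg2);
    return 0;
}
```

Each copy of the typemap body sits in its own braces, which is why variables declared inside a typemap do not leak into the rest of the wrapper.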

Creating a new typemap

A new typemap can be created as follows:

%typemap(lang,method) type {
    ... Conversion code ...
}

lang specifies the target language and method defines a particular conversion method. type is the actual C++ datatype as it appears in the interface file (it is not the ltype value described in the section on the SWIG type system). The code corresponding to the typemap is enclosed in braces after the declaration.

A single typemap rule can be applied to multiple datatypes by giving a comma-separated list of datatypes. For example:

%typemap(python,in) int, short, long, signed char {
   $target = ($ltype) PyInt_AsLong($source);
   if (PyErr_Occurred()) return NULL;
   printf("received %ld\n", (long) $target);
}
Here, $ltype is expanded into the local datatype used during code generation (this is the assignable version of the type described in the SWIG type system section).
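One way to picture a multi-type typemap is as a template that SWIG stamps out once per listed type, substituting $ltype each time. A C preprocessor macro behaves similarly. The sketch below is only an analogy (SWIG does not use the C preprocessor for this); DEFINE_IN_CONVERTER and the convert_* functions are hypothetical, and atol() stands in for PyInt_AsLong().

```c
#include <stdlib.h>

/* The "typemap body", parameterized on the local type (the $ltype). */
#define DEFINE_IN_CONVERTER(name, ltype)          \
    ltype name(const char *source) {              \
        ltype target;                             \
        target = (ltype) atol(source);            \
        return target;                            \
    }

/* One expansion per type in the list "int, short, long, signed char". */
DEFINE_IN_CONVERTER(convert_int, int)
DEFINE_IN_CONVERTER(convert_short, short)
DEFINE_IN_CONVERTER(convert_long, long)
DEFINE_IN_CONVERTER(convert_schar, signed char)
```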

Typemaps may also be defined for specific names as in:

%typemap(perl5,in) char **argv {
   ... Turn a perl array into a char ** ...
}

A "named" typemap will only apply to an object that matches both the C datatype and the name. Thus the char **argv typemap will only be applied to function arguments that exactly match "char **argv". Although the name is usually the name of a parameter in a function declaration, this depends on the typemap method (sometimes the function name itself is used).

Finally, there is a shortened form of the typemap directive:

%typemap(method) Datatype {
	...
}

When the language name is omitted, the typemap will be applied to the current target language. This form is really only recommended for typemap methods that are language independent (there are a few). It is not recommended if you are building interfaces for multiple languages unless you are careful to hide the typemap with conditional compilation.

Deleting a typemap

A typemap can be deleted by providing no conversion code. For example:

%typemap(lang,method) type;              // Deletes this typemap

Copying a typemap

A typemap can be copied using the following declaration:

%typemap(lang,method) type = srctype;     // Copies a typemap
This specifies that the typemap for type should be the same as the srctype typemap. Here is an example:
%typemap(python,in) long = int;

Typemap matching rules

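When datatypes are processed in an interface file, SWIG looks for a typemap rule in the following order: first a rule that matches both the type and the name, then a rule that matches the type alone, and, if neither exists and the type is a typedef name, the same search is repeated using the type the typedef refers to. When more than one rule might apply, only the first match found is used. It is also important to note that the types must match exactly (including any qualifiers). Here is an example that shows how the rules are applied: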
typedef int Integer;

%typemap(python,in) int *x {
   ...
}

%typemap(python,in) int * {
   ...
}

%typemap(python,in) Integer *x {
   ...
}

void A(int *x);      // int *x rule 
void B(int *y);      // int * rule
void C(Integer *);   // int * rule (via typedef)
void D(Integer *x);  // Integer *x rule
void E(const int *); // No match. No rules for const int *
Compatibility note: SWIG1.1 applied a complex set of type-matching rules in which a typemap for int * would also match many different variations including int &, int [], and qualified variations. This feature has been removed in SWIG1.3. Typemaps must now exactly match the types and names used in the interface file.

Compatibility note: Starting in SWIG1.3, typemap matching tries to follow typedef declarations if possible (as shown in the above example). This type of matching is only performed in one direction. For example, if you had typedef int Integer and then defined a typemap for Integer, that typemap would not be applied to the int datatype. Earlier versions of SWIG did not follow typedef declarations when matching typemaps. This feature has primarily been added to assist language modules that rely heavily on typemaps (e.g., a typemap for "int" now defines the default for integers regardless of what kind of typedef name is being used to actually refer to an integer in the source program).
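The lookup order can be modeled in a few lines of C. This is a simplified sketch, not SWIG's implementation: types are split into a base name and a declarator, only the three typemaps and the single typedef from the example above are known, and match_typemap() is a hypothetical function. It reproduces the results listed for A through E, including the one-directional typedef rule.

```c
#include <stddef.h>
#include <string.h>

/* A rule matches a base type, a declarator such as "*", and an optional
   parameter name (NULL means "any name"). */
typedef struct {
    const char *base, *decl, *name, *label;
} Rule;

static const Rule rules[] = {
    { "int",     "*", "x",  "int *x rule" },
    { "int",     "*", NULL, "int * rule" },
    { "Integer", "*", "x",  "Integer *x rule" },
};

/* typedef int Integer;  (followed only from Integer to int) */
static const char *resolve_typedef(const char *base) {
    return strcmp(base, "Integer") == 0 ? "int" : NULL;
}

static const Rule *find(const char *base, const char *decl, const char *name) {
    size_t i, n = sizeof rules / sizeof rules[0];
    if (name)                                   /* 1: type and name */
        for (i = 0; i < n; i++)
            if (rules[i].name && !strcmp(rules[i].base, base) &&
                !strcmp(rules[i].decl, decl) && !strcmp(rules[i].name, name))
                return &rules[i];
    for (i = 0; i < n; i++)                     /* 2: type only */
        if (!rules[i].name && !strcmp(rules[i].base, base) &&
            !strcmp(rules[i].decl, decl))
            return &rules[i];
    return NULL;
}

/* Returns the label of the matching rule, or NULL if nothing matches. */
const char *match_typemap(const char *base, const char *decl, const char *name) {
    while (base) {
        const Rule *r = find(base, decl, name);
        if (r) return r->label;
        base = resolve_typedef(base);           /* 3: expand typedef, retry */
    }
    return NULL;
}
```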

Common typemap methods

Note: All parts beyond here still need more work to bring up to date. --Dave.

The following methods are supported by most SWIG language modules. Individual language modules may provide any number of additional methods not listed here.



Understanding how some of these methods are applied takes a little practice and a better understanding of what SWIG does when it creates a wrapper function. The next few diagrams show the anatomy of a wrapper function and how the typemaps get applied. More detailed examples of typemaps can be found in the chapters for each target language.

Writing typemap code

The conversion code supplied to a typemap needs to follow a few conventions described here.

Scope

Typemap code is enclosed in braces when it is inserted into the resulting wrapper code (using C's block-scope). It is perfectly legal to declare local and static variables in a typemap. However, local variables will only exist in the tiny portion of code you supply. In other words, any local variables that you create in a typemap will disappear when the typemap has completed its execution.

Creating local variables

Sometimes it is necessary to declare a new local variable that exists in the scope of the entire wrapper function. This can be done by specifying a typemap with parameters as follows:

%typemap(tcl,in) int *INPUT(int temp) {
	temp = atoi($source);
	$target = &temp;
}

What happens here is that temp becomes a local variable in the scope of the entire wrapper function. When we set it to a value, that value persists for the duration of the wrapper function and is cleaned up automatically on exit. This is particularly useful when working with pointers and temporary values.

It is perfectly safe to use multiple typemaps involving local variables in the same function. For example, we could declare a function as:

void foo(int *INPUT, int *INPUT, int *INPUT);

When this occurs, SWIG creates a separate local variable for each of the three arguments. They do not all end up with the name temp, however: SWIG detects the name clash and automatically renames the variables so that each one is distinct.
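The shape of the resulting wrapper can be sketched in C. Everything here is illustrative: foo() is a stub that records the sum of its arguments, the inputs arrive as strings as in the Tcl typemap, and the names temp1 through temp3 only show the effect of renaming, not SWIG's exact naming scheme. The temporaries are declared at function scope, so the pointers taken inside each typemap block remain valid when foo() is finally called.

```c
#include <stdlib.h>

/* Stub standing in for void foo(int *INPUT, int *INPUT, int *INPUT). */
static int last_sum;
void foo(int *a, int *b, int *c) {
    last_sum = *a + *b + *c;
}

void wrap_foo(const char *source1, const char *source2, const char *source3) {
    int *arg1, *arg2, *arg3;
    int temp1, temp2, temp3;   /* one renamed copy of "temp" per argument */

    { temp1 = atoi(source1); arg1 = &temp1; }   /* typemap for argument 1 */
    { temp2 = atoi(source2); arg2 = &temp2; }   /* typemap for argument 2 */
    { temp3 = atoi(source3); arg3 = &temp3; }   /* typemap for argument 3 */

    foo(arg1, arg2, arg3);     /* the temporaries are still alive here */
}
```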

Some typemaps do not recognize local variables (or they may simply not apply). At this time, only the "in", "argout", "default", and "ignore" typemaps use local variables.

Special variables

The following special variables may be used within typemap conversion code:

When found in the conversion code, these variables will be replaced with the correct values. Not all values are used in all typemaps. Please refer to the SWIG reference manual for the precise usage.

Typemaps for handling arrays

One of the most common uses of typemaps is providing support for arrays. Due to the subtle differences between pointers and arrays in C, array handling is somewhat limited unless you add it yourself. For example, suppose the following structure appears in an interface file:

struct Person {
	char name[32];
	char address[64];
	int id;
};

When SWIG is run, you may get the following warnings:

swig -python  example.i
Generating wrappers for Python
example.i:2.  Warning. Array member will be read-only.
example.i:3.  Warning. Array member will be read-only.

These warning messages indicate that SWIG does not know how you want to set the name and address fields. As a result, you will only be able to query their value.

To fix this, you could add two typemaps such as the following to the interface file:


%typemap(memberin) char [32] {
	strncpy($target,$source,32);
	$target[31] = 0;        /* strncpy() may leave the array unterminated */
}
%typemap(memberin) char [64] {
	strncpy($target,$source,64);
	$target[63] = 0;
}

The "memberin" typemap is used to set members of structures and classes. When you run the new version through SWIG, the warnings will go away and you can now set each member. It is important to note that char[32] and char[64] are different datatypes as far as SWIG typemaps are concerned. However, both typemaps can be combined as follows:

// A better typemap for char arrays
%typemap(memberin) char [] {
	strncpy($target,$source,$dim0);
	$target[$dim0-1] = 0;   /* ensure NUL termination */
}

When an empty dimension is used in a typemap, it matches any array dimension. When used, the special variable $dim0 will contain the real dimension of the array and can be used as shown above.
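In plain C, the generic char [] rule amounts to a bounded copy whose size comes from $dim0. The helper below, set_char_member(), is a hypothetical stand-in for the expanded typemap; because strncpy() does not add a terminating NUL when the source fills the buffer, it also stores one explicitly in the last element.

```c
#include <string.h>

struct Person {
    char name[32];
    char address[64];
    int id;
};

/* Hypothetical expansion of the char [] memberin typemap;
   dim plays the role of $dim0. */
void set_char_member(char *target, const char *source, size_t dim) {
    strncpy(target, source, dim);
    target[dim - 1] = '\0';    /* guarantee termination on truncation */
}
```

For example, set_char_member(p.name, "Alice", sizeof p.name) corresponds to the typemap being applied to the name field with $dim0 equal to 32.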

Multidimensional arrays can also be handled by typemaps. For example:

// A typemap for handling any int [][] array
%typemap(memberin) int [][] {
	int i,j;
	for (i = 0; i < $dim0; i++)
		for (j = 0; j < $dim1; j++) {
			$target[i][j] = *((int *) $source + $dim1*i + j);
		}
}

When multi-dimensional arrays are used, the symbols $dim0, $dim1, $dim2, etc... get replaced by the actual array dimensions being used.
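The same loop can be tried outside SWIG. This sketch assumes, as the typemap above does, that the source values arrive as a flat, row-major block of ints. The Grid structure and set_cells() are hypothetical, and the constants ROWS and COLS play the roles of $dim0 and $dim1, which SWIG substitutes as fixed numbers when it generates the code.

```c
#define ROWS 2
#define COLS 3

/* Hypothetical structure with a two-dimensional array member. */
struct Grid {
    int cells[ROWS][COLS];
};

/* Expanded form of the int [][] memberin loop for an int [2][3] member. */
void set_cells(struct Grid *g, const int *source) {
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            g->cells[i][j] = *(source + COLS*i + j);
}
```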

Typemaps and the SWIG Library

Writing typemaps is a tricky business. For this reason, many common typemaps can be placed into a SWIG library file and reused in other modules without having to worry about nasty underlying details. To do this, you first write a file containing typemaps such as this:

// file : stdmap.i
// A file containing a variety of useful typemaps

%typemap(tcl,in) int INTEGER {
	...
}
%typemap(tcl,in) double DOUBLE {
	...
}
%typemap(tcl,out) int INTEGER {
	...
}
%typemap(tcl,out) double DOUBLE {
	...
}
%typemap(tcl,argout) double DOUBLE {
	...
}
// and so on...

This file may contain dozens or even hundreds of possible mappings. Now, to use this file with other modules, simply include it in other files and use the %apply directive:

// interface.i
// My interface file

%include stdmap.i                         // Load the typemap library

// Now grab the typemaps we want to use
%apply double DOUBLE {double};

// Rest of your declarations

In this case, stdmap.i contains a variety of standard mappings. The %apply directive lets us apply specific versions of these to new datatypes without knowing the underlying implementation details.

To clear a typemap that has been applied, you can use the %clear directive. For example:

%clear double x; 			// Clears any typemaps being applied to double x

Implementing constraints with typemaps

One particularly interesting application of typemaps is the implementation of argument constraints. This can be done with the "check" typemap. When used, this allows you to provide code for checking the values of function arguments. For example:

%module math

%typemap(perl5,check) double posdouble {
	if ($target < 0) {
		croak("Expecting a non-negative number");
	}
}

...
double sqrt(double posdouble);

This provides a sanity check to your wrapper function. If a negative number is passed to this function, a Perl exception will be raised and your program terminated with an error message.

This kind of checking can be particularly useful when working with pointers. For example:

%typemap(python,check) Vector * {
	if ($target == 0) {
		PyErr_SetString(PyExc_TypeError,"NULL Pointer not allowed");
		return NULL;
	}
}

will prevent any function involving a Vector * from accepting a NULL pointer. As a result, SWIG can often prevent segmentation faults or other run-time problems by raising an exception rather than blindly passing values to the underlying C/C++ program.
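The position of the check code is what makes this work: it runs on the converted argument and before the C function is called. The sketch below shows that ordering in plain C for a Vector * argument. Vector, vector_length_sq() and wrap_vector_length_sq() are hypothetical, and the error-message out-parameter with a -1 return stands in for the PyErr_SetString()/return NULL pair.

```c
#include <stddef.h>

typedef struct { double x, y, z; } Vector;

double vector_length_sq(const Vector *v) {
    return v->x * v->x + v->y * v->y + v->z * v->z;
}

/* Returns 0 on success; on failure stores a message and returns -1. */
int wrap_vector_length_sq(Vector *arg1, double *result, const char **err) {
    /* "in" conversion of the pointer argument would happen here. */

    /* check typemap for Vector * */
    if (arg1 == NULL) {
        *err = "NULL Pointer not allowed";
        return -1;
    }

    *result = vector_length_sq(arg1);   /* only reached with a valid pointer */
    return 0;
}
```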

Typemap examples

Typemaps are inherently language dependent so more examples appear in later chapters. The SWIG Examples directory also includes a variety of examples. Sophisticated users may gain more by examining the typemaps.i and constraints.i SWIG library files.

How to break everything with a typemap

It should be emphasized that typemaps provide a direct mechanism for modifying SWIG's output. As a result, it can be very easy to break almost everything if you don't know what you're doing. For this reason, it should be stressed that typemaps are not required in order to use SWIG with most kinds of applications. Power users, however, will probably find typemaps to be a useful tool for creating extremely powerful scripting language extensions.

Typemaps and the future

The current typemap mechanism is in the process of refinement and will probably be extended to handle groups of types in a future release.


SWIG 1.3 - Last Modified : September 23, 2001