Do not meddle in the affairs of wizards, for you are crunchy and good with ketchup.

XS Mechanics

This article is about XS. It explains what it is, why it is, how it works, and how to use it. It includes a complete, working example of an XS module, and a stub module that you can use as a starting point for your own code. It is an express goal of this article to provide the background and information necessary for you to write your own XS modules.

This article is in five parts

November	Introduction	motivation, definitions, examples
December	Architecture	the Perl interpreter, calling conventions, data representation
January	Tools	h2xs, xsubpp, DynaLoader
February	Modules	Math::Ackermann, Set::Bit
March	Align::NW	Needleman-Wunsch global optimal sequence alignment

Two months ago, we presented a problem that could benefit from an XS implementation. Last month, we discussed the architecture of XS. This month, we discuss the tools that are used to write XS.

Tools

At a coding level, XS revolves around two tools: h2xs and xsubpp. Like any tools, these are easier to use if you understand how they are intended to be used. Chisels cut with the grain, not across it. To understand how the XS tools are intended to be used, we need some historical background.

Perl is often used for tasks that were formerly done with shell scripts, C programs, and assorted Unix tools, such as find(1), awk(1), sed(1), and sort(1). To help programmers port existing software to Perl, the Perl distribution includes some translation utilities, such as find2perl, a2p (awk to Perl), and s2p (sed to Perl). The output of these utilities may require some editing, but they generate reasonably complete and correct translations.

There is no c2p. Such a program would be difficult to write; besides, a direct translation from C to Perl is rarely desirable. More commonly, we want to call existing C code from new Perl programs.

`h2xs`

The interface to C code—data types, function prototypes—is usually specified in a .h file. To generate interfaces to C code, Perl provides h2xs. h2xs is a utility that reads a .h file and generates an outline for an XS interface to the C code. This includes

a module directory
a Makefile.PL file
an .xs file
a .pm file

However, the output of h2xs is not a complete, or even a nearly complete, XS interface. It is merely a beginning. It is a valuable beginning: it includes some boilerplate that is difficult to generate by hand. But it is only a beginning.

Neither is the output of h2xs necessarily correct. Interfacing Perl to C is a hard problem. h2xs makes guesses about how to do it; sometimes it guesses wrong.

If you run h2xs assuming that the results will be complete and correct—assuming that you will find structure and coherence in its output—then you are going to be very confused, and very frustrated. To move forward from the outline that h2xs generates, you must accept it as strictly provisional.

Similar issues surround the inputs to h2xs. h2xs takes many command line options. However, these do not constitute a complete and coherent system for making h2xs do what you need. Rather, they have accumulated over time, each one added to meet a particular need, in a particular context. Many of these options are useful, but there is not necessarily any combination of them that will make h2xs do the Right Thing for you. You have to take what you can get and go forward from there.

`xsubpp`

xsubpp is the program that translates XS code to C code. XS is sometimes referred to as a language, but it is better thought of as a collection of macros; xsubpp is the macro expander. Again, the XS macros do not constitute a complete and coherent language for interfacing Perl to C. They have accumulated over time, each one added to meet a particular need.

Writing XS doesn't require an understanding of the deep structure of the macros—there isn't any. Rather, it requires searching through perlxs to find a macro that does what you need, and then using that macro.

`h2xs`

The first step in creating any Perl module is to run h2xs from the command line.

Directories

Unless you know exactly what you are doing, running h2xs is a process of successive refinement. You should create a development directory for this purpose. In the examples below, we'll refer to the development directory as

.../development/

When you run h2xs, it creates a new directory within the development directory to hold the module sources; we'll call this the module directory. The module directory is created on a path that mirrors the module name. For example, the module directory for Align::NW is

.../development/Align/NW/

Existing libraries

h2xs was originally written to generate XS interfaces for existing C libraries. At its simplest, you specify the header file for a library, and it creates and populates a module directory. If the header file is /usr/include/rpcsvc/rusers.h, we can do

.../development>h2xs rpcsvc/rusers
Writing Rusers/Rusers.pm
Writing Rusers/Rusers.xs
Writing Rusers/Makefile.PL
Writing Rusers/test.pl
Writing Rusers/Changes
Writing Rusers/MANIFEST

h2xs searches for the header file in the current directory and on the standard include paths, and complains if it doesn't find it.

.../development>h2xs foo
Can't find foo.h

Module naming

h2xs names the module and the module directory after the header file. It upcases the first letter of the name, in accordance with the Perl convention that module names have leading capitals.

If you don't like the module name that h2xs generates, you can specify a different one with the -n flag.

.../development>h2xs -n RPC::Rusers rpcsvc/rusers
Writing RPC/Rusers/Rusers.pm
Writing RPC/Rusers/Rusers.xs
Writing RPC/Rusers/Makefile.PL
Writing RPC/Rusers/test.pl
Writing RPC/Rusers/Changes
Writing RPC/Rusers/MANIFEST

The -n flag controls both the name of the module directory and the name of the Perl module; in this case, the Perl module will be RPC::Rusers. The -n flag doesn't affect the search for the header file: h2xs still finds the header in /usr/include/rpcsvc/rusers.h.

New code

Next month, we will show an XS implementation of the Align::NW module. Align::NW isn't an interface to an existing library, and its headers aren't in /usr/include/. It is a new Perl module that is partly implemented in C.

Because the C code for Align::NW is part of the module, it ought to live in the module directory. The Perl code will be in Align/NW/NW.pm, and it is tempting to name the C sources to match

.../development/Align/NW/NW.pm
.../development/Align/NW/NW.c
.../development/Align/NW/NW.h

However, this won't work. The problem is that h2xs is going to create

.../development/Align/NW/NW.xs

and xsubpp will translate NW.xs into

.../development/Align/NW/NW.c

which collides with our NW.c file.

Another possibility is to integrate the C code into the .xs file

.../development/Align/NW/NW.pm
.../development/Align/NW/NW.xs	# contains our C code
.../development/Align/NW/NW.h

This works, because anything in a .xs file that isn't an XS macro is passed through unchanged by xsubpp to the .c file. Some XS modules implement large amounts of C code directly in the .xs file; ultimately, the distinction between XS code and C code becomes arbitrary.

However, I prefer to keep the bulk of my C code in .c files, and reserve the .xs file for glue routines. Reasons for this include

I can compile the .c files together with a main.c and test them in a stand-alone C program.
Writing XS is hard; I want to have as little code in the .xs file as possible.

The C code in Align::NW implements a single Perl method, named score. We'll name our C sources score.c and score.h. Then we can create and populate the module directory like this

.../development>ls
score.c   score.h
.../development>h2xs -n Align::NW score
Writing Align/NW/NW.pm
Writing Align/NW/NW.xs
Writing Align/NW/Makefile.PL
Writing Align/NW/test.pl
Writing Align/NW/Changes
Writing Align/NW/MANIFEST
.../development>cp score.c score.h Align/NW/

Constants

Many C header files #define constants that appear in their interfaces. h2xs parses these constants and makes them available to the Perl module as methods. For example, if score.h contained the lines

#define FOO 17
#define BAR 42

then the values 17 and 42 would be available to Perl code as the return values of Align::NW::FOO() and Align::NW::BAR(), respectively.

h2xs doesn't do this by creating FOO() and BAR() methods. Instead, it creates Align::NW::AUTOLOAD() in Align/NW/NW.pm, and a C routine named constant() in Align/NW/NW.xs.

Calls to FOO() and BAR() are handled by Align::NW::AUTOLOAD(). AUTOLOAD() calls constant(), and constant() returns the value #define'd in the .h file.

Align::NW::AUTOLOAD() enforces a Perl function prototype on constant methods. To satisfy this prototype, you have to predeclare any constant methods that you use, like this

sub FOO ();
sub BAR ();

No constants

If you don't need any constants from your header files, you can run h2xs with the -c switch. This suppresses the AUTOLOAD routine from the .pm file and the constant routine from the .xs file.

If you don't need the AutoLoader for anything else, you can run h2xs with the -A switch. -A implies -c, and additionally suppresses inheritance from AutoLoader.

Glue routines

C header files typically contain function prototypes. h2xs can parse function prototypes and generate glue routines based on them. It doesn't always guess right about how to convert parameters, so we may have to edit the glue by hand. Even so, this can save us some typing. To automatically generate glue routines, do

.../development>h2xs -n Align::NW -A -O -x -F '-I ../..' score.h

The -n and -A flags are as before. If you've previously run h2xs, you'll need the -O flag to force it to overwrite the existing Align/NW/* files. The -x flag tells h2xs to generate glue routines based on the function prototypes in score.h.

The -x flag uses the C::Scan module to locate header files. You'll need to have this module installed on your system in order to use -x.

We run h2xs from the development directory, but C::Scan is cd'd to the module directory when it searches for header files. The -F flag specifies additional switches for C::Scan to pass to the C preprocessor. We pass a -I ../.. switch to tell the preprocessor to search for headers two levels up, in the development directory. This allows it to find score.h.

XS Anatomy

Now that we know how to run h2xs, we're going to look inside an .xs file to see what is there.

Terms

Here are some terms that we will use in the discussion below.

target routine: the C routine that we want to call from Perl; the reason we're doing all this
Perl routine: the Perl routine that invokes the target routine
xsub: the C routine that is installed as an xsub for the Perl routine
XS routine: the description of the xsub that we write in the .xs file. xsubpp translates XS routines to xsubs for us.
XS: the language in which we write XS routines. Also called XS code. XS code is a combination of XS directives and C code.
glue routine: refers generally to the XS routine and the xsub. Emphasizes the role of the xsub in connecting the Perl routine to the target routine.
XS directives: the elements of the XS language, documented in perlxs
XS macros: C language macros that xsubpp emits when it generates an xsub.

Hypotenuse

Here is a target routine, in a file called hypotenuse.c

#include <math.h>

double hypotenuse(double x, double y)
{
    return sqrt(x*x + y*y);
}

and its prototype in hypotenuse.h

double hypotenuse(double x, double y);

The Perl routine is Geometry::hypotenuse(). We want calls to Geometry::hypotenuse() to invoke the target routine.

`h2xs`

We run h2xs

.../development>h2xs -n Geometry -A hypotenuse.h

and it generates Geometry/Geometry.xs as

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include <hypotenuse.h>

MODULE = Geometry               PACKAGE = Geometry

I've omitted the -x flag. This is an instance where it guesses wrong about parameter conversion. It is easy enough to fix, but you have to understand the typemap, which we haven't discussed yet.

`Geometry.xs`

Let's look at Geometry.xs in detail.

The Perl C API

The first three #includes give our XS code access to the Perl C API. Through them you can find all the entry points and data types mentioned in perlguts.

`hypotenuse.h`

The next #include gives our XS code access to our own C header file: hypotenuse.h. h2xs searches for header files in the current working directory and on the standard include path; however, the #include directive that it emits uses angle brackets instead of quotes. Angle brackets instruct the C compiler to only search for header files on the standard include path. We're going to put hypotenuse.h in the module directory, which is not on the standard include path, so we need to edit the #include directive to use quotes.

#include "hypotenuse.h"

Then the C compiler will find hypotenuse.h in the module directory.

`MODULE` and `PACKAGE`

MODULE and PACKAGE are XS directives. They specify the module and package for our xsubs. This is easy to understand if we remember the underlying definitions

module: a file containing Perl code
package: a Perl namespace

We write xsubs in XS; xsubpp translates the XS code to straight C; the C compiler compiles the C code into link libraries; the makefile installs those libraries, and the DynaLoader loads those libraries at run time. In order to load a library, the DynaLoader needs to know two things:

the file that contains the library
the Perl namespace in which to install the xsubs

The MODULE directive tells it the file, and the PACKAGE directive tells it the namespace.

The PACKAGE directive names a Perl package, like Geometry or Align::NW. xsubpp then generates code to install xsubs in that package.

The MODULE directive doesn't name an actual file: it names a Perl package, just like the PACKAGE directive. xsubpp maps that package name into a file name, like Geometry.so or Align/NW.so. The makefile installs that file on an appropriate path in the Perl library, and the DynaLoader finds it there.

An .xs file may contain multiple MODULE and PACKAGE directives. MODULE and PACKAGE directives should always appear together, as shown above. All MODULE directives in an .xs file should name the same module. PACKAGE directives can name different packages as necessary to place different xsubs into different Perl packages; this is quite analogous to the use of repeated package statements in ordinary Perl code.

Now we'll start adding things to Geometry.xs.

`PROTOTYPES`

Like ordinary Perl subroutines, xsubs can have prototypes. The PROTOTYPES directive tells xsubpp whether or not to install our xsubs with prototypes. Write

PROTOTYPES: ENABLE

or

PROTOTYPES: DISABLE

to enable or disable prototypes.

The PROTOTYPES directive goes below the MODULE directive. If you put it above the MODULE directive, it will be passed through to the C compiler, and cause compilation errors.

h2xs predates prototypes in Perl, and does not emit a PROTOTYPES directive for you. xsubpp complains if you forget to add one. I generally enable prototypes, unless I have some reason not to.

XS Routines

After the PROTOTYPES directive come XS routines.

An XS routine can contain nearly arbitrary code. However, in simple cases, all it needs to do is describe the signature of the target routine. To do this, it specifies

the return type of the target routine
the name of the target routine
the name and type of each parameter to the target routine

Here is an XS routine that describes our target routine

double
hypotenuse(x, y)
        double  x
        double  y

The newlines are significant; the indentation is not. However, this is the style that h2xs uses, and I usually follow it.

The name of the XS routine is hypotenuse. xsubpp derives the name of the Perl routine from the name of the XS routine. In this example, xsubpp also determines the name of the target routine from the name of the XS routine. Later on, we'll see examples where the target routine has a different name than the XS routine.

make

Now drop down into the module directory. Edit Makefile.PL and add the name/value pair

'OBJECT'    => 'Geometry.o hypotenuse.o'

to the arguments of WriteMakefile. Then do

.../development/Geometry>cp ../hypotenuse.c .
.../development/Geometry>cp ../hypotenuse.h .
.../development/Geometry>perl Makefile.PL
.../development/Geometry>make

Makefile.PL writes a makefile. The makefile runs

xsubpp to translate Geometry.xs to Geometry.c
the C compiler to compile Geometry.c to Geometry.o
the linker to link Geometry.o into a link library

`Geometry.c`

Here is Geometry.c, edited a bit for clarity.

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include "hypotenuse.h"

XS(XS_Geometry_hypotenuse)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::hypotenuse(x, y)");
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  RETVAL;

        RETVAL = hypotenuse(x, y);
        ST(0) = sv_newmortal();
        sv_setnv(ST(0), (double)RETVAL);
    }
    XSRETURN(1);
}

XS(boot_Geometry)
{
    dXSARGS;
    char* file = __FILE__;

    XS_VERSION_BOOTCHECK ;

    newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");
    XSRETURN_YES;
}

Geometry.c is an ordinary C source file, suitable for compilation. It looks strange because it is written with XS macros. Let's decode the macros and see how it works.

Includes

The #includes are passed through unchanged from the .xs file. The C compiler will need them.

XSub

XS_Geometry_hypotenuse is the actual xsub that is generated by xsubpp. The xsub name is pasted together from

the token XS
the name given in the PACKAGE directive
the name of the XS routine

The XS() macro declares XS_Geometry_hypotenuse with the return type and parameters that Perl expects an xsub to have. These are not the parameters to hypotenuse(); we will get those from the Perl stack.

dXSARGS is another XS macro; it declares some local variables that the xsub needs.

One of the locals declared by dXSARGS is items; this gives the number of arguments that were passed to the xsub on the Perl stack. As declared, hypotenuse() requires 2 arguments; the xsub emits a usage message if hypotenuse() is called from Perl with the wrong number of arguments.

Next comes the code that extracts arguments from the Perl stack

double  x = (double)SvNV(ST(0));
double  y = (double)SvNV(ST(1));

ST() is an XS macro that accesses an argument on the Perl stack: ST(0) is the first argument, ST(1) is the second, and so on.

Perl passes parameters by reference, so the things on the stack are pointers to the underlying scalars. SvNV is an entry point in the Perl C API. It takes a pointer to a scalar and returns the value of that scalar as a number. xsubpp adds a (double) typecast to quiet the C compiler, and assigns that value to a local variable: x for ST(0) and y for ST(1).

xsubpp also declares a local variable to hold the return value of the subroutine.

double  RETVAL;

This variable is always named RETVAL, but it is declared with whatever type the subroutine returns.

With x, y, and RETVAL set up, xsubpp can generate a call to the target routine. xsubpp emits the name of the XS routine as the name of the target routine.

RETVAL = hypotenuse(x, y);

There is no magic here. This is a perfectly ordinary C subroutine call. Don't get used to it.

The next two lines return the value to Perl.

ST(0) = sv_newmortal();
sv_setnv(ST(0), (double)RETVAL);

Return values go on the Perl stack, starting at ST(0). sv_newmortal and sv_setnv are entry points in the Perl C API. sv_newmortal creates a new scalar value. Like any scalar, it has an initial value of undef. sv_setnv sets the value of the scalar to the value that was returned from hypotenuse.

Finally, the XSRETURN(1) macro tells the interpreter how many values we are returning on the Perl stack: in this case, one.

boot

boot_Geometry is the subroutine that DynaLoader calls to install the xsubs in the Geometry module. The subroutine name is pasted together from

the token boot
the name given in the MODULE directive

To install an xsub, boot_Geometry calls

newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");

newXSproto is an entry point in the Perl C API. Its arguments are

the name of a Perl subroutine
a pointer to a C subroutine
the name of a C source file
a Perl subroutine prototype

newXSproto installs the C subroutine XS_Geometry_hypotenuse as an xsub for the Perl routine Geometry::hypotenuse. It supplies a prototype, because we specified PROTOTYPES: ENABLE in the .xs file. The source file name is provided so that Perl can report it in error messages.

The name of the Perl routine is constructed from

the name given in the PACKAGE directive
the name of the XS routine

xsubpp only generates one boot routine per module. The boot routine makes one call to newXSproto for each xsub in the module.

Test

To test our work, edit Geometry/test.pl and add the line

print Geometry::hypotenuse(3, 4), "\n";

at the end. Then do

.../development/Geometry>make test

The output should be

1..1
ok 1
5

`r2p`

hypotenuse() has a simple signature; given that signature, xsubpp can generate code to call it. In more complex cases, we have to write some of the code ourselves. XS provides directives that allow us to supply C code directly, instead of relying on xsubpp. In the examples below, we'll use these to take over progressively more control from xsubpp.

Here is another target routine, in a file called r2p.c

#include <math.h>

double r2p(double x, double y, double *theta)
{
    *theta = atan2(y, x);
    return sqrt(x*x + y*y);
}

and its prototype in r2p.h

double r2p(double x, double y, double *theta);

r2p converts rectangular to polar coordinates, so it has to return 2 values: a magnitude and an angle. The magnitude is the return value of the subroutine; the angle is returned in a third parameter, passed by address. If we write the XS routine as

double
r2p(x, y, theta)
        double  x
        double  y
        double  theta

then xsubpp will treat theta as an input parameter. It will initialize it from the Perl stack, and won't return a value in it. Instead, we write the XS routine as

double
r2p(x, y, theta)
        double  x
        double  y
        double  theta = NO_INIT
        CODE:
                RETVAL = r2p(x, y, &theta);
        OUTPUT:
        RETVAL
        theta

The NO_INIT directive suppresses initialization from the Perl stack.

The CODE directive tells xsubpp that we will supply C code to call the target routine. xsubpp still declares RETVAL for us, but we have to assign the return value to it. The call to r2p is

RETVAL = r2p(x, y, &theta);

This is not an XS directive; it is a C statement, and will be passed through to the C compiler. Therefore, it ends with a semicolon.

The OUTPUT directive lists values that are to be copied back to Perl scalars. The order in which we list them doesn't matter; xsubpp knows where each value goes. We need to return both RETVAL and theta.

Here is the xsub that xsubpp generates for this XS routine.

XS(XS_Geometry_r2p)
{
    dXSARGS;
    if (items != 3)
        croak("Usage: Geometry::r2p(x, y, theta)");
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  theta;
        double  RETVAL;
                RETVAL = r2p(x, y, &theta);
        sv_setnv(ST(2), (double)theta);
        SvSETMAGIC(ST(2));
        ST(0) = sv_newmortal();
        sv_setnv(ST(0), (double)RETVAL);
    }
    XSRETURN(1);
}

It looks very much like the xsub for hypotenuse. xsubpp declares theta for us, so that we can pass its address to r2p. It also generates these lines to return theta to Perl

sv_setnv(ST(2), (double)theta);
SvSETMAGIC(ST(2));

It knows to assign theta to ST(2), because we declared theta as the 3rd parameter to r2p. SvSETMAGIC then invokes any "set" magic attached to the scalar at ST(2), so that the assignment takes full effect. This matters, for example, if the argument is a tied variable, or an array or hash element that doesn't exist yet and has to be created.

Test

We can add r2p to the Geometry module. Copy r2p.c and r2p.h into the module directory and add r2p.o to the OBJECT list in Makefile.PL. Add an

#include "r2p.h"

line and the XS code shown above to Geometry.xs. Add

my $theta;
my $r = Geometry::r2p(3, 4, $theta);
print "$r, $theta\n";

to test.pl. Now do

.../development/Geometry>perl Makefile.PL
.../development/Geometry>make
.../development/Geometry>make test

The output should be

1..1
ok 1
5
5, 0.927295218001612

`r2p_list`

In the examples above, the Perl routine and the target routine have essentially the same name and signature. However, this isn't necessary. For example, in Perl, it would be more natural to call a routine like r2p as

($r, $theta) = r2p_list($x, $y);

We can obtain this calling sequence with this XS routine

void
r2p_list(x, y)
        double  x
        double  y
        PREINIT:
        double  r;
        double  theta;
        PPCODE:
                r = r2p(x, y, &theta);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));

There are a few differences between this XS routine and the one that we wrote above for r2p.

The name of the XS routine doesn't match the name of the target routine. xsubpp doesn't need the name of the target routine, because we are supplying the code to call the target routine. xsubpp still uses the name of the XS routine to derive the name of the Perl routine.

The return type of r2p_list is void. This doesn't mean that r2p_list doesn't return anything. Rather, it tells xsubpp that we will supply the code to return values to Perl. Therefore, xsubpp doesn't declare RETVAL for us.

The PREINIT directive gives us a place to declare C variables. Without it, xsubpp might emit executable C code before our variable declarations, which is a syntax error in C. We declare two C variables: r and theta.

The PPCODE directive is similar to the CODE directive. It tells xsubpp that we will supply both the C code to call r2p and the PP code to return values to Perl. PP stands for push/pop: PP code manipulates the Perl stack directly, in the same style as the routines inside the interpreter that implement Perl's opcodes.

The C code to call r2p is

r = r2p(x, y, &theta);

and the PP code to return values to Perl is

EXTEND(SP, 2);
PUSHs(sv_2mortal(newSVnv(r    )));
PUSHs(sv_2mortal(newSVnv(theta)));

The EXTEND macro allocates space on the stack for 2 scalars, and the PUSHs macros push the scalars onto the stack. The PP macros are passed through to the C compiler, so they end with semicolons, like any other line of C code.

The xsub that xsubpp generates is

XS(XS_Geometry_r2p_list)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::r2p_list(x, y)");
    SP -= items;
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  r;
        double  theta;
                r = r2p(x, y, &theta);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));
        PUTBACK;
        return;
    }
}

xsubpp emits code to extract our arguments from the Perl stack, as before. It passes our C variable declarations and our subroutine call through unchanged. It also passes our PP code through.

The biggest difference between XS_Geometry_r2p and XS_Geometry_r2p_list is the stack management. XS_Geometry_r2p uses an XSRETURN(1) macro call to return one value on the stack. XS_Geometry_r2p_list lowers SP by the number of input parameters, and then issues a PUTBACK macro before returning.

I don't actually understand what any of the stack macros do. I wrote the glue routines shown above by following the examples in perlxs. The macros are defined in /usr/local/lib/perl5/version/architecture/CORE/*.h, but when I tried reading them, I quickly got lost in a maze of #defines, #ifdefs, typedefs, and internal Perl data structures.

Lacking a principled understanding of Perl stack management, you can't actually write PP code: all you can do is follow working examples, as I have. The examples in perlxs appear to be adequate for most xsubs.

`r2p_open`

We saw above that the target routine needn't have the same calling sequence as the Perl routine. In fact, we don't need a target routine at all. Once we have a CODE or a PPCODE directive in our XS code, we can put any C code in the XS routine.

In r2p_open, we dispense with the r2p routine, and compute r and theta in open code.

void
r2p_open(x, y)
        double  x
        double  y
        PREINIT:
        double  r;
        double  theta;
        PPCODE:
                r     = sqrt(x*x + y*y);
                theta = atan2(y, x);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));

Here is the xsub that xsubpp emits. It looks just like the xsub for r2p_list, except for the lines that compute r and theta.

XS(XS_Geometry_r2p_open)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::r2p_open(x, y)");
    SP -= items;
    {
        double  x = (double)SvNV(ST(0));
        double  y = (double)SvNV(ST(1));
        double  r;
        double  theta;
                r     = sqrt(x*x + y*y);
                theta = atan2(y, x);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r    )));
        PUSHs(sv_2mortal(newSVnv(theta)));
        PUTBACK;
        return;
    }
}

Add these lines to Geometry/test.pl to test our new xsubs.

($r, $theta) = Geometry::r2p_list(3, 4);
print "$r, $theta\n";

($r, $theta) = Geometry::r2p_open(3, 4);
print "$r, $theta\n";

When we run

.../development/Geometry>make test

we get

1..1
ok 1
5
5, 0.927295218001612
5, 0.927295218001612
5, 0.927295218001612

perlxs

These examples illustrate only basic XS programming, using a few directives. perlxs documents over 20 XS directives. It includes examples and code fragments showing how to use them. You should read through it to understand the range of facilities offered by XS.

The typemap

In the examples above, we've seen how to write XS routines, and how to use XS directives to control the C code that xsubpp emits. Now, we're going to look at how xsubpp converts data between Perl and C representations.

Here's the problem. When the Perl interpreter calls a subroutine, it pushes a list of scalars onto the Perl stack. On input, an xsub has to get those scalars off the stack and convert them to C data. On output, the xsub has to convert C data to Perl scalars and put the scalars back on the stack. xsubpp must emit the C code to do these conversions.

Conversion between Perl and C data types is handled with macros and routines in the Perl C API, but the necessary operations vary, depending on the C data types and the direction of the conversion. Consider:

C data type	input	output
`int n`	`n = (int) SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`double x`	`x = (double) SvNV(ST(0))`	`sv_setnv(ST(0), (double)x)`
`char *psz`	`psz = (char *) SvPV(ST(0),na)`	`sv_setpv((SV*)ST(0), psz)`

We could imagine a big switch statement inside xsubpp to select the right code fragment for each C data type, but this would be clumsy and inflexible. It would be better to put the code fragments in a table, like the one shown above.

If we start writing such a table, we quickly discover that the mapping between Perl and C datatypes is not one-to-one. As a strongly typed language, C distinguishes more data types than Perl does. For example, these seven C integer types are all converted with essentially the same code fragment, the only variation being the typecast used to quiet the C compiler.

C data type	input	output
`int n`	`n = (int)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned n`	`n = (unsigned)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned int n`	`n = (unsigned int)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`long n`	`n = (long)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned long n`	`n = (unsigned long)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`short n`	`n = (short)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`
`unsigned short n`	`n = (unsigned short)SvIV(ST(0))`	`sv_setiv(ST(0), (IV)n)`

In view of this, xsubpp uses a two-level mapping. First, it maps C data types to XS types, like this

C data type	XS type
`int`	`T_IV`
`unsigned`	`T_IV`
`char`	`T_CHAR`
`char *`	`T_PV`

Then it maps the XS types to code fragments, in two tables: one for input

XS type	input code fragment
`T_IV`	`$var = ($ntype)SvIV($arg)`
`T_CHAR`	`$var = (char)*SvPV($arg,na)`
`T_PV`	`$var = ($ntype)SvPV($arg,na)`

and one for output

XS type	output code fragment
`T_IV`	`sv_setiv($arg, (IV)$var);`
`T_CHAR`	`sv_setpvn($arg, (char *)&$var, 1);`
`T_PV`	`sv_setpv((SV*)$arg, $var);`

These tables constitute the typemap.

The XS types are meaningful only to xsubpp, and appear only in the typemap. They do not appear in Perl code, XS code, or C code.
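To make the two-level lookup concrete, here is a minimal sketch in plain C. It is only a model: xsubpp itself is a Perl program, and the function and table names below (xs_type_for, input_fragment_for, and so on) are invented for this illustration. The rows are copied from the tables above, not from the full default typemap, and the code fragments are treated as opaque strings.

```c
#include <stddef.h>
#include <string.h>

/* One row of the TYPEMAP table: C data type -> XS type. */
struct type_entry { const char *c_type; const char *xs_type; };

/* One row of the INPUT table: XS type -> input code fragment. */
struct frag_entry { const char *xs_type; const char *fragment; };

/* Rows taken from the tables in this article (illustrative subset). */
static const struct type_entry typemap_rows[] = {
    { "int",      "T_IV"   },
    { "unsigned", "T_IV"   },
    { "char",     "T_CHAR" },
    { "char *",   "T_PV"   },
};

static const struct frag_entry input_rows[] = {
    { "T_IV",   "$var = ($ntype)SvIV($arg)"    },
    { "T_CHAR", "$var = (char)*SvPV($arg,na)"  },
    { "T_PV",   "$var = ($ntype)SvPV($arg,na)" },
};

/* Level 1: C data type -> XS type, or NULL if the type is not mapped. */
const char *xs_type_for(const char *c_type)
{
    size_t i;
    for (i = 0; i < sizeof typemap_rows / sizeof typemap_rows[0]; i++)
        if (strcmp(typemap_rows[i].c_type, c_type) == 0)
            return typemap_rows[i].xs_type;
    return NULL;
}

/* Level 2: XS type -> input code fragment, or NULL if there is none. */
const char *input_fragment_for(const char *xs_type)
{
    size_t i;
    for (i = 0; i < sizeof input_rows / sizeof input_rows[0]; i++)
        if (strcmp(input_rows[i].xs_type, xs_type) == 0)
            return input_rows[i].fragment;
    return NULL;
}

/* Both levels together: C data type -> input code fragment. */
const char *input_fragment_for_c_type(const char *c_type)
{
    const char *xs_type = xs_type_for(c_type);
    return xs_type != NULL ? input_fragment_for(xs_type) : NULL;
}
```

Calling input_fragment_for_c_type("int") and input_fragment_for_c_type("unsigned") returns the same fragment, because both C types share the XS type T_IV. That sharing is the point of the two-level design: a new integer type needs one new row in the first table and no new code fragments.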

`$var`, `$ntype`, and `$arg`

The code fragments in the typemap are not pure C code: they contain Perl variables in their text. The variables are

$var: The name of a C variable
$ntype: The type of $var
$arg: Code to access a Perl scalar

xsubpp is a Perl program. When it needs to convert an argument from Perl to C, it sets $var, $ntype, and $arg, obtains the appropriate code fragment from the typemap, and evals the fragment to replace the Perl variables with their values.

For example, consider this XS routine

int
max(a, b)
	int a
	int b

To generate code to convert the first parameter from Perl to C, xsubpp sets the Perl variables like this

variable	value
`$var`	`a`
`$ntype`	`int`
`$arg`	`ST(0)`

Then, it evals the fragment

$var = ($ntype)SvIV($arg)

to yield the C code

a = (int)SvIV(ST(0))

It is important to understand how these variables work, because sometimes you have to arrange for them to have the right values in order to make xsubpp do what you want. We'll see an example of this next month when we write the XS code for Align::NW.
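The substitution step can be modeled in a few lines of plain C. This is a sketch under stated assumptions, not xsubpp's code: xsubpp relies on Perl's eval to interpolate the variables, whereas the hypothetical expand_fragment function below does a simple textual replacement of the three names and nothing else.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy tmpl into out, replacing each occurrence of $var, $ntype, and $arg
 * with the given text. Returns 0 on success, -1 if out is too small.
 * Simplification: names are matched by prefix, so "$variable" would be
 * treated as "$var" followed by "iable"; Perl would not do that.
 */
int expand_fragment(const char *tmpl, const char *var, const char *ntype,
                    const char *arg, char *out, size_t outsize)
{
    static const char *const names[3] = { "$var", "$ntype", "$arg" };
    const char *values[3];
    size_t used = 0;

    if (outsize == 0)
        return -1;
    values[0] = var;
    values[1] = ntype;
    values[2] = arg;

    while (*tmpl != '\0') {
        const char *text = NULL;  /* replacement text, if a name starts here */
        size_t skip = 1;          /* template characters consumed this round */
        size_t len = 1;           /* output characters produced this round */
        size_t i;

        for (i = 0; i < 3; i++) {
            size_t n = strlen(names[i]);
            if (strncmp(tmpl, names[i], n) == 0) {
                text = values[i];
                skip = n;
                len = strlen(text);
                break;
            }
        }
        if (used + len >= outsize)
            return -1;            /* keep room for the terminating NUL */
        if (text != NULL)
            memcpy(out + used, text, len);
        else
            out[used] = *tmpl;
        used += len;
        tmpl += skip;
    }
    out[used] = '\0';
    return 0;
}
```

Expanding the T_IV input fragment with var set to a, ntype set to int, and arg set to ST(0) yields the line shown above. For the second parameter of max, the same fragment is expanded with var set to b and arg set to ST(1).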

Typemap files

The three tables that constitute the typemap are referred to as TYPEMAP, INPUT, and OUTPUT, respectively. All three tables may be stored in a single file, with each table headed by its own name. Here is an example to illustrate the file format

# A typemap file

TYPEMAP
int			T_IV
SV *			T_SV

INPUT
T_SV
	$var = $arg
T_IV
	$var = ($ntype)SvIV($arg)
	
OUTPUT
T_SV
	$arg = $var;
T_IV
	sv_setiv($arg, (IV)$var);

The first TYPEMAP header may be omitted.

Files containing typemaps are conventionally named typemap. xsubpp can read and aggregate multiple typemap files to construct the typemap; entries in later files override entries in earlier files.
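The override rule amounts to "last definition wins". The sketch below models it in plain C with a small fixed-size table. The names struct typemap, typemap_set, and typemap_get are hypothetical, as is the XS type T_MY_INT used in the usage note; none of them come from xsubpp.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64

/* A merged TYPEMAP table. The strings are not copied, so they must
 * outlive the table. */
struct typemap {
    const char *c_type[MAX_ENTRIES];
    const char *xs_type[MAX_ENTRIES];
    size_t count;
};

/* Add an entry, or replace the existing entry for the same C type.
 * Returns 0 on success, -1 if the table is full. */
int typemap_set(struct typemap *map, const char *c_type, const char *xs_type)
{
    size_t i;
    for (i = 0; i < map->count; i++) {
        if (strcmp(map->c_type[i], c_type) == 0) {
            map->xs_type[i] = xs_type;  /* later entry overrides earlier */
            return 0;
        }
    }
    if (map->count == MAX_ENTRIES)
        return -1;
    map->c_type[map->count] = c_type;
    map->xs_type[map->count] = xs_type;
    map->count++;
    return 0;
}

/* Look up the XS type for a C type, or NULL if it is not mapped. */
const char *typemap_get(const struct typemap *map, const char *c_type)
{
    size_t i;
    for (i = 0; i < map->count; i++)
        if (strcmp(map->c_type[i], c_type) == 0)
            return map->xs_type[i];
    return NULL;
}
```

If the entries of one typemap file are loaded with typemap_set, followed by the entries of a file read later, a later entry such as int mapped to T_MY_INT replaces the earlier int mapped to T_IV. Entries that appear in only one of the files are unaffected.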

Perl supplies a default typemap in

/usr/local/lib/perl5/version/ExtUtils/typemap

XS modules may provide a local typemap file in the module directory. If the module declares structs or other C data types, it can map them to XS types in a TYPEMAP section. Local typemaps rarely need INPUT or OUTPUT sections; the default typemap almost always contains appropriate code fragments.

Next month, we'll use these tools to complete the XS implementation of Align::NW.

NOTES

maps

The mapping is similar, but not identical, to that used in the installation directory. NW.pm is developed in

.../development/Align/NW/NW.pm

but installed in

/usr/local/lib/perl5/site_perl/version/Align/NW.pm

The extra /NW/ in the development area is necessary so that we can have, for example,

.../development/Align/NW/Makefile.PL
.../development/Align/SW/Makefile.PL

without conflict.

module

Reflecting its roots at the transition from Perl4 to Perl5, the h2xs POD consistently uses the term extension for module.

suppresses

Due to a bug in h2xs, you may still find a

require AutoLoader;

statement in your .pm file. You can delete it if you like.

sv_newmortal

The word mortal refers to a deferred-cleanup mechanism in the current implementation of Perl. All data objects in Perl are garbage collected, in most cases by reference counting. An object that will only exist for a short time, for example a return value on the stack, has no obvious owner to release it. To handle this, such objects may be created as mortal. A mortal object still has a reference count, but Perl arranges to decrement that count later, typically at the end of the statement in which the object was created; if nothing else holds a reference by then, the object is deleted. The difficulty of determining when a mortal is no longer needed is a source of continuing maintenance problems in the Perl interpreter.

test our work

For production code, it is usually preferable to put test routines in t/*.t files. See Module Mechanics for details.

SvSETMAGIC

The appearance of words like magic in an API is often a cause for concern. Compare the use of psychic in Stroustrup's discussion of the typename keyword in The C++ Programming Language.

XS Mechanics by Steven W. McDougall is licensed under a Creative Commons Attribution 3.0 Unported License.

Steven W. McDougall / resume / swmcd@theworld.com / 2000 Feb 10
