Flexing Your Modules

Perl is a weakly typed language. Broadly, this means that the type of data contained by a variable isn't definitely known until the variable is used at run time.

Perl does late binding of method calls. This means that the particular subroutine that is invoked by a method call isn't definitely known until the method is invoked at run time.

In both cases, the underlying issue is whether certain decisions are made earlier or later: at compile time or run time. Languages like C++ require the programmer to make decisions early; Perl allows them to be made later.
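
As a rough illustration of both points, the sketch below uses two hypothetical packages, Dog and Robot, invented for this example. The variable $thing holds objects of different classes at different times, and which speak subroutine runs is only settled when the call executes.

```perl
use strict;
use warnings;

# Dog and Robot are hypothetical packages, made up for this sketch.
package Dog   { sub new { bless {}, shift }  sub speak { 'Woof' } }
package Robot { sub new { bless {}, shift }  sub speak { 'Beep' } }

package main;

# Collect the result of calling speak on each object in turn.
# Nothing in the text of the loop says which speak will run.
my @said;
for my $thing (Dog->new, Robot->new) {
    push @said, $thing->speak;
}

print "@said\n";
```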

Perl is occasionally criticized for these features. Some argue that weak typing impedes robust software design. And late binding does impose some run time overhead.

On the other hand, these features allow extraordinary flexibility in the design and code of Perl programs. Fundamentally, this is because decisions that are made early must be expressed in the text of the program, while decisions that are made later can be computed by the execution of the program, and execution is more powerful than text.

In this column, we discuss some ways to exploit this flexibility in the design and implementation of Perl modules.

Reimplementation

One of the benefits of modules is reuse. Once a module has been written, many different applications can use it. Schematically, reuse looks like this

    +-----------+   +-----------+   +-----------+
    | Text      |   | Spread    |   | CAD       |
    | Editor    |   | Sheet     |   | Package   |
    +----+------+   +----+------+   +----+------+
         |               |               |
         +---------------+---------------+
                         |
                    +----+------+
                    | Printer   |
                    | Driver    |
                    +-----------+

In this example, three different applications are using the same printer driver.

A related, but different, benefit of modules is reimplementation. Once an application has been coded to a module interface, it can use any module that implements that interface—even modules that are written after the application. Schematically, reimplementation looks like this

                    +-----------+
                    | CAD       |
                    | Package   |
                    +----+------+
                         |
         +---------------+---------------+
         |               |               |
    +----+------+   +----+------+   +----+------+
    | Printer   |   | Plotter   |   | CRT       |
    | Driver    |   | Driver    |   | Driver    |
    +-----------+   +-----------+   +-----------+

In this example, a single application is using three different device drivers.

Reuse doesn't depend upon late binding. If a module already exists, then a new application can use it, and it doesn't matter when it binds to it.

Pretty much by definition, reimplementation requires late binding. If an application has been written and bound (early) to a particular module, then it can't later use a different module—not without rebinding. Depending upon the language, rebinding may require relinking, recompiling, or even rewriting the application.

Languages that do early binding typically have special facilities to support reimplementation. In C++, you use virtual functions and abstract base classes. In Java, you use interfaces. In Perl, you just do it.

Consider, for example, the HTML::Stream module. This module generates HTML. Nominally, it sends output to a filehandle, supplied by the caller:

$fh     = new IO::File ">$file.html";
$stream = new HTML::Stream $fh;

Internally, HTML::Stream writes to $fh by calling its print method:

$fh->print(...);

However, $fh needn't be an IO::File object. Because Perl does late binding, the only requirement on $fh is that it be blessed into a package that has a print method. This means that you can easily create and use alternate implementations of print. The HTML::Stream POD provides this example.

package StringHandle;

sub new 
{
    my $class = shift;
    my $sh    = '';
    bless \$sh, $class;
}

sub print
{
    my $sh = shift;
    $$sh  .= join('', @_);
}

package main;
use HTML::Stream;

my $sh     = new StringHandle;
my $stream = new HTML::Stream $sh;

A StringHandle object has a reference to a single string, and the StringHandle::print method simply appends its arguments to that string. HTML::Stream then outputs to this string, rather than to a file.
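
The same trick works for any code that only calls print on its handle. The sketch below shows the general pattern and is not HTML::Stream's actual internals: write_greeting is a hypothetical subroutine, and StringHandle is the class from the POD example, repeated here so that the block runs on its own.

```perl
use strict;
use warnings;
use IO::Handle;   # gives STDOUT and other handles a print method

# StringHandle as in the POD example: print appends to a string.
package StringHandle {
    sub new   { my $class = shift; my $sh = ''; bless \$sh, $class }
    sub print { my $sh = shift; $$sh .= join('', @_); 1 }
}

package main;

# write_greeting is hypothetical. It assumes only that $out has a
# print method; it never asks what class $out belongs to.
sub write_greeting {
    my ($out, $name) = @_;
    $out->print("<p>Hello, ", $name, "</p>\n");
}

my $sh = StringHandle->new;
write_greeting($sh, 'world');           # output is captured in $$sh
write_greeting(\*STDOUT, 'terminal');   # output goes to the terminal
```

The caller chooses the destination; write_greeting is unchanged in either case.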

In the context of Perl, all this may seem unremarkable. In other languages, however, it can be difficult or impossible.

The 2x2 ways to call a method

The usual idiom for creating and using an object in Perl is something like this

my $circle = new Circle $x, $y, $radius;
$circle->draw($color, $weight);

If you look inside the Circle package, you'll find that new and draw are both subroutines. But they seem to be used differently: draw is called with the arrow operator, while new isn't; draw is called on a $circle object, while new is called on the Circle package.

There are actually two separate distinctions wound up together here: the distinction between direct and indirect syntax, and the distinction between class and instance methods. These distinctions combine to make four different ways to call a method. All four are in use, so it is worth sorting them out.

Direct vs. Indirect

Perl provides two different method call syntaxes: direct and indirect.

The direct syntax uses the arrow operator. On the left is either a package name, or a reference that has been blessed into a package. On the right is the name of a subroutine within that package. Arguments are placed in parentheses following the method name.

Circle->new ($x, $y, $radius)
$circle->draw($color, $weight)

The indirect syntax is modeled after the syntax of Perl's own print statement. First comes the method name, followed by either a package name or a blessed reference. Arguments follow the package name or reference, with no comma between it and the first argument.

new   Circle $x, $y, $radius
draw $circle $color, $weight

The difference between the direct and indirect syntax is just that: syntax. The semantics are exactly the same.
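
A minimal sketch of that equivalence follows. The Circle class here is a stripped-down stand-in, and scale is a hypothetical method added for the example. It assumes the indirect syntax is enabled, which is the default unless the program says use v5.36 (or later) or no feature 'indirect'.

```perl
use strict;
use warnings;

# A stripped-down, hypothetical Circle class for illustration.
package Circle {
    sub new {
        my ($class, $x, $y, $radius) = @_;
        bless { x => $x, y => $y, r => $radius }, $class;
    }
    # scale multiplies the radius by a factor and returns the new radius.
    sub scale {
        my ($circle, $factor) = @_;
        $circle->{r} *= $factor;
    }
}

package main;

my $factor = 3;

my $direct   = Circle->new(0, 0, 1);    # direct syntax
my $indirect = new Circle 0, 0, 1;      # indirect syntax, same call

my $r1 = $direct->scale($factor);       # direct syntax
my $r2 = scale $indirect $factor;       # indirect syntax, same call
```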

Class vs. Instance

There is a semantic distinction between class methods and instance methods. An instance method is called on a particular object (instance) of a particular class. In contrast, a class method is called on the class itself, and not on any particular object. In the abstract, this sounds confusing, but it follows almost immediately from the way objects are created and used.

For example, new is a class method. When we create a new circle, we write

new Circle $x, $y, $radius

to apply the new method to the Circle class. We can't call new on a $circle object (although see Method Overloading, below), because we don't have one yet: that's why we're calling new.

Conversely, draw is an instance method. When we draw a circle, we write

draw $circle $color, $weight

to apply the draw method to a $circle object. It wouldn't make sense to call draw on the Circle class: as a class, Circle represents all circles in the abstract, not any particular circle that could actually be drawn.

The difference between class and instance methods is a fundamental semantic distinction. You have to get it right, or your program won't work.
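
To see what "won't work" can look like, here is a small sketch with a hypothetical Circle class whose radius method assumes it was called on an object. Calling it on the class hands it the string 'Circle' instead of a hash reference, and under use strict the call dies.

```perl
use strict;
use warnings;

# Hypothetical Circle class: new is a class method, radius an instance method.
package Circle {
    sub new {
        my ($class, $radius) = @_;       # $class is the string 'Circle'
        bless { r => $radius }, $class;
    }
    sub radius {
        my $circle = shift;              # $circle must be a blessed hash
        return $circle->{r};
    }
}

package main;

my $circle = Circle->new(2);
my $ok     = $circle->radius;            # instance method on an object

# Calling the instance method on the class passes the string 'Circle'
# where a hash reference is expected; under strict refs this dies.
my $bad   = eval { Circle->radius };
my $error = $@;
```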

Putting it together

We can summarize the four possibilities in a 2x2 table, like this:

                        Syntax
                        Indirect                        Direct
Semantics   Class       new Circle $x, $y, $radius      Circle->new($x, $y, $radius)
            Instance    draw $circle $color, $weight    $circle->draw($color, $weight)

It is very common in Perl code to call class methods (especially constructors) with indirect syntax and to call instance methods with direct syntax: to only use the upper left and lower right entries in the table. However, there is no requirement to do this. We can use either syntax with either semantics.

Substance vs. Style

There are both substantive and stylistic considerations in the choice of syntax.

In many cases, the indirect syntax is more readable. One reason is that it mimics English word order. For example,

new Circle

reads like adjective-noun, and

draw $circle

reads like verb-object.

If we have Set::IntSpan objects, then

union $a $b

is perhaps more natural than

$a->union($b)

which obscures the symmetry of the underlying operation.

On the other hand, the direct syntax is more powerful than the indirect syntax. With the direct syntax, you can chain method calls

Circle->new($x, $y, $r)->draw($c, $w);
	
$a->union($b)->intersect($c)

and you can do a computed method call by placing the method name in a scalar variable

$method = 'draw';
$circle->$method;
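
Both techniques can be sketched with a hypothetical Counter class, made up for this example. Chaining works here because the mutators return the object itself; the computed calls take their method names and arguments from a list of data.

```perl
use strict;
use warnings;

# Hypothetical Counter class whose mutators return the object itself,
# which is what makes chaining possible.
package Counter {
    sub new   { my $class = shift; bless { n => 0 }, $class }
    sub add   { my ($counter, $k) = @_; $counter->{n} += $k; $counter }
    sub twice { my $counter = shift;    $counter->{n} *= 2;  $counter }
    sub value { $_[0]{n} }
}

package main;

# Chained calls: each arrow applies to the result of the previous call.
my $chained = Counter->new->add(3)->twice->value;

# Computed calls: the method names are data, chosen at run time.
my $counter = Counter->new;
for my $step (['add', 5], ['twice'], ['add', 1]) {
    my ($method, @args) = @$step;
    $counter->can($method) or die "no such method: $method";
    $counter->$method(@args);
}
my $computed = $counter->value;
```

The can check is optional; without it, a misspelled method name still fails, but only when the call is attempted.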

Indirect Ambiguity

A specific problem with the indirect syntax is that

a b

can be parsed as either a method call

b->a()

or as a subroutine call

a(b)

The parser chooses the first if b is known to be a package name at the point of the call, and the second if it is not. Whether b is known to be a package name, in turn, can depend upon the load order of modules in the program.

Perl always gets it right, but humans cope poorly with this sort of distant ambiguity. Tom Christiansen gives examples of subtle bugs that can result, and the Perl docs suggest that the indirect syntax be avoided entirely.

However, this seems extreme. Variety of expression is one of the things that makes Perl such a lucid language. If you are uncertain how the parser will interpret an indirect call, then by all means use the direct syntax. But if you know what it does and it does what you want, then consider using the indirect syntax where it aids readability.

What to call yourself?

When Perl makes a method call, the first argument ($_[0]) is either the name of the package on which the call was made (for class methods) or a reference to the object on which the call was made (for instance methods). For all but the most trivial methods, you will want to assign this argument to a lexical (my) variable inside the method body.

my $x = shift;

The question is what to name the variable.

For class methods, $class and $package are natural choices. In the common case where the class method is a constructor, you can then bless the object into $package.

sub new
{
    my $package = shift;
    my $object  = { };
    bless $object, $package;
}

For instance methods, $self and $this are frequently seen. However, these may not be the best choices.

The names $self and $this function as pronouns. They don't name the object themselves; rather, they mean "whatever object this method was called on". It is up to the reader to remember what object that is. It seems a small point, but reading code is hard: we need all the help we can get.

As an alternative, consider naming the self variable after the package. Then we can write things like

sub Circle::move
{
    my($circle, $dx, $dy) = @_;
    $circle->{x} += $dx;
    $circle->{y} += $dy;
}

and know, immediately, on every line, that the self variable refers to a Circle object. In effect, we save ourselves the trouble of doing an indirection in our head.

Method overloading

Perl doesn't support method overloading: a package can hold only one subroutine of a given name. But then, Perl does support method overloading: because Perl doesn't do any type checking on subroutine arguments, you can pass in whatever data types you like, and let the subroutine sort it out.

Here's a common example

sub Circle::new
{
    my($self, $x, $y, $r) = @_;
    my $package = ref $self || $self;
    my $circle  = { x => $x,
                    y => $y,
                    r => $r  };
    bless $circle, $package
}

With this definition, we can call new on either the package name or an existing object.

my $circle1 =  Circle ->new($x, $y, $radius);
my $circle2 = $circle1->new($x, $y, $radius);
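
One consequence of ref $self || $self is worth spelling out: when new is called on an object, the new object is blessed into that object's class, which matters once subclasses are involved. The sketch below repeats the constructor and adds Disc, a hypothetical subclass invented for the example.

```perl
use strict;
use warnings;

package Circle {
    # Same constructor as above: $self may be a class name or an object.
    sub new {
        my ($self, $x, $y, $r) = @_;
        my $package = ref $self || $self;
        bless { x => $x, y => $y, r => $r }, $package;
    }
}

# Disc is a hypothetical subclass, added only for this sketch.
package Disc { our @ISA = ('Circle'); }

package main;

my $circle = Circle->new(0, 0, 1);
my $disc   = Disc->new(0, 0, 2);

# Called on an object, new blesses into that object's class and
# builds a fresh object from the arguments; it does not copy.
my $circle2 = $circle->new(5, 5, 1);
my $disc2   = $disc->new(5, 5, 2);
```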

Here's a method that loads data from a file name, a file handle, a reference to a string, or a reference to an array:

use Scalar::Util qw(blessed);

sub Doc::load
{
    my($doc, $source) = @_;
    my $ref = ref $source;

    # Dispatch on the type of $source; each test returns,
    # so at most one loader runs.
    return $doc->load_file  ($source) if not $ref;
    return $doc->load_fh    ($source) if blessed($source)
                                      && $source->isa('IO::File');
    return $doc->load_string($source) if $ref eq 'SCALAR';
    return $doc->load_list  ($source) if $ref eq 'ARRAY';
    return;
}
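
The dispatch logic can be tried out in isolation. In the sketch below, classify_source is a hypothetical stand-in for Doc::load: it reports which loader would be chosen rather than loading anything. It accepts any IO::Handle, which is an assumption slightly broader than the IO::File test above.

```perl
use strict;
use warnings;
use IO::Handle;
use Scalar::Util qw(blessed);

# classify_source is a hypothetical stand-in for Doc::load. It reports
# which loader would be chosen instead of actually loading anything.
sub classify_source {
    my $source = shift;
    my $ref    = ref $source;

    return 'file'   if not $ref;                     # plain string: a file name
    return 'fh'     if blessed($source)
                    && $source->isa('IO::Handle');   # IO::File and friends
    return 'string' if $ref eq 'SCALAR';             # reference to a string
    return 'list'   if $ref eq 'ARRAY';              # reference to an array
    return 'unknown';
}

my $text  = "some text";
my $fh    = IO::Handle->new;    # an unopened handle is enough for the test
my @kinds = map { classify_source($_) }
            'doc.txt', $fh, \$text, [ 'a', 'b' ], { };
```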

You can also do a kind of overloading based on return type. The wantarray operator returns true if the subroutine was called in list context, false if it was called in scalar context, and undef if it was called in void context.

One simple application of this is to return an array in list context, and an array reference in scalar context

sub MakeList
{
    my @list = ...
    wantarray ? @list : \@list
}

Applications can then write

$list = MakeList;

for efficiency, and

@list = MakeList;

for simplicity.
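
A runnable sketch of the same idea follows. Here make_list is a hypothetical subroutine, and its fixed contents stand in for whatever a real one would compute.

```perl
use strict;
use warnings;

# make_list is a hypothetical example; the fixed contents stand in
# for whatever the real subroutine would compute.
sub make_list {
    my @list = (1, 2, 3);
    return wantarray ? @list : \@list;
}

my @copy    = make_list();   # list context: receives the elements
my $ref     = make_list();   # scalar context: receives an array reference
my ($first) = make_list();   # parentheses on the left force list context
```

Note the last line: it is the form of the assignment, not the number of values wanted, that determines the context.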

Conclusion

Weak typing and late binding contribute to the lucid and flexible nature of Perl. Properly used, these features can enhance the clarity, expressive power, and functionality of your modules.

Notes

compile time or run time: The execution of a Perl program is not strictly segmented into "compile time" and "run time"; however, this notion is adequate for our purposes. See Is Perl a Compiler or an Interpreter? for the gory details.
execution is more powerful than text: That's why we build computers to execute our programs.
requirement: If this requirement is not met, then the program terminates with a fatal error. This is, indeed, one of the drawbacks of late binding.
arrow operator: borrowed from C++
parentheses: The parentheses may be omitted if there are no arguments.
no comma: If there is a comma, then Perl parses the expression as a list operator call, rather than an indirect method call. This is a common source of errors in Perl programs.
gets it right: by definition
$this: this is the corresponding keyword in C++
type checking: Subroutine prototypes do enforce certain kinds of type checks, but their real purpose is to control argument coercion, not check data types. See Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl for further discussion.
return type: Eat your heart out, C++

Steven W. McDougall / resume / swmcd@theworld.com / 1999 August 31