Object-oriented programming in Perl

It's all done with mirrors.

Perl 4

Perl 1 was released in 1987. Perl versions 2, 3, and 4 followed in fairly quick succession over the next few years. I joined the party with Perl 4 in the early '90s. I thought Perl was a cool language (Perl rocks!), but I could also see that it was limited in some ways.

I was following comp.lang.perl on usenet, and I started seeing chatter about something called Perl 5. Perl 5 was going to be a major rewrite, and have lots of new features—including objects.

My immediate reaction to this was

It wasn't just that Perl 4 didn't have objects. Perl 4 didn't have anything like objects. I couldn't see any place in the language where you could possibly put an object—not without turning the language into some horrible mashup of Perl and C++.

Perl 5

Perl 5 was released in the fall of 1994. I ignored it for 6 months. I was busy, and Perl 4 worked for me. Eventually, I built and installed it. It looked a lot like Perl 4. It ran all my Perl 4 code.

I found the man pages for the new features. I found lexically scoped variables, which I understood. I found references, which I mostly understood. Then I found the new (whole new!) chapter titled "Perl Objects". Surely, this was where I was going to find objects.

So I read it. And after I read it, I sat there thinking, "But where are the objects?" I reread it a few times, and I circled back to some of the related documentation, like references, and I probably asked some questions on comp.lang.perl (Larry Wall was active on the group at the time). Eventually, I figured out what was going on.

Perl doesn't have anything that stands up and says, "I am an object." What Perl has is a collection of features...that conspire to support...an object-oriented programming model...if that's what you want. The whole thing has rather a done-with-mirrors character to it.

Perl objects

To program objects in Perl, we need references, packages and modules, bless, and method calls. Each of these topics is good for a 20-minute talk, but if we do that, we'll never get to objects. Instead, I'll say about each of them only what is directly relevant to objects.

References

Perl has 3 fundamental data types: scalars, which can be numbers or strings
$a = 3;
$b = 'abc';
arrays, which are lists of scalars
@c = (5, 2, $a, 'xyz');
and hashes, which are tables of key/value pairs
%d = (name    => 'Joe Smith',
      address => '10 Main Street',
      age     => 42,
      zip     => $zip);
Hash keys are strings, and hash values are scalars.

To these data types, Perl 5 adds references. A reference is like a pointer in C. Internally, it really is a pointer: it holds the memory address of a variable.

You can get a reference to any variable in a Perl program by putting a backslash in front of it.

\$a
\@c
\%d
A reference is a scalar, so it can be assigned to a scalar variable.
$sr = \$a;
$ar = \@c;
$hr = \%d;
There are several ways to get from a reference back to your data. For an array reference, you can write
$ar->[2]
This dereferences the reference to get to the underlying array, and then gets the array element at index 2.

Similarly, for a hash reference, you can write

$hr->{age}
This dereferences the reference to get to the underlying hash table, and then looks up the value for the key age.

Later, we'll see the arrow operator used in a similar way to make method calls.
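The arrow forms above, together with the sigil forms for getting at a whole variable, can be tried in a short script. This is only a sketch: the scalar is named $x here rather than $a (my choice, to stay clear of Perl's special sort variables under use strict), and the hash is trimmed to two keys.

```perl
use strict;
use warnings;

my $x = 3;
my @c = (5, 2, $x, 'xyz');
my %d = (name => 'Joe Smith', age => 42);

my $sr = \$x;
my $ar = \@c;
my $hr = \%d;

# Arrow form: dereference, then index or look up
my $third = $ar->[2];          # 3
my $age   = $hr->{age};        # 42

# Sigil form: put the sigil of the underlying type in front of the reference
my $value = $$sr;              # 3
my @copy  = @$ar;              # (5, 2, 3, 'xyz')
my @keys  = sort keys %$hr;    # ('age', 'name')

# A reference leads back to the original variable, not to a copy of it
$$sr = 4;
print "x is now $x\n";         # prints "x is now 4"
```

Note that @c holds a copy of the value of $x, so changing $x through $sr leaves $ar->[2] at 3.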

Packages/Modules

Packages and modules are related, but distinct, concepts. A package is a namespace for subroutines. A module is a file that holds Perl code.

If you have a big program with multiple subsystems, you're liable to have subroutine name conflicts

sub init
{
    # initialize network
}

sub init
{
    # initialize graphics
}
In Perl, you can avoid this by putting subroutines in different packages
sub Network::init
{
    # initialize network
}

sub Graphics::init
{
    # initialize graphics
}
The package namespace is hierarchical. We could equally write
sub Firefox::Network::init
{
    # initialize network
}

sub Firefox::Graphics::init
{
    # initialize graphics
}
to avoid conflict with, say
sub Thunderbird::Network::init
{
    # initialize network
}
A module is a file that holds Perl code. You load the code in a module into your program with a use statement.
#!/usr/bin/perl

use Firefox::Network;
The use statement takes a package name. Modules are stored in directory trees that map the package namespace. Perl converts the package name to a relative path, and searches for that path in the list of directories given in the global array @INC. The use statement above would search for the module file on the path
Firefox/Network.pm
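The name-to-path conversion can be imitated in a few lines. This is a rough sketch of the idea, not Perl's actual implementation; module_path is a hypothetical helper that I made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic how a module name maps to a relative path
sub module_path
{
    my $name = shift;
    (my $path = $name) =~ s{::}{/}g;    # Firefox::Network -> Firefox/Network
    return "$path.pm";
}

my $relative = module_path('Firefox::Network');
print "$relative\n";                    # prints "Firefox/Network.pm"

# Perl looks for that relative path under each directory listed in @INC
print "$_/$relative\n" for @INC;
```

The second print shows the candidate locations on your own system; the list will differ from one Perl installation to another.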
A module can contain any code. However, it is conventional to use a module to hold the code for the package that it is named after. So if we looked in the Firefox/Network.pm module, we would likely see
sub Firefox::Network::init
{
    # initialize Firefox::Network
}
You can use a package statement to set the default namespace for subroutines. This is commonly done in modules, and saves typing and visual clutter when many subroutines are being defined within the package namespace
package Firefox::Network;

sub init  # This is really Firefox::Network::init
{
    # initialize Firefox::Network
}
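To see the package statement at work, here is a small sketch that puts two packages in a single file (in real code each would normally live in its own module). The return values are placeholders that I invented so there is something to print.

```perl
use strict;
use warnings;

package Firefox::Network;

sub init    # This is really Firefox::Network::init
{
    return 'network ready';
}

package Firefox::Graphics;

sub init    # This is really Firefox::Graphics::init
{
    return 'graphics ready';
}

package main;

# The fully qualified names keep the two init routines apart
my $net = Firefox::Network::init();
my $gfx = Firefox::Graphics::init();

print "$net\n$gfx\n";
```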
When we define a class

bless

bless is a built-in function in Perl. Suppose we have a hash
%dog = (ears => "long",
	tail => "short");
and a reference to it
$dog = \%dog;
Then we can write
bless $dog, "Animal::Dog"
and Perl will tag the %dog hash with the Animal::Dog package name. The $dog reference is then said to be "blessed into the Animal::Dog package". Perl documentation, Perl error messages, and Perl programmers all talk about "blessing a reference", and "blessed references" and "unblessed references", but it isn't the reference that gets tagged with the package name: it is the underlying hash table. (I'll use that same terminology here.)
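You can watch the tag land on the hash rather than on the reference. The sketch below uses the built-in ref function and blessed from the core Scalar::Util module; the variable $same_dog is just a second reference that I take after the fact.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

my %dog = (ears => 'long', tail => 'short');
my $dog = \%dog;

my $before = ref($dog);             # 'HASH': a plain, unblessed reference

bless $dog, 'Animal::Dog';

my $after = ref($dog);              # 'Animal::Dog'

# A second reference, taken separately, sees the same package name,
# because the tag is stored on the hash itself
my $same_dog = \%dog;
my $package  = blessed($same_dog);  # 'Animal::Dog'

print "$before, $after, $package\n";
```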

Method Calls

We can use a blessed reference like an object. To make a method call on $dog, write
$dog->bark;
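When Perl sees this, it finds the package that $dog is blessed into (Animal::Dog), looks for a bark subroutine in that package, and calls it with $dog as the first argument, as if we had written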
Animal::Dog::bark($dog)
When I first read the Perl Objects man page, my real question wasn't "where are the objects?", but rather, "where is the instance data?". The answer is, it's in the hash. Every method gets the hash reference passed as its first argument. With that reference in hand, it can go into the hash and get whatever instance data it needs
sub bark
{
    my $dog  = shift;
    my $ears = $dog->{ears};
    ...
}
There's nothing special about hashes. You can bless scalars and arrays and filehandles and code refs, and use them as objects in the same way. But most objects are implemented as hashes, precisely because hashes are convenient for storing instance data.
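As an illustration of a non-hash object, here is a sketch of the same dog stored in a blessed array. The slot layout (ears in slot 0, tail in slot 1) and the accessor names are my own assumptions.

```perl
use strict;
use warnings;

package Animal::Dog;

# Instance data lives in array slots instead of hash keys
sub ears { my $dog = shift; return $dog->[0]; }
sub tail { my $dog = shift; return $dog->[1]; }

package main;

my @dog = ('long', 'short');
my $dog = bless \@dog, 'Animal::Dog';

my $ears = $dog->ears;    # 'long'
my $tail = $dog->tail;    # 'short'

print "ears: $ears, tail: $tail\n";
```

It works, but you have to remember which slot holds what, which is a good part of why hashes are the usual choice.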

You can pass additional arguments to the method in parentheses

$dog->bark('loud')
and the subroutine picks them up in the usual way
sub bark
{
    my($dog, $volume) = @_;
    ...
}
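Putting the pieces so far together gives a complete, runnable script. The message text and the default volume of 'normal' are assumptions of mine; the elided method bodies above could do anything.

```perl
use strict;
use warnings;

package Animal::Dog;

sub bark
{
    my($dog, $volume) = @_;
    $volume = 'normal' unless defined $volume;    # assumed default

    # Instance data comes out of the blessed hash
    return "A dog with $dog->{ears} ears barks at $volume volume";
}

package main;

my %dog = (ears => 'long', tail => 'short');
my $dog = bless \%dog, 'Animal::Dog';

my $quiet = $dog->bark;
my $loud  = $dog->bark('loud');

# The method call ends up as an ordinary subroutine call
my $direct = Animal::Dog::bark($dog, 'loud');

print "$quiet\n$loud\n";
```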

Class methods

As shown above, bark is an instance method: we call it on an existing Animal::Dog object. We can also have class methods: methods that are called without having an object in hand. To do this, we put the package name to the left of the arrow
Animal::Dog->phylum
When we call a method like this, the package name gets passed as the first argument to the subroutine
sub Animal::Dog::phylum
{
    my $package = shift;

    return "Chordata";
}
The most common class method that you'll see is new, which is conventionally used as a constructor
sub Animal::Dog::new
{
    my($package, $ears, $tail) = @_;

    my %dog = (ears => $ears,
	       tail => $tail);

    bless \%dog, "Animal::Dog"
}
The first argument to bless is a reference to %dog. For convenience, bless returns its first argument, and in the subroutine above, that becomes the return value of the subroutine. So we can write
$dog = Animal::Dog->new('long', 'short');
and get a new Animal::Dog object.
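Here is the constructor in a script you can run. The class_name method is a hypothetical class method that I added only to show the package name arriving as the first argument.

```perl
use strict;
use warnings;

package Animal::Dog;

sub new
{
    my($package, $ears, $tail) = @_;

    my %dog = (ears => $ears,
               tail => $tail);

    return bless \%dog, "Animal::Dog";
}

# Hypothetical class method: hands back whatever it was called on
sub class_name
{
    my $package = shift;
    return $package;
}

package main;

my $dog  = Animal::Dog->new('long', 'short');
my $name = Animal::Dog->class_name;    # 'Animal::Dog'

print ref($dog), " with $dog->{ears} ears\n";
```

Each call to new builds a fresh %dog, so every object gets its own hash.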

The method call syntax using the arrow is called the "arrow", or "direct" syntax. There is an alternate syntax that you will see, called the "indirect" syntax. The indirect syntax has the method name first, followed by a package name or object reference, and then any arguments.

bark $dog;
bark $dog 'loud';

$dog = new Animal::Dog;
$dog = new Animal::Dog 'long', 'short';
You can use either syntax for both class and instance methods. However, the most common usage is to use indirect syntax for class methods (especially constructors) and direct syntax for instance methods
$dog = new Animal::Dog;
$dog->bark;

Inheritance

Perl does method inheritance. Every package has an array named @ISA
@Animal::Dog::ISA = qw(Animal);
The packages named in @ISA serve as base classes.

If you make a method call

$dog->bark;
and there is no Animal::Dog::bark() subroutine, then Perl will go looking for a bark() routine (recursively, depth first) in the packages listed in @ISA, and call the first one that it finds.
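The lookup can be seen in a small sketch. The breathe and eat methods and their return strings are invented for the example; inside package Animal::Dog, our @ISA refers to the same array as @Animal::Dog::ISA.

```perl
use strict;
use warnings;

package Animal;

sub breathe { return 'breathing'; }
sub eat     { return 'eating'; }

package Animal::Dog;

our @ISA = qw(Animal);    # this is @Animal::Dog::ISA

sub eat { return 'eating dog food'; }

package main;

my $dog = bless {}, 'Animal::Dog';

my $breath = $dog->breathe;    # not in Animal::Dog, so found via @ISA in Animal
my $meal   = $dog->eat;        # found in Animal::Dog; the search stops there

print "$breath, $meal\n";
```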

Inheritance doesn't have anything to do with package namespaces. As shown, Animal::Dog inherits from Animal, but it needn't, and it can inherit from anything else that it wants to. If I wanted my dog to really bark, I might want it to inherit from a package of audio drivers, too

@Animal::Dog::ISA = qw(Animal Audio::Driver);
The Animal::Dog::new() routine shown above hard-codes the Animal::Dog package name in the call to bless. Better style is to use whatever package name is passed into the method
sub Animal::Dog::new
{
    my($package, $ears, $tail) = @_;

    my %dog = (ears => $ears,
	       tail => $tail);

    bless \%dog, $package
}
Then subclasses can inherit your new() method
package Beagle;

@Beagle::ISA = qw(Animal::Dog);
I haven't defined a Beagle::new() method, but if I call
my $beagle = new Beagle;
then Perl will invoke
Animal::Dog::new("Beagle")
and $beagle will be blessed into the Beagle package, which is what I want.
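The inherited constructor can be checked with ref. This sketch uses the arrow syntax for the calls; the constructor arguments are arbitrary.

```perl
use strict;
use warnings;

package Animal::Dog;

sub new
{
    my($package, $ears, $tail) = @_;

    my %dog = (ears => $ears,
               tail => $tail);

    return bless \%dog, $package;    # bless into whatever class was asked for
}

package Beagle;

our @ISA = qw(Animal::Dog);          # this is @Beagle::ISA

package main;

my $beagle = Beagle->new('long', 'short');
my $dog    = Animal::Dog->new('short', 'long');

print ref($beagle), "\n";    # prints "Beagle"
print ref($dog), "\n";       # prints "Animal::Dog"
```

With the hard-coded version of new, both lines would print Animal::Dog.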

Perl doesn't have any formal facilities for data inheritance. But every method, in whatever package, gets the object reference passed as its first argument. All the packages in the inheritance tree can store their instance data in that hash, and methods can grab whatever they need from it.

If you're worried about packages stepping on each others' hash keys, you can establish a convention for divvying up the keyspace among the packages in some way, but I've never had to do that.
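One possible convention is to prefix each hash key with the name of the package that owns it. This is my own illustration, not an established standard; the method names and the data are invented.

```perl
use strict;
use warnings;

package Animal;

sub new
{
    my $package = shift;
    return bless {}, $package;
}

sub set_name
{
    my($self, $name) = @_;
    $self->{'Animal::name'} = $name;          # key owned by Animal
}

package Animal::Dog;

our @ISA = qw(Animal);

sub set_trick
{
    my($self, $trick) = @_;
    $self->{'Animal::Dog::trick'} = $trick;   # key owned by Animal::Dog
}

package main;

my $dog = Animal::Dog->new;
$dog->set_name('Rex');            # inherited from Animal
$dog->set_trick('roll over');     # defined in Animal::Dog

# Both packages keep their data in the one hash, under separate keys
my @keys = sort keys %$dog;
print "$_ => $dog->{$_}\n" for @keys;
```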

Perl docs

If you've gotten this far, the Perl object docs (the perlobj man page) will provide additional information.

Translations

Hindi courtesy of Nikol
Italian courtesy of Ahsan Soomro
This work is licensed under a Creative Commons Attribution 4.0 International License.
Steven W. McDougall / resume / swmcd@theworld.com / 2009 August 12