# NAME

Align::NW - Needleman-Wunsch algorithm for optimal global sequence alignment

# SYNOPSIS

```    use Align::NW;

\$payoff = { match      => \$match,
mismatch   => \$mismatch,
gap_open   => \$gap_open,
gap_extend => \$gap_extend };

\$nw = new Align::NW \$a, \$b, \$payoff, %options
\$nw->score;
\$nw->align;

\$score = \$nw->get_score;
\$align = \$nw->get_align;```
`    \$nw->print_align;`

# DESCRIPTION

`Align::NW` finds the optimal global alignment of the sequences `\$a` and `\$b`, subject to the `\$payoff` matrix.

## Algorithm

`Align::NW` uses the Needleman-Wunsch dynamic programming algorithm. This algorithm runs in O(a*b*(a+b)), where a and b are the lengths of the two sequences to be aligned.

## Alignments

An alignment of two sequences is represented by three lines. The top line shows the first sequence, and the bottom line shows the second sequence.

The middle line has a row of symbols. The symbol is a vertical bar where ever characters in the two sequences match, and a space where ever they do not.

Dots may be inserted in either sequence to represent gaps.

For example, the two sequences

```    abcdefghajklm
abbdhijk```

could be aligned like this

```    abcdefghajklm
|| |   | ||
abbd...hijk```

As shown, there are 6 matches, 2 mismatches, and one gap of length 3.

`Align::NW` retuns an alignment as a hash

```    \$align = { a => \$a,
s => \$s,
b => \$b };```

\$a and \$b are the two sequences. \$s is the line of symbols.

## The Payoff Matrix

The alignment is scored according to a payoff matrix

```    \$payoff = { match    => \$match,
mismatch => \$mismatch,
gap_open   	 => \$gap_open,
gap_extend 	 => \$gap_extend };```

The entries in the matrix are the number of points added to the score

• for each match
• for each mismatch
• when a gap is opened in either sequence
• for each position that a gap is extended (including the first)

For correct operation, match must be positive, and the other entries must be negative.

## Example

Given the payoff matrix

```   \$payoff = { match      =>  4,
mismatch   => -3,
gap_open   => -2,
gap_extend => -1 };```

The sequences

```    abcdefghajklm
abbdhijk```

are aligned and scored like this

```                a b c d e f g h a j k l m
| |   |       |   | |
a b b d . . . h i j k```
```    match       4 4   4       4   4 4
mismatch       -3          -3
gap_open           -2
gap_extend         -1-1-1```

for a total score of 24-6-2-3 = 15. The algorithm guarantees that no other alignment of these two sequences has a higher score under this payoff matrix.

# METHODS

\$nw = `new` `Align::NW` \$a, \$b, \$payoff, %options
Creates and returns a new `Align::NW` object. \$a and \$b are the sequences to be aligned. \$payoff is the payoff matrix, described above. Additional options maybe passed in the %options hash; see /OPTIONS for details.
\$nw->`score`
Fills in the score matrix for \$nw. This is the O(a*b*(a+b)) operation.
\$nw->`align`
Backtracks through the score matrix and generates an alignment for the two sequences. `score` must be called before `align`.
\$score = \$nw->`get_score`
Returns the score of the alignment. `score` must be called before `get_score`.
\$align = \$nw->`get_align`
Returns the alignment of the two sequences, as described above in /Alignments. `align` must be called before `get_align`.
\$nw->`print_align`
Pretty prints the alignment to STDOUT. `align` must be called before `print_align`.

# OPTIONS

Options may be passed to `new` in the `%options` hash. The following options are defined.

-v
Verbose output. Prints some dots to STDERR. Useful for monitoring the progress of large alignments.

• Needleman, S.B. and Wunsch, C.D. 1970. "A general method applicable to the search for similarities in the amino acid sequences of two proteins" Journal of Molecular Biology. 48: 443-453.
• Smith, T.F. and Waterman, M.S. 1981. "Identification of common molecular subsequences" Journal of Molecular Biology. 147: 195-197

There are usually some some tutorials on Needleman-Wunsch and Smith-Waterman alignment floating around on the web. I used to provide links to some, but they kept going 404. If you Google around a bit you can probably find a current one.