ASN.1 Misuse

Carl M. Ellison
September 15,1995

Converted to HTML on Nov 11, 1995

There are also PostScript and LaTeX source versions of this paper.

Introduction

The ASN.1 standard (Abstract Syntax Notation 1) is proliferating, in spite of cries of anguish by computer professionals who are faced with implementing to those standards. The standard is clearly flawed. Some of those flaws come from its complexity and ambiguity -- a product most likely of having started down the wrong path and been subject to corrections by various interested parties along the way. It has acquired various warts and is apparently the result of a committee effort. Other flaws come from its misuse, and those are the ones addressed in this paper.

Goals in conflict

ASN.1 is viewed differently by writers of standards and implementors. Neither group is unanimous in its evaluation, but it tends to be predominantly favored by the former and predominantly despised by the latter.

The writer of standards

With ASN.1 one can declare in a compact and readable way that a given data structure is to contain (for example) 2 fields, the first of which is an optional byte string and the second of which is an integer whose maximum size is yet to be determined. These structures and fields can be named with arbitrary length names, but the current practice is to use long names without abbreviations and with capital letters indicating starts of words within a name. forExample, thisWouldBeASampleName. As a result, names can be explicit enough (and long enough) that comments are not needed to explain the function and content of a defined structure. ASN.1 has an additional redeeming feature to a person drafting a standard. It is not tied to any existing computer language. Therefore, one can choose to use it in a standards committee without getting embroiled in a long discussion over which computer language to use. As a mechanism for the standards writer, then, it looks like ASN.1 is a clear winner -- all positives and no negatives. The example below will show some of these advantages in operation.

The implementor

An implementor is not concerned with smoothly functioning standards committees and concise, readable standards documents. The implementor has chosen some computer language or hardware design mechanism and wants to create some program or device. The data structures which the standards writer specifies in ASN.1 have to be expressed in some form. In particular, they have to be expressed in two forms: in a computer's memory and in transit between computers. In short, the implementor demands a definition of computer memory structures and messages -- and forces ASN.1 to become a compilable language for specifying both. This destroys its main value -- that it is not a computer language over which people on a standards committee might fight. ASN.1 does not address the representation of a structure within a computer's memory, but there are associated encoding rule options (BER, DER, ...) which specify the representation of an ASN.1 structure in transit between computers. A major problem, however, is that ASN.1 starts with an abstract syntax and the encoding rules must be able to handle everything ASN.1 can express. Since ASN.1 has no concept of size limits, the encoding rules must also be free of size limits (as must any general computer memory representation of an ASN.1 structure). Each field or substructure has a length and the length fields can express sizes as large as $(2^{1008}-1)$ bytes. For larger values, there is an indefinite length encoding option allowing infinite length fields and structures. This is all very cute -- however, in the rush to find ways to encode abstact syntax, there was no mechanism defined to allow someone to specify an unsigned short int simply. So, ASN.1 itself and its encoding rules acquired a wart -- a means of noting, after the definition of an INTEGER (which is supposed to be of any length), that the legal values are to be within a specific range. To an implementor who, without outside interference, would have specified a data structure in some real programming language and then have specified an encoding of that structure for transferring it between machines, the ASN.1 mechanism (with its encoding rules) is a perversion.

An example

Let's take a concrete example. I had an application which was doing 3-key triple-DES encryption of a data block, in CBC mode. The keying material for this encryption needed to be transmitted to another process (inside an RSA encryption) and therefore had to be packed into a byte string. I intended to have this application interoperate among machines of different endianness, so I had to be careful about machine independence. I also wanted the application to migrate to machines of different word sizes, but that was not a firm requirement. The keying material consisted of 1 IV (of 64 bits) and up to 3 keys (of 56 bits each). However, DES keys occupy 64 bits each (with one unused bit in the least significant bit of each byte) -- so from a programmer's perspective, this is a structure of four 8-byte fields. Thinking as an implementor, I defined a C structure:
  unsigned char kiv[32] ; /* IV and 3 DES keys */
  #define IV  (0)   /* offset for the IV */
  #define K1  (8)   /* offset for K1 (key for 1st encr) */
  #define K2 (16)   /* offset for K2 */
  #define K3 (24)   /* offser for K3 */
Data structure packing was a non-problem, since this is already a byte string. Access to the structure was via calls such as:
  ok = triple_des_setup( &(kiv[IV]), &(kiv[K1]),
                         &(kiv[K2]), &(kiv[K3]),
                         &des_env ) ;
which is all very straight-forward. The code produced is both readable and efficient. The encoding for transit is extremely efficient. There is no packing or parsing code involved. However, this isn't modern. It doesn't use ASN.1. It isn't in proper standards form. It assumes C and that might cause fights in the committee. So, let's do the same thing with ASN.1.

The ASN.1 example

First we have to decide whether to use IMPLICIT or EXPLICIT context tags. Let's choose the default -- EXPLICIT -- not mention it again and hope it works. The next step is to define a basic key and IV structure.
  KeyAndIV ::= SEQUENCE {
    initialValue  OCTET STRING,
    key1          OCTET STRING,
    key2          OCTET STRING,
    key3          OCTET STRING
  }
We could stop there, but we're in a standards committee and Alice points out that while an IV might be a byte string (a block), the key is really a bit string. This potential conflict is resolved by defining sub-structures for both the IV and the key, allowing them to take on data types to be agreed upon later. Bob points out that the KeyAndIV block is really the contents of a key exchange block and should be renamed. Charlie points out that the KeyExchangeBlockContents should describe what algorithm is being used for the encryption. Fortunately, ASN.1 is really easy to change. A few swipes and strokes on the whiteboard and we have:
  KeyExchangeBlockContents ::= SEQUENCE {
    algorithmIdentifier   AlgorithmIdentifier,
    initialValue          InitialValue,
    key1                  Key,
    key2                  Key,
    key3                  Key
  }

  AlgorithmIdentifier ::= OBJECT IDENTIFIER

  InitialValue ::= OCTET STRING

  Key ::= BIT STRING
Now we're getting down to business in this standards meeting. We're starting to be productive. Dave points out that we might want to make some of these keys optional. Maybe we just want to have single-DES. So, another swipe and we have:
  KeyExchangeBlockContents ::= SEQUENCE {
    algorithmIdentifier   AlgorithmIdentifier,
    initialValue          InitialValue,
    key1                  Key,
    key2              [1] Key OPTIONAL,
    key3              [2] Key OPTIONAL
  }

  AlgorithmIdentifier ::= OBJECT IDENTIFIER

  InitialValue ::= OCTET STRING

  Key ::= BIT STRING
Eve suggests that someone may want to have the option of using different algorithms in each of the stages of the multiple encryption. No problem. ASN.1 is easy to change. We'll move the AlgorithmIdentifier from the KeyExchangeBlockContents to a KeyAndAlgorithm SEQUENCE.
  KeyExchangeBlockContents ::= SEQUENCE {
    initialValue          InitialValue,
    key1                  KeyAndAlgorithm,
    key2              [1] KeyAndAlgorithm OPTIONAL,
    key3              [2] KeyAndAlgorithm OPTIONAL
  }

  AlgorithmIdentifier ::= OBJECT IDENTIFIER

  InitialValue ::= OCTET STRING

  KeyAndAlgorithm ::= SEQUENCE {
    algorithmIdentifier   AlgorithmIdentifier,
    Key                   BIT STRING
  }

And, while we're at it, Fred notes that we don't need to restrict ourselves to a maximum of three keys. We could simplify the structure and make it more general by using a SEQUENCE OF for the keys. Gary points out that the IV might be transmitted separately from the key block, so it should be optional. Now we have:
  KeyExchangeBlockContents ::= SEQUENCE {
    initialValue          InitialValue OPTIONAL,
    keys                  KeyList
  }

  AlgorithmIdentifier ::= OBJECT IDENTIFIER

  InitialValue ::= OCTET STRING

  KeyList ::= SEQUENCE OF  KeyAndAlgorithm

  KeyAndAlgorithm ::= SEQUENCE {
    algorithmIdentifier   AlgorithmIdentifier,
    key                   BIT STRING
  }
That looks good. Let's put that in our standard document. That should cover everyone. Now we can hand that to the implementor. The implementor produces several files: asn_util.h, asn_util.c, prim_defs.h, kex_sizes.h, kex.h, kex_asn.h, kex.c and kex_asn.c -- defining data structures, and providing the code to parse and pack ASN.1 to/from those structures, as well as code to carve, initialize and free those structures.
 26353 asn_util.c
  6744 asn_util.h
  2056 prim_defs.h
  1733 kex.h
  2097 kex_asn.h
    66 kex_sizes.h
  4204 kex.c
 11832 kex_asn.c
for a total code size of 55085 characters, as compared to the original 48 characters. This is an expansion in code by a factor of 1148. The byte string built for 3 DES keys is 86 bytes -- only a factor of 2.7 expansion over the 32 bytes of the C-based scheme. In this implementation, the SEQUENCE OF is implemented as an array of pointers to the sequenced structure blocks. That array is assumed to have a maximum number of elements. However, the code will probably have to be enhanced, to be true to the ASN.1, to permit acceptance of an indefinite number of key and algorithm identifier pairs. The C structures which result from this process are:
typedef unsigned long   ulong ;
typedef unsigned char   uchar ;


/* Variable Length Entity */
typedef struct
{
        long    len;    /* # of bytes in value */
        uchar   *val;   /* pointer to the bytes of value */
} asn_vbl_lth ;

/* Bit string entity */
typedef struct
{
        long    len;    /* # of bytes in value */
        uchar   *val;   /* pointer to the bytes of value */
        short   nuub ;  /* # of unused bits in the LSByte */
        short   adv ;   /* TRUE if val advanced 1 byte */
} asn_bitstr ;

typedef asn_vbl_lth OBJECT_ID ;
typedef asn_vbl_lth  INTEGER ;
typedef asn_bitstr  BIT_STRING ;
typedef asn_vbl_lth  OCTET_STRING ;

typedef OBJECT_ID   AlgorithmIdentifier ;

typedef OCTET_STRING   InitialValue ;

typedef struct {
  AlgorithmIdentifier    algorithmIdentifier ;
  BIT_STRING     key ;
} KeyAndAlgorithm ;

typedef struct {
  long  n ;          /* number of elements */
  KeyAndAlgorithm *elt[ MAX_KeyList ] ;
} KeyList ;

typedef struct {
  InitialValue  *initialValue ;
  KeyList        keys ;
} KeyExchangeBlockContents ;

[...]

KeyExchangeBlockContents keyBlock ;

[...]

  /* code to test the sizes and algorithm IDs */
  /* of the IV and keys goes here */

  /* code to verify that an IV and 3 keys are present */
  /* and to give an error or branch elsewhere goes here */

  ok = triple_des_setup( keyBlock.initialValue->val,
                         keyBlock.keys.elt[0]->key.val,
                         keyBlock.keys.elt[1]->key.val,
                         keyBlock.keys.elt[2]->key.val,
                         &des_env ) ;

as compared to the original C structure which did our job quite adequately:
unsigned char kiv[32] ; /* IV and 3 DES keys */

[...]

    ok = triple_des_setup( &(kiv[IV]), &(kiv[K1]),
                           &(kiv[K2]), &(kiv[K3]),
                           &des_env ) ;
Of course, this use in triple_des_setup isn't the whole story. One must create the initial IV and keys. In the original scheme, I filled the 32 bytes from a random number source in one call. With the ASN.1-inspired structures, one must do 4 calls to that source and set up a great deal of auxilliary structure as well.

Philosophical debate

Even with that example, there are those who might argue in favor of the ASN.1 approach. After all, the argument goes, it is good to handle the limitless possibilities, especially when dealing with standards. This way, we can achieve interoperability when algorithms or key sizes change. That argument is flawed, of course. The ASN.1 generality does not imply that an application written to use such structures will accept every algorithm and number of keys given to it. Old code which was written from the ASN.1 spec is not guaranteed to be able to interoperate with new code using new algorithm options. For example, the old code might have been written before the new algorithm was defined. Meanwhile, the ease of specifying and implementing the concrete example (3 DES keys and nothing else) allows someone to make a set of interoperable applications based on those limitations. When someone comes along with a change (eg., 3 keys for triple-IDEA), it is equally convenient to define a structure for its key exchange block, add a 1-byte algorithm identifier to the start of a structure (as opposed to a 9-byte uniquely allocated OBJECT IDENTIFIER) and add code to call IDEA three times. One can make several such changes with the same effort it takes to write the first ASN.1 parsing/packing code. Of course, this debate can go on forever. I have provided only the implementor's side. The examples generated here were run through an ASN.1 to C compiler which I wrote the last time I was forced to implement ASN.1 packing and parsing. Even with that compiler, the work of the implementor is multiplied, however. The longer names with multiple fields offer more opportunities for bugs and make the code less readable (since less of it fits on one page or screen). The additional checking (e.g., to make sure that (optional) pointer fields are present) represents additional programming effort.

An alternative

The flaw in ASN.1 is that it starts with abstract syntax. It is possible to write a data structure definition language which has the same flavor as C or PASCAL (the only languages worth considering) -- includes constructs which are expressible in both and omits constructs which lead to bugs (such as C's ``union'' construct or PASCAL's valueless ``case'' inside a RECORD). Specifications could be written by an implementor in such a language. When a seasoned implementor defines a data structure, it is defined with a minimum of leafiness so as to simplify the code used to access it. It will have small names, to ease typing. It will have comments to describe it. It will be simple to pack and unpack (into and out of a byte string). What the standards writer wants is a language in which to specify a data structure (especially for transmission) uniquely and with all the generality of a real computer language. What the implementor needs is a specification of a transmitted data structure and an automated tool for taking that specification and generating the data structures and pack/unpack code required to convert it. Such a tool is not difficult to produce. There might even be a commercial one by now. Specifications in such language can totally specify the block structure of an implementation's messages.
Carl Ellison --- cme@acm.org