ASN.1 introduction and overview -- SSLeay 0.9.0b -- January 1999

What is ASN.1, anyhow?

ASN.1 (Abstract Syntax Notation One) is a means of describing digital objects.

A complicated object, for example, an X509 certificate, is built out of other objects which in turn are built out of still other objects and so on until we get down to a set of primitives that are the basic building blocks for the whole business.

Here is a simple example.

X509 certificates contain a subfield that says when the certificate first becomes valid and when it stops being valid; this is called the Validity field and its ASN.1 definition is below.

Validity ::= SEQUENCE {
  notBefore            UTCTIME,
  notAfter             UTCTIME
  }

This says: define Validity as a sequence (one object right after the other, in the order given) of the fields 'notBefore' which is a UTCTIME, and 'notAfter', which is also a UTCTIME.

Now you have to go find out what the definition of a UTCTIME is; it turns out (lucky for us!) that it is a primitive type, defined in the original ASN.1 specification, and it contains the Coordinated Universal Time in a very specific format.
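To give a feel for that 'very specific format': the value of a UTCTime is ASCII text of the form YYMMDDhhmmssZ (two-digit year, trailing 'Z' for UTC; DER requires both the seconds and the 'Z'). Here is a minimal Python sketch of producing that text. The helper name utctime_string is made up for illustration and is not part of the library.

```python
from datetime import datetime, timezone

def utctime_string(when):
    """Format a timezone-aware datetime as the text of an ASN.1 UTCTime."""
    # Convert to UTC first; the trailing 'Z' marks the value as UTC.
    return when.astimezone(timezone.utc).strftime("%y%m%d%H%M%SZ")

print(utctime_string(datetime(1999, 1, 15, 12, 30, 0, tzinfo=timezone.utc)))
# prints 990115123000Z
```

Note that this only produces the content of the field; the identifier and length octets that go in front of it are described further down.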

I should say right now that I use 'something like' the original ASN.1 syntax (from 1988) in these documents; the later specs (from 1993) introduce some unreadable modifications, IMHO.

In addition to a syntax for describing objects, there is also a set of encoding rules that specify how to take one of these descriptions and turn it into binary that an application can manipulate.

These are the so-called 'Basic Encoding Rules' (BER). It turns out that these rules sometimes allow an object to be encoded into a few different equivalent binary forms; since this is often inconvenient (for example, if one wants to ensure that one's RelativeDistinguishedName looks the same every time one encodes it), a refinement is the 'Distinguished Encoding Rules' (DER). An ASN.1-specified object has only one binary form using DER-encoding.

Each ASN.1 primitive object has a number associated with it, called a tag; this is used when encoding it into binary format.

What are the ASN.1 primitive types?

Here's a list of primitive types, their definitions, and their tags.

Type                      Tag (Hex)  Definition
BOOLEAN                   01         Has two values: true, false
INTEGER                   02         Has integer values
BIT STRING                03         String of zero or more bits
OCTET STRING              04         String of zero or more bytes
NULL                      05         Has one value: NULL
OBJECT IDENTIFIER         06         Unique string of numbers associated with an object
OBJECT DESCRIPTOR         07         Brief text description of an object
EXTERNAL                  08         Object which may not be describable in ASN.1
REAL                      09         Has real values
ENUMERATED                0a         List of values each of which has a distinct
                                     identifier as part of the ASN.1 description
SEQUENCE and SEQUENCE OF  10         Either an ordered list of values, one for each
                                     type in the SEQUENCE, or an ordered list of zero
                                     or more values of a particular type, for
                                     SEQUENCE OF
SET and SET OF            11         Unordered list of values, one for each type in
                                     the SET, or an unordered list of zero or more
                                     values of a particular type, for SET OF
NumericString             12         0-9 and space
PrintableString           13         A-Za-z0-9 space '()+,-./:=?
TeletexString             14
VideotexString            15
IA5String                 16
UTCTime                   17         Coordinated Universal Time
GeneralizedTime           18
GraphicString             19
VisibleString             1a
GeneralString             1b
UniversalString           1c
BMPString                 1e

CHOICE, SELECTION, and ANY are also primitive types that are not in this table because they do not have their own tags.

How do you encode an ASN.1 object?

You can look at the quite excellent Layman's Guide to a Subset of ASN.1, BER, and DER by Burt Kaliski of RSA DSI for a tutorial on encoding rules; in the meantime it is a good idea to know just enough to be able to read the ASN1 library code and understand what it does.

The first byte of every encoded object indicates what kind of object it is (by tag). This is the so-called 'identifier octet' for the object.

Specifically, bits 8 and 7 are 0 if the object is one of the universal types (one out of the list above). Then bit 6 says whether the encoding is primitive or constructed: it is 0 for a primitive encoding, which always has a definite (explicitly stated) length, and 1 for a constructed encoding such as a SEQUENCE or SET. And finally, bits 5-1 are the tag number.
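As a rough illustration of picking an identifier octet apart, here is a short Python sketch. Bits are numbered 8 (most significant) down to 1, and bit 6 is read as the primitive/constructed flag, as in the Layman's Guide. The function name parse_identifier is invented for this example, and the sketch assumes a tag number below 31 (larger tags use a multi-byte form not covered here).

```python
def parse_identifier(octet):
    """Split an identifier octet into (class bits, constructed flag, tag number)."""
    tag_class = (octet >> 6) & 0x03   # bits 8-7: 0 means universal
    constructed = bool(octet & 0x20)  # bit 6: 0 = primitive, 1 = constructed
    tag_number = octet & 0x1F         # bits 5-1: the tag number
    return tag_class, constructed, tag_number

print(parse_identifier(0x03))  # BIT STRING: (0, False, 3)
print(parse_identifier(0x30))  # SEQUENCE:   (0, True, 16)
```

The second call shows why an encoded SEQUENCE starts with 30 rather than 10: the tag number is 0x10, but bit 6 is set because a SEQUENCE is built out of other encoded objects.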

After this come the so-called 'length octets'; these indicate how many bytes the object's data is, not including the identifier octet or the length octets themselves. If the first such octet has the 8th bit set, then the rest of it (bits 7-1) specifies how many 'real' length octets follow; read that many and turn them into a base 256 number, most significant byte first, to get the length in bytes of the data. If, on the other hand, the 8th bit is not set, then the rest of the byte (bits 7-1) is simply the number of bytes in the data. So if your data is less than 128 bytes long you use one length octet; if it is 128 bytes or longer then you have to use more than one length octet.
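The length rules are easier to follow in code. The sketch below is in Python; the names encode_length and decode_length are hypothetical, chosen for this example only. The encoder always picks the shortest form, as DER requires. The decoder rejects a first octet of 0x80, which BER uses for indefinite-length encodings that are not discussed here.

```python
def encode_length(n):
    """Return the DER length octets for n bytes of data."""
    if n < 128:
        return bytes([n])                                # one octet, bit 8 clear
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")  # base 256, MSB first
    return bytes([0x80 | len(body)]) + body              # bit 8 set + octet count

def decode_length(data, pos=0):
    """Read length octets starting at data[pos]; return (length, next position)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1                            # short form
    count = first & 0x7F                                 # length octets that follow
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    value = int.from_bytes(data[pos + 1:pos + 1 + count], "big")
    return value, pos + 1 + count

print(encode_length(3).hex())                    # 03
print(encode_length(300).hex())                  # 82012c
print(decode_length(bytes.fromhex("82012c")))    # (300, 3)
print(decode_length(bytes.fromhex("8103")))      # (3, 2)
```

The last line decodes '81 03', which states a length of 3 the long way round. BER accepts that; DER does not, because '03' alone is shorter. This is one of the places where BER allows several equivalent binary forms.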

Finally, there is the actual data. It may itself be another encoding of an object (but not if the object is primitive; then you will only have a value). These are the so-called 'content bytes'.
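Putting the three parts together, a reader for one encoded object might look like the Python sketch below. The name read_tlv is made up for illustration; the sketch assumes a single-byte identifier and a definite length, and it does no error checking. The sample bytes encode a SEQUENCE containing one INTEGER with the value 5.

```python
def read_tlv(data, pos=0):
    """Return (identifier octet, content bytes, position after this object)."""
    ident = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:                  # short form: this octet is the length
        length = first
    else:                             # long form: the next few octets hold it
        count = first & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return ident, data[pos:pos + length], pos + length

# The content of the outer object is itself the encoding of another object.
outer = bytes.fromhex("3003020105")
ident, content, _ = read_tlv(outer)
print(hex(ident), content.hex())              # 0x30 020105
inner_ident, inner_content, _ = read_tlv(content)
print(hex(inner_ident), inner_content.hex())  # 0x2 05
```

The outer object is constructed, so its content bytes can be fed straight back into the same function; the inner INTEGER is primitive, so its content is just the value.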

Example of DER-encoding an object

It is worth looking at a couple of examples. Well, at least one.

The ASN.1 BIT STRING type designates an arbitrary string of bits of any length including zero.

A BIT STRING is DER encoded as follows:

First, pad the bit string after the last bit with 0's to make the length of the string a multiple of 8 (no padding if it already is one).

Next, count the number of bits you added for padding and write it; this becomes the first content byte.

Next, write the bytes of the bit string with the trailing padding, most significant byte first. These are the rest of the content bytes.

You will put a leading byte in front of all this that has:

bit 8: 0, bit 7: 0 (universal class); 
bit 6: 0 ( indicates primitive, definite-length encoding )
bits 5-1: 0x03 (tag indicating BIT STRING type)

This is the identifier octet.

Now, count how many content bytes you have. (These are the bytes other than the identifier octet and the length octets.) If you have <= 127, you will put one length octet right after the identifier octet, and before the data, as follows:

bit 8: 0
bits 7-1:  number of content octets

If you have > 127 content bytes, you will put between 2 and 127 length octets right after the identifier octet, as follows:

first byte, bit 8: 1
bits 7-1: how many length octets follow this one
remaining bytes: number of content octets, in base 256, most significant byte first.

An example:

The bit string '01000100111011' is 14 bits long, two short of being a multiple of eight bits; we add two zeros on the right end to get '0100010011101100'. We write '02' as the first content byte and '44 ec' as the rest. We put '03' in front as the identifier octet. Recall that we have three content octets; 3 <= 127, so there will be one length octet, '03'. Thus the whole encoding is

03 03 02 44 ec

where byte 1 is id, byte 2 is length and bytes 3-5 are content. See how easy that was? Now imagine a complicated type like a certificate.
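For the curious, the BIT STRING steps above can be sketched in a few lines of Python. This is only an illustration of the procedure, not how the library does it; the function name der_bit_string and the choice of passing the bits as a string of '0' and '1' characters are assumptions made for the example.

```python
def der_bit_string(bits):
    """DER-encode a string of '0'/'1' characters as an ASN.1 BIT STRING."""
    pad = (-len(bits)) % 8                 # zero bits needed to fill the last byte
    padded = bits + "0" * pad
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    content = bytes([pad]) + body          # first content byte is the pad count
    n = len(content)
    if n <= 127:
        length = bytes([n])                # a single length octet
    else:
        size = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(size)]) + size
    return bytes([0x03]) + length + content  # 0x03: universal, primitive, BIT STRING

encoded = der_bit_string("01000100111011")
print(" ".join(f"{b:02x}" for b in encoded))  # 03 03 02 44 ec
```

An empty bit string comes out as '03 01 00': no padding, so the only content byte is the pad count of zero.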

So, what's in this library?

Because all of the X.509-related specs rely on ASN.1 syntax and DER encoding, we have a library of ASN.1 routines that read DER-encoded objects and convert them into some reasonable internal form that we can manipulate, or that take C structures and turn them into DER-encoded objects.

Additionally, there are functions that do comparisons of some of these objects, that set and get certain values from them, and some signing routines which are in this part of the library because they do DER-encoding on the passed object before signing.

Now that you have survived this far, you can read the BER and DER encoding tutorial and go look at the rest of the library documentation.