How Certificates are Born (Data Structure Edition)

Authentic key distribution is a fundamental problem of public key cryptography: how do you know that a public key really belongs to an entity (the “subject”)? A man-in-the-middle (MITM) attacker could have intercepted and replaced the public key! In other words, we need an authentic public-key-to-identity binding.

One solution is to meet the other party in person and verify their public key directly. This is what Signal and WhatsApp do with their Safety Numbers.

But this doesn’t scale. How would you meet up with Google to verify their website’s public key? This is where certificates come in.

A Certificate Authority (CA) verifies the identity (“thore.io”) and that this identity controls a keypair (pk, sk). The CA then signs a certificate that basically states “I, the CA, certify that the secret key belonging to the public key pk is controlled by the entity thore.io”. Now anybody who trusts that the CA is honest and that it correctly verifies identities before issuing certificates can also trust that this public-key-to-identity binding is correct.

But what is the flow of creating certificates? What are the data structures?

Certificates are often seen as this big complex beast. And correctly so! The many extensions and options mean that it gets complicated fast. But the core data structures are not that complicated.

This post tries to break down these data structures. Using the data structures to guide us, we step through the process of creating a certificate.

Step 1: Certificate Request

It starts with what is colloquially known as a “Certificate Signing Request” (CSR). It usually uses the PKCS#10 data structures (defined in RFC 2986).

PKCS#10 looks like this:

CertificationRequestInfo ::= SEQUENCE {
    version       INTEGER { v1(0) } (v1,...),
    subject       Name,
    subjectPKInfo SubjectPublicKeyInfo{{ PKInfoAlgorithms }},
    attributes    [0] Attributes{{ CRIAttributes }}
}

CertificationRequest ::= SEQUENCE {
    certificationRequestInfo CertificationRequestInfo,
    signatureAlgorithm AlgorithmIdentifier{{ SignatureAlgorithms }},
    signature          BIT STRING
}

If subject S has a keypair and wants to obtain a certificate, it first creates a CertificationRequestInfo and fills out the subject name (“thore.io”) and the details of its public key. Then S signs the CertificationRequestInfo with its secret key and puts it, together with the signature, into a CertificationRequest.
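
The request flow above can be sketched in Python. This is a toy under loud assumptions: the dataclasses only mirror the ASN.1 field names, `encode` is a JSON stand-in for DER, and the signature is textbook RSA over two small fixed primes, which is insecure and only illustrates that the request is signed with the subject's own secret key. All helper names are made up for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

# Toy textbook-RSA keypair for S. INSECURE: small fixed primes, no padding.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q                           # public modulus (part of pk)
D = pow(E, -1, (P - 1) * (Q - 1))   # secret exponent (sk), needs Python 3.8+


@dataclass
class CertificationRequestInfo:
    version: int           # v1(0)
    subject: str           # stands in for the ASN.1 Name
    subject_pk_info: dict  # stands in for SubjectPublicKeyInfo


@dataclass
class CertificationRequest:
    certification_request_info: CertificationRequestInfo
    signature_algorithm: str
    signature: int


def encode(info):
    """Deterministic byte encoding; a stand-in for DER."""
    return json.dumps(asdict(info), sort_keys=True).encode()


def digest_as_int(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def create_request(subject, n, e, d):
    """S fills out the info, signs it with sk, and wraps both together."""
    pk_info = {"algorithm": "toy-rsa", "n": n, "e": e}
    info = CertificationRequestInfo(0, subject, pk_info)
    signature = pow(digest_as_int(encode(info), n), d, n)
    return CertificationRequest(info, "toy-rsa-sha256", signature)


def verify_request(csr):
    """Check the signature against the public key inside the request."""
    pk = csr.certification_request_info.subject_pk_info
    expected = digest_as_int(encode(csr.certification_request_info), pk["n"])
    return pow(csr.signature, pk["e"], pk["n"]) == expected


csr = create_request("thore.io", N, E, D)
print(verify_request(csr))  # True
```

Because the signature covers the whole CertificationRequestInfo, anyone holding the request can check that its creator had the secret key matching the public key inside it.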

Step 2: X.509 Certificate

S then sends the CertificationRequest to a CA of their choice.

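Now the CA needs to verify that the subject S really controls the identity it wants bound to its public key. (The signature on the CertificationRequest already shows that S holds the matching secret key.) For domain names, the CA can e.g. do this using ACME (defined in RFC 8555).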

When the CA is satisfied it issues the certificate. Certificates generally use the X.509 format (defined in RFC 5280).

X.509 certificates look like this:

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
    subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
    extensions      [3]  EXPLICIT Extensions OPTIONAL
}

Certificate  ::=  SEQUENCE  {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signatureValue       BIT STRING
}

The CA first fills out the TBSCertificate (“to-be-signed” certificate). It copies over the values from the CertificationRequest, assigns a new serial number, chooses a validity period, and adds any extensions that it wants.¹

The CA then signs the TBSCertificate and puts everything together into the Certificate. We now have an X.509 certificate!
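
As a rough sketch of these two paragraphs, the CA's side could look like the following. The dataclasses mirror only some of the X.509 fields, `repr(...)` stands in for DER encoding, and `placeholder_sign` is a keyed hash that is NOT a real signature; it only marks the place where the CA's signing key is used. The 90-day lifetime, the `subjectAltName` entry, and all function names are assumptions made for illustration.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TBSCertificate:
    serial_number: int
    signature: str                 # algorithm identifier (see the footnote)
    issuer: str
    not_before: datetime           # validity
    not_after: datetime
    subject: str                   # copied from the request
    subject_public_key_info: bytes # copied from the request
    extensions: dict = field(default_factory=dict)


@dataclass
class Certificate:
    tbs_certificate: TBSCertificate
    signature_algorithm: str
    signature_value: bytes


def issue_certificate(subject, subject_public_key_info, issuer, sign,
                      lifetime_days=90):
    """Fill out the TBSCertificate, sign it, and wrap it into a Certificate."""
    not_before = datetime.now(timezone.utc)
    tbs = TBSCertificate(
        serial_number=secrets.randbits(63) + 1,  # fresh, positive serial
        signature="placeholder-alg",
        issuer=issuer,
        not_before=not_before,
        not_after=not_before + timedelta(days=lifetime_days),
        subject=subject,
        subject_public_key_info=subject_public_key_info,
        extensions={"subjectAltName": [subject]},  # example extension
    )
    tbs_bytes = repr(tbs).encode()  # stand-in for the DER encoding
    return Certificate(tbs, tbs.signature, sign(tbs_bytes))


def placeholder_sign(data):
    # NOT a signature scheme; marks where the CA's secret key would be used.
    return hashlib.sha256(b"ca-secret-key" + data).digest()


cert = issue_certificate("thore.io", b"<public key from the request>",
                         "Example CA", placeholder_sign)
print(cert.tbs_certificate.serial_number, cert.tbs_certificate.not_after)
```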

Step 3: Signed Certificate Timestamp

Back in the day, we would now be done. The CA returns the X.509 certificate to the requesting subject S, who can use it to prove their identity (e.g. in a TLS handshake).

However, after a series of CAs misbehaved and wrongly issued certificates, Google introduced Certificate Transparency (CT). With CT, CAs need to insert every certificate that they issue into a public append-only log. For efficiency, this log is a Merkle Tree with the log entries as its leaves.
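
To make the Merkle Tree a bit more concrete, here is a small sketch of how the root hash over a list of log entries can be computed. It follows the Merkle Tree Hash definition used by CT (SHA-256, a 0x00 prefix for leaves, a 0x01 prefix for inner nodes, and a split at the largest power of two smaller than the number of entries). The entries are placeholder bytes, not real TransItems.

```python
import hashlib


def leaf_hash(entry):
    # Leaves and inner nodes use different prefixes so that a leaf can
    # never be confused with an inner node.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_tree_hash(entries):
    """Root hash over an ordered list of log entries (bytes)."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:  # largest power of two strictly smaller than n
        k *= 2
    return node_hash(merkle_tree_hash(entries[:k]),
                     merkle_tree_hash(entries[k:]))


log = [b"entry-0", b"entry-1", b"entry-2"]
root_before = merkle_tree_hash(log)
log.append(b"entry-3")  # the log only ever grows
root_after = merkle_tree_hash(log)
print(root_before.hex())
print(root_after.hex())
```

Every appended entry changes the root hash, so a signed root commits the log operator to the exact contents and order of the log.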

CT (version 2.0) is defined in RFC 9162. The relevant data structures are:

struct {
    VersionedTransType versioned_type;
    select (versioned_type) {
        case x509_entry_v2: TimestampedCertificateEntryDataV2;
        case precert_entry_v2: TimestampedCertificateEntryDataV2;
        case x509_sct_v2: SignedCertificateTimestampDataV2;
        case precert_sct_v2: SignedCertificateTimestampDataV2;
        case signed_tree_head_v2: SignedTreeHeadDataV2;
        case consistency_proof_v2: ConsistencyProofDataV2;
        case inclusion_proof_v2: InclusionProofDataV2;
    } data;
} TransItem;

opaque TBSCertificate<1..2^24-1>;

struct {
    uint64 timestamp;
    opaque issuer_key_hash<32..2^8-1>;
    TBSCertificate tbs_certificate;
    Extension sct_extensions<0..2^16-1>;
} TimestampedCertificateEntryDataV2;

struct {
    LogID log_id;
    uint64 timestamp;
    Extension sct_extensions<0..2^16-1>;
    opaque signature<1..2^16-1>;
} SignedCertificateTimestampDataV2;

First, the CA sends the X.509 certificate to a CT log operator. (Alternatively, the CA can send a “precertificate”, which signals the CA’s binding intent to issue the certificate later.)

The log operator extracts the TBSCertificate from the X.509 certificate (after verifying the signature). The log operator then builds the TimestampedCertificateEntryDataV2 and puts it into a TransItem of type x509_entry_v2. This TransItem will eventually be inserted into the log (as one of the Merkle Tree leaves).
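
The structures above are written in the TLS presentation language, where a variable-length vector is serialized with a big-endian length prefix just wide enough to hold its maximum length. A minimal sketch of serializing the log entry might look like this. The code point 0x0100 for x509_entry_v2 is taken from the VersionedTransType enum in RFC 9162 as far as I can tell, so double-check it against the RFC; the function names and the input bytes are placeholders.

```python
import hashlib
import struct

X509_ENTRY_V2 = 0x0100  # VersionedTransType code point (verify against RFC 9162)


def vector(data, length_bytes):
    """Variable-length vector: big-endian length prefix, then the bytes."""
    return len(data).to_bytes(length_bytes, "big") + data


def timestamped_certificate_entry(timestamp_ms, issuer_key_hash,
                                  tbs_certificate, sct_extensions=b""):
    if not 32 <= len(issuer_key_hash) <= 255:
        raise ValueError("issuer_key_hash must be 32..255 bytes")
    if not tbs_certificate:
        raise ValueError("tbs_certificate must not be empty")
    return (
        struct.pack(">Q", timestamp_ms)    # uint64 timestamp
        + vector(issuer_key_hash, 1)       # opaque <32..2^8-1>
        + vector(tbs_certificate, 3)       # opaque <1..2^24-1>
        + vector(sct_extensions, 2)        # <0..2^16-1>
    )


def trans_item(versioned_type, data):
    # The select() in TransItem just picks which struct follows the type.
    return struct.pack(">H", versioned_type) + data


issuer_key_hash = hashlib.sha256(b"<issuer public key>").digest()
tbs = b"\x30\x00"  # placeholder bytes, not a real TBSCertificate
entry = timestamped_certificate_entry(1_690_000_000_000, issuer_key_hash, tbs)
item = trans_item(X509_ENTRY_V2, entry)
print(len(item))  # 50
```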

The log operator then creates a “Signed Certificate Timestamp” (SCT): it signs the TransItem and puts the signature into the SignedCertificateTimestampDataV2. The timestamp and the sct_extensions are copied over from the TimestampedCertificateEntryDataV2. With this signature, the log operator promises to include the corresponding TimestampedCertificateEntryDataV2 in the log.
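
Sketched in Python, the relationship between the log entry and the SCT could look as follows. The dataclasses mirror the two structures; `placeholder_log_sign` is an HMAC that merely stands in for the log's real signature scheme, and the log ID and TransItem bytes are made-up placeholders.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class TimestampedCertificateEntryDataV2:
    timestamp: int
    issuer_key_hash: bytes
    tbs_certificate: bytes
    sct_extensions: bytes = b""


@dataclass
class SignedCertificateTimestampDataV2:
    log_id: bytes
    timestamp: int
    sct_extensions: bytes
    signature: bytes


def create_sct(entry, trans_item_bytes, log_id, sign):
    """Build the SCT for an entry whose serialized TransItem is given."""
    return SignedCertificateTimestampDataV2(
        log_id=log_id,
        timestamp=entry.timestamp,            # copied from the entry
        sct_extensions=entry.sct_extensions,  # copied from the entry
        signature=sign(trans_item_bytes),     # the log's promise
    )


LOG_KEY = b"log-secret-key"  # hypothetical key material


def placeholder_log_sign(data):
    # Stand-in only: a real log uses a public-key signature here.
    return hmac.new(LOG_KEY, data, hashlib.sha256).digest()


entry = TimestampedCertificateEntryDataV2(
    timestamp=1_690_000_000_000,
    issuer_key_hash=hashlib.sha256(b"<issuer public key>").digest(),
    tbs_certificate=b"<TBSCertificate bytes>",
)
trans_item_bytes = b"<serialized TransItem of type x509_entry_v2>"
sct = create_sct(entry, trans_item_bytes, b"example-log", placeholder_log_sign)
print(sct.timestamp, sct.signature.hex())
```

The SCT class has no certificate field at all, which is why an SCT can only be checked by someone who also holds the certificate and can rebuild the signed TransItem.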

Note that the SCT does not contain the certificate. Thus the SCT is only useful in combination with a certificate. Also, the SCT is not included in the log, only the certificate.

The log operator then returns the SCT to the CA.

Step 4: X.509 Certificate again

Chrome and Safari require CT and will only accept TLS connections with certificates that are included in two or more CT logs. Firefox does not enforce CT (at the time of writing in July 2023).

The browsers enforce this by verifying at least two SCTs together with the certificate. There are three ways for the webserver to deliver an SCT to the browser: embedded directly in the certificate, in a stapled OCSP response, or in a TLS extension (see Section 6 of RFC 9162).
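
A heavily simplified version of that browser check could be sketched like this. Real browser policies have more conditions than a plain count; the dictionary layout of an SCT, the `verify_sct` callback, and the set of trusted logs are all assumptions made for this example.

```python
def has_enough_scts(certificate, scts, verify_sct, required_logs=2):
    """True if valid SCTs from at least `required_logs` distinct logs exist."""
    valid_logs = {sct["log_id"] for sct in scts if verify_sct(sct, certificate)}
    return len(valid_logs) >= required_logs


TRUSTED_LOGS = {"log-a", "log-b"}  # hypothetical list shipped with the browser


def stub_verify(sct, certificate):
    # Stand-in for rebuilding the TransItem from the certificate and
    # checking the log's signature over it.
    return sct["log_id"] in TRUSTED_LOGS and sct["signature"] == b"valid"


certificate = b"<certificate bytes>"
scts = [
    {"log_id": "log-a", "signature": b"valid"},
    {"log_id": "log-b", "signature": b"valid"},
]
print(has_enough_scts(certificate, scts, stub_verify))      # True
print(has_enough_scts(certificate, scts[:1], stub_verify))  # False
```

Counting distinct log IDs rather than SCTs means that two SCTs from the same log do not satisfy the requirement.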

If the SCT is directly embedded into the certificate, the flow is slightly different: the CA does not submit an X.509 certificate to the log but instead submits a precertificate. When the CA gets the SCT from the log operator, it includes the SCT in the extensions field of the X.509 certificate. Only then does the CA sign and issue the X.509 certificate.
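
The order of operations in the embedded case can be sketched as below. Everything here is a placeholder: certificates are plain dictionaries, the precertificate format is glossed over, `repr(...)` stands in for DER, and both the CA signature and the log's SCT are keyed hashes rather than real signatures.

```python
import hashlib


def encode(obj):
    return repr(obj).encode()  # stand-in for DER


def placeholder_ca_sign(data):
    # NOT a real signature; marks where the CA's secret key is used.
    return hashlib.sha256(b"ca-secret-key" + data).hexdigest()


def make_placeholder_log(name):
    """Return a submit function that hands back a fake SCT."""
    def submit(precertificate):
        promise = hashlib.sha256(name.encode() + encode(precertificate["tbs"]))
        return {"log_id": name, "signature": promise.hexdigest()}
    return submit


def issue_with_embedded_scts(tbs, logs, ca_sign):
    # 1. Submit a precertificate instead of the final certificate.
    precertificate = {"tbs": dict(tbs), "signature": ca_sign(encode(tbs))}
    # 2. Collect one SCT per log.
    scts = [submit(precertificate) for submit in logs]
    # 3. Put the SCTs into the extensions of the final TBSCertificate.
    final_tbs = dict(tbs)
    final_tbs["extensions"] = dict(tbs.get("extensions", {}), sct_list=scts)
    # 4. Only now sign and issue the certificate.
    return {"tbs": final_tbs, "signature": ca_sign(encode(final_tbs))}


tbs = {"subject": "thore.io", "issuer": "Example CA", "extensions": {}}
logs = [make_placeholder_log("log-a"), make_placeholder_log("log-b")]
cert = issue_with_embedded_scts(tbs, logs, placeholder_ca_sign)
print([sct["log_id"] for sct in cert["tbs"]["extensions"]["sct_list"]])
```

Since the SCTs are part of the signed TBSCertificate, the final signature can only be produced after the logs have answered.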

Finally, the CA returns the X.509 certificate and the SCT to the subject.

Conclusion

This post gave a brief overview of the process of creating certificates and the data structures that the certificate data flows through.

We started with a Certificate Request, then issued an X.509 certificate, and then also obtained an SCT.

If you want to dive deeper, read the linked RFCs.

If you want to inspect PEM/DER-encoded ASN.1 objects (e.g. a CSR), you can use openssl asn1parse. Alternatively, der2ascii (on GitHub) provides a nice hierarchical view.
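
To get a feeling for what such tools show, here is a tiny DER walker in Python. It is a toy and not how openssl asn1parse or der2ascii work internally: it only handles single-byte tags and definite lengths, does no bounds checking, and knows just a handful of tag names. The sample bytes are hand-built, not a real CSR.

```python
import base64

TAG_NAMES = {
    0x02: "INTEGER", 0x03: "BIT STRING", 0x04: "OCTET STRING",
    0x05: "NULL", 0x06: "OBJECT IDENTIFIER", 0x0C: "UTF8String",
    0x13: "PrintableString", 0x17: "UTCTime",
    0x30: "SEQUENCE", 0x31: "SET",
}


def pem_to_der(pem):
    """Strip the BEGIN/END lines and base64-decode the body."""
    body = [line for line in pem.splitlines()
            if line and not line.startswith("-----")]
    return base64.b64decode("".join(body))


def read_tlv(data, offset):
    """Read one tag-length-value triple; return (tag, value, next_offset)."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    return tag, data[offset:offset + length], offset + length


def dump(data, depth=0):
    """Return an indented, line-per-element view of DER bytes."""
    lines = []
    offset = 0
    while offset < len(data):
        tag, value, offset = read_tlv(data, offset)
        name = TAG_NAMES.get(tag, f"[tag 0x{tag:02x}]")
        indent = "  " * depth
        if tag & 0x20:  # constructed: the value is itself a list of TLVs
            lines.append(f"{indent}{name} ({len(value)} bytes)")
            lines.extend(dump(value, depth + 1))
        else:
            lines.append(f"{indent}{name} ({len(value)} bytes): {value.hex()}")
    return lines


# SEQUENCE { INTEGER 0, BIT STRING (empty) }
sample = bytes.fromhex("3006020100030100")
print("\n".join(dump(sample)))
```

For a PEM file, passing `pem_to_der(open(path).read())` to `dump` should give a similar nested view, with context-specific tags such as `[0]` shown by their raw tag byte.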

¹ The TBSCertificate.signature field is redundant. Also, yes, it should really be called TBSCertificate.signatureAlgorithm…