How Certificates are Born (Data Structure Edition)
Authentic key distribution is a fundamental problem of public key cryptography: how do you know that a public key really belongs to an entity (the “subject”)? A MITM attacker could have intercepted and changed the public key! In other words, we need an authentic public-key-to-identity binding.
One solution is to meet up in person and verify their public key directly. This is what Signal and WhatsApp do with their Safety Numbers.
But this doesn’t scale. How would you meet up with Google to verify their website’s public key? This is where certificates come in.
A Certificate Authority (CA) verifies the identity (“thore.io”) and that this identity controls a keypair (pk, sk). The CA then signs a certificate that basically states “I, the CA, certify that the secret key belonging to the public key pk is controlled by the entity thore.io”. Now anybody who trusts that the CA is honest and that it correctly verifies identities before issuing certificates can also trust that this public-key-to-identity binding is correct.
But what is the flow of creating certificates? What are the data structures?
Certificates are often seen as this big complex beast. And correctly so! The many extensions and options mean that it gets complicated fast. But the core data structures are not that complicated.
This post tries to break down these data structures. Using the data structures to guide us, we step through the process of creating a certificate.
Step 1: Certificate Request
It starts with what is colloquially known as a “Certificate Signing Request” (CSR). It usually uses the following PKCS#10 data structures (defined in RFC 2986).
PKCS#10 looks like this:
CertificationRequestInfo ::= SEQUENCE {
    version       INTEGER { v1(0) } (v1,...),
    subject       Name,
    subjectPKInfo SubjectPublicKeyInfo{{ PKInfoAlgorithms }},
    attributes    [0] Attributes{{ CRIAttributes }}
}

CertificationRequest ::= SEQUENCE {
    certificationRequestInfo CertificationRequestInfo,
    signatureAlgorithm       AlgorithmIdentifier{{ SignatureAlgorithms }},
    signature                BIT STRING
}
If subject S has a keypair and wants to obtain a certificate, it first creates a CertificationRequestInfo and fills out the subject name (“thore.io”) and the details of its public key. Then S signs the CertificationRequestInfo and puts it, together with the signature, into a CertificationRequest.
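To make the nesting concrete, here is a minimal Python sketch of this step. It is not a real PKCS#10 implementation: the class and function names are my own, `encode()` is a placeholder for DER encoding, and `toy_sign` is a keyed hash standing in for a real signature scheme so that the example runs with the standard library only.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CertificationRequestInfo:
    version: int             # v1 is encoded as 0
    subject: str             # simplified: a real Name is a structured sequence
    subject_pk_info: bytes   # simplified: stands in for SubjectPublicKeyInfo
    attributes: tuple = ()

    def encode(self) -> bytes:
        # Placeholder serialization; a real CSR uses DER-encoded ASN.1 here.
        fields = (self.version, self.subject, self.subject_pk_info.hex(), self.attributes)
        return repr(fields).encode()


@dataclass(frozen=True)
class CertificationRequest:
    certification_request_info: CertificationRequestInfo
    signature_algorithm: str
    signature: bytes


def create_csr(subject: str, public_key: bytes,
               sign: Callable[[bytes], bytes], algorithm: str) -> CertificationRequest:
    info = CertificationRequestInfo(version=0, subject=subject, subject_pk_info=public_key)
    # The signature covers the inner structure only, not the outer wrapper.
    return CertificationRequest(info, algorithm, sign(info.encode()))


def toy_sign(message: bytes) -> bytes:
    # NOT a real signature: a keyed hash used only to keep the sketch runnable.
    return hashlib.sha256(b"toy-secret-key" + message).digest()


csr = create_csr("thore.io", b"\x04" + bytes(64), toy_sign, "toy-sha256")
print(csr.certification_request_info.subject, csr.signature.hex()[:16])
```

The point to take away is the shape: the inner structure is what gets signed, and the outer structure just bundles it with the algorithm identifier and the signature.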
Step 2: X.509 Certificate
S then sends the CertificationRequest to a CA of their choice.
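Now the CA needs to verify that the subject S really controls the identity it wants bound to its public key. (The signature on the CertificationRequest already shows that S holds the matching secret key.) For domain names, the CA can e.g. do this using ACME (defined in RFC 8555).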
When the CA is satisfied it issues the certificate. Certificates generally use the X.509 format (defined in RFC 5280).
X.509 certificates look like this:
TBSCertificate ::= SEQUENCE {
    version              [0] EXPLICIT Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID       [1] IMPLICIT UniqueIdentifier OPTIONAL,
    subjectUniqueID      [2] IMPLICIT UniqueIdentifier OPTIONAL,
    extensions           [3] EXPLICIT Extensions OPTIONAL
}

Certificate ::= SEQUENCE {
    tbsCertificate     TBSCertificate,
    signatureAlgorithm AlgorithmIdentifier,
    signatureValue     BIT STRING
}
The CA first fills out the TBSCertificate (“to-be-signed” certificate). It copies over the values from the CertificationRequest, assigns a new serial number, chooses a validity period, and adds any extensions that it wants.[1] The CA then signs the TBSCertificate and puts everything together into the Certificate.
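The CA side can be sketched in the same simplified style. Again this is an illustration under assumptions, not how any particular CA works: the names are hypothetical, `encode()` stands in for DER, `toy_ca_sign` is not a real signature, and the 90-day lifetime is just a value picked for the example.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class TBSCertificate:
    serial_number: int
    signature: str                   # algorithm identifier (see the footnote)
    issuer: str
    not_before: datetime
    not_after: datetime
    subject: str                     # copied from the request
    subject_public_key_info: bytes   # copied from the request
    extensions: tuple = ()

    def encode(self) -> bytes:
        # Placeholder serialization; a real certificate uses DER here.
        fields = (self.serial_number, self.signature, self.issuer,
                  self.not_before.isoformat(), self.not_after.isoformat(),
                  self.subject, self.subject_public_key_info.hex(), self.extensions)
        return repr(fields).encode()


@dataclass(frozen=True)
class Certificate:
    tbs_certificate: TBSCertificate
    signature_algorithm: str
    signature_value: bytes


def issue_certificate(csr_subject: str, csr_public_key: bytes, issuer: str,
                      ca_sign: Callable[[bytes], bytes], algorithm: str,
                      lifetime: timedelta = timedelta(days=90),
                      now: Optional[datetime] = None) -> Certificate:
    now = now or datetime.now(timezone.utc)
    tbs = TBSCertificate(
        serial_number=secrets.randbits(63) + 1,   # fresh, positive serial number
        signature=algorithm,
        issuer=issuer,
        not_before=now,
        not_after=now + lifetime,
        subject=csr_subject,
        subject_public_key_info=csr_public_key,
    )
    # As with the request, only the inner to-be-signed structure is signed.
    return Certificate(tbs, algorithm, ca_sign(tbs.encode()))


def toy_ca_sign(message: bytes) -> bytes:
    # NOT a real signature: a keyed hash used only to keep the sketch runnable.
    return hashlib.sha256(b"toy-ca-key" + message).digest()


cert = issue_certificate("thore.io", b"\x04" + bytes(64), "Example CA",
                         toy_ca_sign, "toy-sha256")
print(cert.tbs_certificate.serial_number, cert.tbs_certificate.not_after)
```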
We now have an X.509 certificate!
Step 3: Signed Certificate Timestamp
Back in the day, we would now be done. The CA returns the X.509 certificate to the requesting subject S, who can use it to prove their identity (e.g. in a TLS handshake).
However, after a series of CAs misbehaving and wrongly issuing certificates, Google introduced Certificate Transparency (CT). With CT, the CAs need to insert every certificate that they issue into a public append-only log. For efficiency, this log is a Merkle Tree with the leaves being the log entries.
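For intuition, here is a short Python sketch of how the root hash of such a Merkle Tree can be computed. It follows the hashing rule of the CT specification as I understand it (SHA-256, a 0x00 prefix for leaves, a 0x01 prefix for interior nodes, and a split at the largest power of two smaller than the number of entries). The function name is my own, and real logs compute this incrementally rather than recursively over the full list.

```python
import hashlib


def merkle_tree_hash(entries: list[bytes]) -> bytes:
    """Root hash over an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hash: the 0x00 prefix separates leaves from interior nodes.
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    # Split at k, the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_tree_hash(entries[:k])
    right = merkle_tree_hash(entries[k:])
    # Interior node hash: the 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


log = [b"entry-0", b"entry-1", b"entry-2"]
print(merkle_tree_hash(log).hex())
```

Appending an entry changes the root hash, so a signed root commits the log to everything it has included so far.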
CT is defined in RFC 9162. The relevant data structures are:
struct {
    VersionedTransType versioned_type;
    select (versioned_type) {
        case x509_entry_v2: TimestampedCertificateEntryDataV2;
        case precert_entry_v2: TimestampedCertificateEntryDataV2;
        case x509_sct_v2: SignedCertificateTimestampDataV2;
        case precert_sct_v2: SignedCertificateTimestampDataV2;
        case signed_tree_head_v2: SignedTreeHeadDataV2;
        case consistency_proof_v2: ConsistencyProofDataV2;
        case inclusion_proof_v2: InclusionProofDataV2;
    } data;
} TransItem;

opaque TBSCertificate<1..2^24-1>;

struct {
    uint64 timestamp;
    opaque issuer_key_hash<32..2^8-1>;
    TBSCertificate tbs_certificate;
    Extension sct_extensions<0..2^16-1>;
} TimestampedCertificateEntryDataV2;

struct {
    LogID log_id;
    uint64 timestamp;
    Extension sct_extensions<0..2^16-1>;
    opaque signature<1..2^16-1>;
} SignedCertificateTimestampDataV2;
First the CA sends the X.509 certificate to a CT log operator. (Alternatively the CA can also send a “precertificate”. This signals the CA’s binding intent to later issue a certificate.)
The log operator extracts the TBSCertificate from the X.509 certificate (after verifying the signature). The log operator then builds the TimestampedCertificateEntryDataV2 and puts it into a TransItem of type x509_entry_v2. This TransItem will eventually be inserted into the log (as one of the Merkle Tree leaves).
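The structs above are written in the TLS presentation language, where a variable-length vector is encoded as a big-endian length prefix (sized to fit the maximum length) followed by the data. Here is a rough Python sketch of how a TransItem of type x509_entry_v2 could be serialized under that rule. Two things are assumptions on my part and should be checked against RFC 9162: that VersionedTransType is a two-byte enum with x509_entry_v2 = 1, and that the extensions are passed in already encoded. The helper names are hypothetical.

```python
import struct

X509_ENTRY_V2 = 1  # assumed numeric value of the enum; verify against RFC 9162


def opaque(data: bytes, length_bytes: int) -> bytes:
    """Variable-length vector: big-endian length prefix followed by the data."""
    return len(data).to_bytes(length_bytes, "big") + data


def encode_x509_entry(timestamp: int, issuer_key_hash: bytes,
                      tbs_certificate: bytes, sct_extensions: bytes = b"") -> bytes:
    if not 32 <= len(issuer_key_hash) <= 2**8 - 1:
        raise ValueError("issuer_key_hash must be 32..255 bytes")
    if not 1 <= len(tbs_certificate) <= 2**24 - 1:
        raise ValueError("tbs_certificate must be 1..2^24-1 bytes")
    entry = (
        struct.pack(">Q", timestamp)      # uint64 timestamp
        + opaque(issuer_key_hash, 1)      # <32..2^8-1>  -> 1-byte length
        + opaque(tbs_certificate, 3)      # <1..2^24-1>  -> 3-byte length
        + opaque(sct_extensions, 2)       # <0..2^16-1>  -> 2-byte length
    )
    # The TransItem is the type tag followed by the selected struct.
    return struct.pack(">H", X509_ENTRY_V2) + entry


item = encode_x509_entry(1690549200000, bytes(32), b"\x30\x03\x02\x01\x00")
print(item.hex())
```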
The log operator then creates a “Signed Certificate Timestamp” (SCT): it signs the TransItem and puts the signature into the SignedCertificateTimestampDataV2. The timestamp and the sct_extensions are copied over from the TimestampedCertificateEntryDataV2. With this signature, a log operator promises to include the corresponding TimestampedCertificateEntryDataV2 in the log.
Note that the SCT does not contain the certificate. Thus the SCT is only useful in combination with a certificate. Also, the SCT is not included in the log, only the certificate.
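A small Python sketch may help to see why the SCT is useless on its own. The verifier has to rebuild the signed TransItem from the certificate plus the fields copied into the SCT. All names here are hypothetical, the encoding is a placeholder for the real TLS-style serialization, and `toy_log_sign` / `toy_log_verify` are keyed-hash stand-ins for the log's real signature scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TimestampedEntry:            # stands in for TimestampedCertificateEntryDataV2
    timestamp: int
    issuer_key_hash: bytes
    tbs_certificate: bytes
    sct_extensions: bytes = b""

    def as_trans_item(self) -> bytes:
        # Placeholder encoding of the x509_entry_v2 TransItem.
        fields = ("x509_entry_v2", self.timestamp, self.issuer_key_hash.hex(),
                  self.tbs_certificate.hex(), self.sct_extensions.hex())
        return repr(fields).encode()


@dataclass(frozen=True)
class SignedCertificateTimestamp:  # note: no certificate inside
    log_id: bytes
    timestamp: int
    sct_extensions: bytes
    signature: bytes


def create_sct(entry: TimestampedEntry, log_id: bytes,
               log_sign: Callable[[bytes], bytes]) -> SignedCertificateTimestamp:
    # Timestamp and extensions are copied; the signature covers the TransItem.
    return SignedCertificateTimestamp(log_id, entry.timestamp, entry.sct_extensions,
                                      log_sign(entry.as_trans_item()))


def verify_sct(sct: SignedCertificateTimestamp, issuer_key_hash: bytes,
               tbs_certificate: bytes,
               log_verify: Callable[[bytes, bytes], bool]) -> bool:
    # Rebuild the signed data from the certificate and the SCT fields.
    entry = TimestampedEntry(sct.timestamp, issuer_key_hash, tbs_certificate,
                             sct.sct_extensions)
    return log_verify(entry.as_trans_item(), sct.signature)


def toy_log_sign(message: bytes) -> bytes:
    # NOT a real signature: a keyed hash used only to keep the sketch runnable.
    return hashlib.sha256(b"toy-log-key" + message).digest()


def toy_log_verify(message: bytes, signature: bytes) -> bool:
    return toy_log_sign(message) == signature


entry = TimestampedEntry(1690549200000, bytes(32), b"tbs-bytes")
sct = create_sct(entry, b"log-1", toy_log_sign)
print(verify_sct(sct, bytes(32), b"tbs-bytes", toy_log_verify))
```

Verification against a different certificate fails, because the rebuilt TransItem no longer matches what the log signed.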
The log operator then returns the SCT to the CA.
Step 4: X.509 Certificate again
Chrome and Safari require CT and will only accept TLS connections with certificates that are included in two or more CT logs. Firefox does not enforce CT (at the time of writing in July 2023).
The browsers enforce this by verifying at least two SCTs together with the certificate. There are three ways for the webserver to deliver the SCTs to the browser: embedded directly in the certificate, in an OCSP response, or in a TLS extension (see Section 6 of RFC 9162).
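Stripped down to the rule stated here, the check looks roughly like the following Python sketch. Real browser CT policies have more conditions than this, so treat it as an illustration of the "two or more logs" idea only. The `Sct` class, the function name, and the `is_valid` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sct:
    log_id: bytes
    signature: bytes


def has_enough_scts(scts: Iterable[Sct], is_valid: Callable[[Sct], bool],
                    minimum_logs: int = 2) -> bool:
    # Count distinct logs, so two SCTs from the same log do not count twice.
    logs = {sct.log_id for sct in scts if is_valid(sct)}
    return len(logs) >= minimum_logs


scts = [Sct(b"log-a", b"sig-1"), Sct(b"log-a", b"sig-2"), Sct(b"log-b", b"sig-3")]
print(has_enough_scts(scts, is_valid=lambda sct: True))
```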
If the SCT is directly embedded into the certificate, the flow is slightly different: the CA does not submit an X.509 certificate to the log but instead submits a precertificate. When the CA gets the SCT from the log operator, it includes the SCT in the extensions field of the X.509 certificate. Only then does the CA sign and issue the X.509 certificate.
Finally, the CA returns the X.509 certificate and the SCT to the subject.
Conclusion
This post gave a brief overview of the process of creating certificates and the data structures that the certificate data flows through.
We started with a Certificate Request, then issued an X.509 certificate, and then also obtained an SCT.
If you want to dive deeper read the linked RFCs.
If you want to inspect PEM/DER-encoded ASN.1 objects (e.g. a CSR) you can use openssl asn1parse. Alternatively, der2ascii (GitHub) provides a nice hierarchical view.
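If you are curious what such tools do under the hood, here is a deliberately tiny DER walker in Python. It is my own sketch and not a replacement for either tool: it only handles single-byte tags and definite lengths, knows a handful of universal tag names, and does not look inside BIT STRING or OCTET STRING values.

```python
import base64

UNIVERSAL_TAGS = {
    0x02: "INTEGER", 0x03: "BIT STRING", 0x04: "OCTET STRING", 0x05: "NULL",
    0x06: "OBJECT IDENTIFIER", 0x0C: "UTF8String", 0x13: "PrintableString",
    0x30: "SEQUENCE", 0x31: "SET",
}


def pem_to_der(pem: str) -> bytes:
    """Strip the -----BEGIN/END----- lines and base64-decode the body."""
    body = [line for line in pem.strip().splitlines() if not line.startswith("-----")]
    return base64.b64decode("".join(body))


def parse_der(data: bytes, depth: int = 0) -> list[str]:
    """Return one indented line per tag-length-value element."""
    lines = []
    offset = 0
    while offset < len(data):
        tag = data[offset]
        length = data[offset + 1]
        header = 2
        if length & 0x80:
            # Long form: the low 7 bits give the number of length bytes.
            num = length & 0x7F
            length = int.from_bytes(data[offset + 2:offset + 2 + num], "big")
            header += num
        value = data[offset + header:offset + header + length]
        name = UNIVERSAL_TAGS.get(tag, f"tag 0x{tag:02x}")
        lines.append(f"{'  ' * depth}{name} (length {length})")
        if tag & 0x20:
            # Constructed encoding: the value is itself a series of elements.
            lines.extend(parse_der(value, depth + 1))
        offset += header + length
    return lines


# SEQUENCE { INTEGER 0, OCTET STRING "A" }
sample = bytes.fromhex("3006020100040141")
print("\n".join(parse_der(sample)))
```

For a real CSR you would call `parse_der(pem_to_der(open("csr.pem").read()))`, where `csr.pem` is a hypothetical file name.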
[1] The TBSCertificate.signature field is redundant. Also yes, it should really be called TBSCertificate.signatureAlgorithm …
2023-07-28 13:00 +0000