From 4a86c60c415cc9b0790bb6bf1a8dcac6f802cae8 Mon Sep 17 00:00:00 2001
From: Magnus Ahltorp
Date: Thu, 11 Sep 2014 17:04:23 +0200
Subject: New architecture

---
 doc/design.txt | 235 ++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 157 insertions(+), 78 deletions(-)

diff --git a/doc/design.txt b/doc/design.txt
index 29ca0a4..0a57e70 100644
--- a/doc/design.txt
+++ b/doc/design.txt
@@ -1,89 +1,168 @@
-catlfish design (in Emacs -*- org -*- mode)
+-*- markdown -*-
+
+Overview
+========
 
 This document describes the design of catlfish, an implementation of
 a Certificate Transparency (RFC6962) log server.
 
-We have
-- a primary database storing x509 certificate chains [replicating r/o
-  copies to a number of frontend nodes?]
-- a hash tree kept in RAM
-- one secondary database per frontend node, storing the most recently
-  submitted data
-- a cluster of backend nodes with an elected leader which periodically
-  updates the primary db with data from the secondary db's
-- a number of frontend nodes accepting http requests, updating
-  secondary db's and reading from local r/o copy of the primary db
-- a private key used for signing SCT's and STH's, kept (in HSM:s) on
-  backend nodes
-
-Backend nodes
-- are either asleep, functioning as storage only
-or
-- store submitted cert chains in persistent media
-- have write access to the primary database holding cert chains
-- periodically append new cert chains to the hash tree and sign the
-  tree head
-
-Frontend nodes
-- reply to the http requests specified in RFC 6962
-- write submitted cert chains to their own, single master, secondary
-  database
-- have read access to (a local copy of) the primary database
-- defer signing of SCT's (and STH's) to backend nodes
-
-The primary database
-- stores cert chains and their corresponding SCT's
-- is indexed on a running integer (primary) and a hash of the cert
-  chain (secondary)
-- runs on backend nodes
-- is persistently stored on disk on several other backend nodes in
-  separate data centers
-- grows with 5 GB per year, based on 5,000 3 kB submissions per day
-- max size is 300 GB, based on 100e6 certificates
-
-The secondary databases
-- store cert chains, unordered, between hash tree generation
-- run on frontend nodes
-- are persistently stored on disk on several other frontend nodes
-- are typically kept in RAM too
-- max size is around 128 MB, based on 10 submissions (á 3 kB) per
-  second for an hour
-
-Scaling, performance, estimates
-- submissions: less than 0.1 qps, based on 5,000 submissions per day
-- monitors: 6 qps, based on 100 monitors
-- auditors: 8,000 qps, based on 2.5e9 browsers visiting 100 sites
+
+
+  +------------------------------------------------+
+  | front end nodes                                |
+  +------------------------------------------------+
+    ^            |                  |
+    |            v                  v
+    |    +---------------+   +---------------+
+    |    | storage nodes |   | signing nodes |
+    |    +---------------+   +---------------+
+    |            ^                  ^
+    |            |                  |
+  +------------------------------------------------+
+  | merge master                                   |
+  +------------------------------------------------+
+    ^                     |
+    |                     v
+    |    +----------------------------------+
+    |    | merge slaves                     |
+    |    +----------------------------------+
+    |                     ^
+    |                     |
+  +-------------------+
+  | merge-repair node |
+  +-------------------+
+
+
+Design assumptions
+------------------
+* The database grows by 5 GB per year, based on 5,000 3 kB
+  submissions per day.
+* Max size is 300 GB, based on 100e6 certificates.
+* Submissions: less than 0.1 qps, based on 5,000 submissions per day.
+* Monitors: 6 qps, based on 100 monitors.
+* Auditors: 8,000 qps, based on 2.5e9 browsers visiting 100 sites
   (with a 1y certificate) per month (assuming a single combined
   request for doing get-sth + get-sth-consistency + get-proof-by-hash)
 
+
 Open questions
+--------------
-- What's a good MMD? Google seems to sign a new tree after 60-90
+* What's a good MMD? Google seems to sign a new tree after 60-90
   minutes (early 2014). They don't promise an MMD but aim to sign at
   least once a day.
-A picture
-
-+-----------------------------------------------+
-| front end nodes                               |
-+-----------------------------------------------+
-  ^  ^           ^                            ^
-  |  |           |                            |
-  |  v           |                            |
-  |  short term  long term                    |
-  |  cert db     cert db copy                 |
-  |              ^                            |
-  |              |                            v
-+-----------------------------------------------+
-| tree makers | mergers | signers              |
-+-----------------------------------------------+
-     ^                      ^
-      \                     |
-       \                    v
-        -------------   long term
-                         cert db
-
-[TODO: Update terms in text or picture so they match:
-secondary database == short term cert db
-primary database == long term cert db
-backend nodes == box with tree makers, mergers and signers]
-[TODO: Move the picture to the top of the document.]
+
+Terminology
+===========
+
+CC = Certificate Chain
+CT log = Certificate Transparency log
+
+Front-end node
+==============
+
+* Handles all http requests.
+* Has a complete copy of the published data locally.
+* Read requests are answered directly by reading local files
+  and calculating the answers.
+* Add requests are validated and then sent to all storage
+  nodes. At the same time, a signing request is sent to one or
+  more of the signing nodes. When responses have been received
+  from a predetermined number of storage nodes and one signing
+  response has been received, a response is sent to the client
+  (see the sketch after this list).
+* Has an inward-facing API with the entry points SendLog(Hashes),
+  MissingEntries() (returns a list of hashes), SendEntry(Entry),
+  SendSTH(STH), CurrentPosition().
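+
+A minimal sketch of this add-request fan-out follows. It is an
+illustration only, not the catlfish implementation: the client
+objects, their method names, and the quorum size of two are
+assumptions made here for concreteness.
+
+```python
+# Sketch: fan a validated entry out to storage and signing nodes in
+# parallel, and answer the client once a storage quorum and one
+# signing response have arrived.
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+STORAGE_QUORUM = 2  # the "predetermined number" of storage responses
+
+def handle_add_request(entry, storage_nodes, signing_node):
+    with ThreadPoolExecutor(max_workers=len(storage_nodes) + 1) as pool:
+        # SendEntry(Entry) to every storage node ...
+        stores = [pool.submit(n.send_entry, entry) for n in storage_nodes]
+        # ... and one signing request at the same time.
+        sct = pool.submit(signing_node.sign_entry, entry)
+        acks = 0
+        for done in as_completed(stores):
+            acks += 1 if done.result() else 0
+            if acks >= STORAGE_QUORUM:
+                break  # enough storage nodes have persisted the entry
+        return sct.result()  # the SCT sent back to the client
+
+class StubStorage:  # stand-ins so the sketch runs as-is
+    def send_entry(self, entry):
+        return True
+
+class StubSigner:
+    def sign_entry(self, entry):
+        return b"sct-for-" + entry
+
+print(handle_add_request(b"chain", [StubStorage()] * 3, StubSigner()))
+```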
+
+
+Storage node
+============
+
+* Stores certificate chains and SCTs.
+* Has a write API SendEntry(Entry) that stores the certificate chain
+  in a database, indexed by its hash, and then appends the hash to a
+  list NewEntries.
+* Takes reasonable measures to ensure that data is in permanent
+  storage before sending a response.
+* When seeing a new STH, moves the variable start to the index of the
+  first unpublished hash.
+* Has a read API FetchNewEntries() which returns
+  NewEntries[start...length(NewEntries)-1].
+
+
+Signing node
+============
+
+* Has the signing key for the log.
+* Signs SCTs for the front-end nodes and tree heads for the merge
+  master on request.
+
+
+Merging node
+============
+
+* The master is determined by configuration.
+* The other merging nodes are called "slaves".
+* The master has two phases, merging and distributing.
+
+Merging (master)
+----------------
+
+* Fetches CCs by calling FetchNewEntries() on storage node i,
+  where i = 0...(n-1).
+* Determines the order of the new entries in the CT log.
+* Sends the entries to the slaves.
+* Calculates the tree head and asks a signing node to sign it.
+* When a majority of the slaves have acknowledged the entries,
+  compares the calculated tree head to the tree heads of the slaves.
+  If they match, considers the additions to the CT log final and
+  begins the distributing phase.
+
+Merging (slave)
+---------------
+
+* Receives entries from the master. The node must be certain
+  that the request comes from the current master, and not
+  an old one.
+* Takes reasonable measures to ensure that data is in
+  permanent storage.
+* Calculates the new tree head and returns it to the master.
+
+Distributing
+------------
+
+* Performs the following steps for all front-end nodes (see the
+  sketch after this list):
+  * Fetches curpos by calling CurrentPosition().
+  * Calls SendLog() with the hashes of CCs from curpos to newpos,
+    the new size of the log.
+  * Fetches missing_entries, a list of hashes for the CCs that the
+    front-end node does not have, by calling MissingEntries().
+  * For each hash in missing_entries, uploads the CC by calling
+    SendEntry(CC).
+  * Sends the STH with the SendSTH(STH) call.
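+
+A sketch of this loop, assuming that every front-end node is wrapped
+in a client object exposing the five inward-facing entry points named
+earlier, and that log_hashes and entries_by_hash are local merge
+master state (both names are invented for the illustration):
+
+```python
+# Sketch: bring each front-end node up to newpos, then publish the STH.
+def distribute(frontends, log_hashes, entries_by_hash, sth):
+    newpos = len(log_hashes)  # position after the last merged entry
+    for frontend in frontends:
+        curpos = frontend.current_position()          # CurrentPosition()
+        frontend.send_log(log_hashes[curpos:newpos])  # SendLog(Hashes)
+        # Upload only the certificate chains this node lacks.
+        for missing in frontend.missing_entries():    # MissingEntries()
+            frontend.send_entry(entries_by_hash[missing])  # SendEntry(CC)
+        frontend.send_sth(sth)                        # SendSTH(STH)
+```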
+
+
+Merge-repair node
+=================
+
+* There is only one of these nodes.
+* When this node detects that an STH has not been published
+  in t seconds, it begins the automatic repair process.
+
+Automatic repair process
+------------------------
+
+* Turn off all reachable merge nodes.
+* If a majority of the merge nodes cannot be reached,
+  die and report.
+* Fetch the CT log order from the merge nodes.
+* Determine the latest version of the log.
+* Select a new master.
+* Change the configuration of the merge nodes so that
+  they know who the new master is.
+* Start all merge nodes.
+* If any of these steps fails, die and report.
+* If all steps succeed, die and report anyway. The automatic
+  repair process must not be restarted without manual
+  intervention. (The whole sequence is sketched below.)
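+
+A sketch of the repair sequence; the merge-node client objects and
+their methods are invented for the illustration, "latest version of
+the log" is assumed to mean the longest fetched log order, and "die
+and report" is modelled as reporting followed by SystemExit:
+
+```python
+# Sketch: stop the merge nodes, promote the node with the freshest
+# log, reconfigure, restart, and terminate in every case.
+def automatic_repair(merge_nodes, report):
+    try:
+        # Turn off all reachable merge nodes; stop() is True on success.
+        stopped = [node for node in merge_nodes if node.stop()]
+        if len(stopped) <= len(merge_nodes) // 2:
+            raise RuntimeError("majority of merge nodes unreachable")
+        # Take the longest fetched log order as the latest version.
+        orders = {node: node.fetch_log_order() for node in stopped}
+        new_master = max(orders, key=lambda node: len(orders[node]))
+        for node in stopped:
+            node.set_master(new_master)  # rewrite the configuration
+        for node in stopped:
+            node.start()
+    except Exception as failure:  # any failed step: die and report
+        report("automatic repair failed: %s" % failure)
+        raise SystemExit(1)
+    report("repair done; restarting requires manual intervention")
+    raise SystemExit(0)  # die and report anyway
+```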