# plop database

The plop database is a distributed, single-master, append-only
database suitable for transparency systems like Certificate
Transparency.

Data entries are stored together with three attributes:

- index

  integer; the first entry has index 0, the next one 1 and so on

- entry hash
  
  the hash over the entry, used for duplicate detection

- leaf hash

  hash over specific parts of the entry, usually together with a
  timestamp, for use in a Merkle tree
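
A minimal sketch of how the three attributes relate. The document does not name a hash algorithm or a leaf encoding, so SHA-256 and the 8-byte timestamp prefix are assumptions, and `StoredEntry` and `make_entry` are hypothetical names, not part of plop:

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredEntry:
    index: int         # position in the database, starting at 0
    entry_hash: bytes  # hash over the whole entry, for duplicate detection
    leaf_hash: bytes   # hash over selected parts plus a timestamp
    data: bytes        # the entry itself


def make_entry(index, data, leaf_part, timestamp_ms):
    # Assumption: SHA-256 for both hashes; the document names no algorithm.
    entry_hash = hashlib.sha256(data).digest()
    # Assumption: the leaf input is an 8-byte big-endian timestamp followed
    # by the selected part of the entry; the real encoding is up to the
    # application using the database.
    leaf_input = struct.pack(">Q", timestamp_ms) + leaf_part
    leaf_hash = hashlib.sha256(leaf_input).digest()
    return StoredEntry(index, entry_hash, leaf_hash, data)
```

Under these assumptions, submitting the same data twice gives the same entry hash (so the duplicate can be detected) even though the leaf hashes differ because of the timestamp.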

## Storage in a file system

Two files (catlfish names in parentheses):

- treesize (treesize)

  filename is static, file contains one line -- the number of entries
  in the database up to and including the last entry in the last
  published tree, acting as a "current pointer"

- index (index)

  filename is static, file contains one line per entry -- the leafhash
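
If every line of the index file has the same length, an entry's leafhash can be found with a single seek. The sketch below assumes hex-encoded SHA-256 leafhashes (64 characters plus a newline) and files named `treesize` and `index` directly under a database directory; `read_treesize` and `leafhash_at` are hypothetical helpers, not the plop API:

```python
import os

HASH_HEX_LEN = 64            # assumption: SHA-256 leafhashes, hex-encoded
LINE_LEN = HASH_HEX_LEN + 1  # plus the trailing newline


def read_treesize(db_dir):
    # The treesize file holds a single line with the number of entries.
    with open(os.path.join(db_dir, "treesize")) as f:
        return int(f.readline().strip())


def leafhash_at(db_dir, index):
    # Fixed-size lines let us jump straight to the requested entry.
    if index < 0:
        raise IndexError("index must be non-negative")
    with open(os.path.join(db_dir, "index"), "rb") as f:
        f.seek(index * LINE_LEN)
        line = f.read(LINE_LEN)
    if len(line) != LINE_LEN or line[-1:] != b"\n":
        raise IndexError(f"no complete entry at index {index}")
    return line[:-1].decode("ascii")
```

A reader that only wants published entries would compare `index` against `read_treesize(db_dir)` before calling `leafhash_at`.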

Three directories, "bucketed" in three levels, one file per database
entry (catlfish names in parentheses):

- entry (certentries)

  filename=leafhash, content=the actual data of the entry

- entryhash (entryhash)

  filename=entryhash, content=leafhash

- indexforhash (certindex)

  filename=leafhash, content=index
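
The bucketing scheme itself is not specified here. One plausible sketch, assuming each of the three levels is named after the next two hex characters of the filename (`bucketed_path` and `lookup_paths` are hypothetical helpers):

```python
import os


def bucketed_path(base_dir, name_hex, levels=3, width=2):
    # Assumption: level i is named after hex characters [i*width, (i+1)*width).
    if len(name_hex) < levels * width:
        raise ValueError("name too short to bucket")
    parts = [name_hex[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base_dir, *parts, name_hex)


def lookup_paths(db_dir, entryhash_hex, leafhash_hex):
    # Which hash names the file in each of the three directories.
    return {
        "entry": bucketed_path(os.path.join(db_dir, "entry"), leafhash_hex),
        "entryhash": bucketed_path(os.path.join(db_dir, "entryhash"), entryhash_hex),
        "indexforhash": bucketed_path(os.path.join(db_dir, "indexforhash"), leafhash_hex),
    }
```

With this layout, looking up an entry by entry hash takes two reads: the `entryhash` file yields the leafhash, which in turn names the files under `entry` and `indexforhash`.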

## Distributed

TODO: describe distribution

## Erlang code in src/

- db.erl

  public interface for adding entries and getting entries by index,
  leaf hash and entry hash

- index.erl

  file-based storage for ordered append-only lists of fixed-sized
  entries, retrievable by index

- perm.erl

  reading and writing of files

- atomic.erl

  atomic file operations

- util.erl

  helper functions for lower level file handling

- fsyncport.erl

  interface to C implementation for fsync(2) syscall
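
For illustration only, a common way to combine atomic file operations with fsync(2) is to write a temporary file, sync it, and rename it into place. This is a generic POSIX pattern sketched in Python, not a description of what atomic.erl or fsyncport.erl actually do; `atomic_write` and the `.new` suffix are hypothetical:

```python
import os


def atomic_write(path, data):
    directory = os.path.dirname(path) or "."
    tmp_path = path + ".new"  # hypothetical temp-file naming
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())    # push file contents to disk
    os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    # Sync the directory so the rename itself is durable (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Readers then see either the old file or the complete new one, never a partially written file.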

## C code in c_src/

- net_read_write.c

  read and write to/from a file descriptor, using fsync(2) to increase
  the probability that data lands on disk

- fsynchelper.c

  Erlang port for net_read_write

- erlport.c

  glue