From 9a8f8bd8da784622fc340c8466435551a2d2d268 Mon Sep 17 00:00:00 2001
From: Magnus Ahltorp <map@kth.se>
Date: Fri, 18 Mar 2016 16:04:40 +0100
Subject: Added description of current merge implementation

---
 doc/merge.txt | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

(limited to 'doc/merge.txt')
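
A minimal Python sketch of the merge_backup loop described by the text this patch adds, covering the "get missing entries", "send entries" and "verifyroot" stages. It is an illustration under stated assumptions, not the project's implementation: FakeBackupServer, merge_backup, leaf_hash, toy_root and scan_start are hypothetical names, the server is an in-memory stand-in, and the root is a flat SHA-256 over the log hashes rather than a real Merkle tree head. Log chunking and the probing for the end of the log are left out.

```python
import hashlib
from typing import Dict, List, Tuple

SEND_CHUNK = 100        # entries per "send entries" request (from the text)
MISSING_LIMIT = 100000  # hashes returned per "get missing entries" request


def leaf_hash(entry: bytes) -> str:
    """Hash of a single entry (assumption: plain SHA-256)."""
    return hashlib.sha256(entry).hexdigest()


def toy_root(hashes: List[str]) -> str:
    """Stand-in root: flat SHA-256 over all hashes, NOT a Merkle tree hash."""
    return hashlib.sha256("".join(hashes).encode()).hexdigest()


class FakeBackupServer:
    """Hypothetical in-memory backup node; not the real server API."""

    def __init__(self, missing_limit: int = MISSING_LIMIT) -> None:
        self.verifiedsize = 0
        self.log: List[str] = []
        self.entries: Dict[str, bytes] = {}
        self.missing_limit = missing_limit
        self.scan_start = 0  # cached index of first unchecked or missing hash
        self.send_calls = 0  # counts "send entries" requests

    def send_log(self, start: int, hashes: List[str]) -> None:
        self.log[start:] = hashes
        self.scan_start = min(self.scan_start, start)

    def get_missing(self) -> List[str]:
        missing: List[str] = []
        first_missing = None
        for i in range(self.scan_start, len(self.log)):
            if self.log[i] in self.entries:
                continue
            if first_missing is None:
                first_missing = i
            missing.append(self.log[i])
            if len(missing) >= self.missing_limit:
                break
        # Next scan starts at the first missing hash, or at the end of the
        # log when everything scanned so far was present.
        self.scan_start = len(self.log) if first_missing is None else first_missing
        return missing

    def send_entries(self, chunk: List[Tuple[str, bytes]]) -> None:
        self.send_calls += 1
        for claimed_hash, entry in chunk:
            self.entries[claimed_hash] = entry

    def verifyroot(self) -> str:
        # Verify every entry from verifiedsize onwards, then return the root.
        for h in self.log[self.verifiedsize:]:
            if leaf_hash(self.entries[h]) != h:
                raise ValueError("entry does not match hash " + h)
        return toy_root(self.log)


def merge_backup(server: FakeBackupServer, log: List[str],
                 store: Dict[str, bytes]) -> bool:
    start = server.verifiedsize
    server.send_log(start, log[start:])
    while True:
        missing = server.get_missing()
        if not missing:
            break  # nothing left to fetch; move on to verification
        for i in range(0, len(missing), SEND_CHUNK):
            chunk = missing[i:i + SEND_CHUNK]
            server.send_entries([(h, store[h]) for h in chunk])
    if server.verifyroot() != toy_root(log):
        return False
    server.verifiedsize = len(log)  # set only after the root matched
    return True
```

The loop returns to "get missing entries" after each batch, which is the behaviour the text attributes to merge_backup and not to merge_dist. Setting verifiedsize only after the root comparison mirrors the last step of the description.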

diff --git a/doc/merge.txt b/doc/merge.txt
index 28757a7..b2e2738 100644
--- a/doc/merge.txt
+++ b/doc/merge.txt
@@ -20,6 +20,66 @@ The merge process
 
 - merge-dist distributes 'sth' and missing entries to frontend nodes.
 
+Merge distribution (merge_dist)
+-----------------------------------------------------
+
+ * get current position from frontend server (curpos)
+
+ * send log
+   * sends log in chunks of 1000 hashes from curpos
+
+ * get missing entries
+   * server goes through all hashes from curpos and checks if they are
+     present
+   * when the server has collected 100000 non-present entries, it
+     returns them
+   * server also keeps a separate (in-memory) counter that caches the
+     index of the first entry that either has not yet been checked
+     for presence or was checked and found to be non-present, so
+     that the next scan can start from that position
+
+ * send entries
+   * send these entries one at a time
+   * does not get more missing entries when it is done
+
+ * send sth
+   * sends the sth previously constructed by merge-sth to the server,
+     which verifies all entries and adds the entry-to-hash and
+     hash-to-index mappings
+   * saves the last verified position continuously to avoid doing the
+     work again if the verification is aborted and restarted
+
+Merge backup (merge_backup)
+-----------------------------------------------------
+
+ * get verifiedsize from backup server
+
+ * send log
+   * determines the end of the log by trying to send small chunks of
+     the log hashes from verifiedsize until it fails, then restarts
+     with the normal chunk size (1000)
+
+ * get missing entries
+   * this stage is the same as for merge_dist
+
+ * send entries
+   * sends these entries in chunks of 100 (the chunk size is limited
+     by memory considerations and web server limits)
+   * when it is done, goes back to the "get missing entries" stage,
+     until there are no more missing entries
+
+ * verifyroot
+   * server verifies all entries from verifiedsize, and then
+     calculates and returns root hash
+   * unlike merge distribution, the server does not save the last
+     verified position, either continuously or when it finishes, so
+     it has to verify all entries again if the verification is
+     aborted and restarted before verifiedsize is set to the new value
+
+ * if merge_backup sees that the root hash is correct, it sets
+   verifiedsize on backup server
+
+
 TODO
 ====
 
-- 
cgit v1.1