Celeste backend
This is a short intro to the mechanics of saving mail in Celeste.
Logically, Celeste remembers two kinds of information -
- All the mail messages it ever got, unchanged.
- A list of categories and for each, which messages belong in it.
Add to those an implementation detail -
- Essential details about each message that hasn't been deleted.
These are kept, respectively, in three files (.messages, .categories and .index). The index also uses an additional log file, described further down.
The three types of info are represented in objects at runtime.
MailMessage holds a complete message, and may generate MIMEPart, MIMEDocument and MIMEValueHolder as needed. These only exist for resident messages.
Each category is a Set of the msgIDs of the messages in it.
An IndexFileEntry is held (in a big Dictionary with integer indexes as keys) for every message in the system from the moment Celeste loads. Each entry holds, for its message:
- The msgID by which it is known in the categories.
- The messageFile name, the message's location in it, and the textLength. These are crucial to getting the text.
- A cache for a number of fields parsed from the message header (time, from, to, cc, subject), which are very handy in filters, for example.
- And tocLineCache which is a speed-tweak for the GUI.
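The runtime picture above can be sketched roughly as follows. This is an illustration in Python, not Celeste's Smalltalk code: the field names follow the list above, but the types, the choice of msgID as the dictionary key, and the helpers add_to_category and subjects_in are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class IndexFileEntry:
    """One entry per non-deleted message (field types are assumed)."""
    msg_id: int                 # ID by which the categories know the message
    message_file: str           # name of the messages file
    location: int               # offset of the message text in that file
    text_length: int            # length of the message text
    time: int = 0               # cached header fields
    sender: str = ""
    to: str = ""
    cc: str = ""
    subject: str = ""
    toc_line_cache: Optional[str] = None  # GUI speed-up only


def add_to_category(categories: Dict[str, Set[int]], name: str, msg_id: int) -> None:
    """A category is just a set of msgIDs (hypothetical helper)."""
    categories.setdefault(name, set()).add(msg_id)


def subjects_in(index: Dict[int, IndexFileEntry],
                categories: Dict[str, Set[int]], name: str) -> List[str]:
    """Answer a query from cached header fields, without touching message text."""
    ids = categories.get(name, set())
    return sorted(index[m].subject for m in ids if m in index)
```

The point of the cached fields is visible in subjects_in: a filter can run over the index alone, and the messages file is read only when the full text is wanted.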
Celeste uses files in two common ways - load and overwrite, and append difference.
As an example of the first method, the categories file is loaded in full on entering Celeste and saved whenever it has likely changed. The good part - it always holds a consistent snapshot of a specific moment. The bad part - each load/save is pretty big and takes significant time. As a secondary effect, we can't save it every time a single message changes categories, so if Squeak crashes, the categories file may not be up to date (no tragedy, usually). The file is binary and relatively small (slightly more than 4 bytes per message), so it's not too bad. This did require creating an optimized variation of PluggableSet that deals well with large SmallIntegers.
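To make "load and overwrite" concrete, here is a minimal Python sketch of a binary categories snapshot. The layout (a name length and a count, then the name, then one 4-byte integer per msgID) is invented for this example and is not Celeste's actual file format. It only shows why such a file costs a little over 4 bytes per filed message and why it is written as a whole.

```python
import struct
from typing import Dict, Set


def dump_categories(categories: Dict[str, Set[int]]) -> bytes:
    """Serialize every category into one complete snapshot (assumed layout)."""
    out = bytearray()
    for name, ids in sorted(categories.items()):
        raw = name.encode("utf-8")
        out += struct.pack(">HI", len(raw), len(ids))       # 6-byte header
        out += raw
        out += struct.pack(f">{len(ids)}I", *sorted(ids))   # 4 bytes per msgID
    return bytes(out)


def load_categories(data: bytes) -> Dict[str, Set[int]]:
    """Read the whole snapshot back; there is no way to load just a part."""
    categories: Dict[str, Set[int]] = {}
    pos = 0
    while pos < len(data):
        name_len, count = struct.unpack_from(">HI", data, pos)
        pos += 6
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        ids = struct.unpack_from(f">{count}I", data, pos)
        pos += 4 * count
        categories[name] = set(ids)
    return categories
```

Moving a single message between categories changes only a few bytes, yet the whole snapshot has to be rewritten, which is the cost described above.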
The index file also used to use the first method. This was prohibitively slow, seeing as the file holds far more information (~150 bytes per message). The size comes from caching all those details, and from storing them in text rather than binary mode. Doing the save after every fetch mail operation was too slow, so the index file was given reinforcements - an append-only log file. Every new/changed index entry and every message removal is written to the log. Loading the index state is done by reading the index file and then applying the changes stored in the log. The log is integrated into the index file only when Celeste is exited, certainly a rarer occasion.
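The index-plus-log scheme can be sketched as below. This is a simplified Python illustration, not the real file format: it assumes one JSON record per line, keeps the "files" as in-memory lists of lines, and the names log_put, log_remove, load_index and checkpoint are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def log_put(log: List[str], entry: dict) -> None:
    """Record a new or changed index entry by appending to the log."""
    log.append(json.dumps({"op": "put", "entry": entry}))


def log_remove(log: List[str], msg_id: int) -> None:
    """Record a message removal by appending to the log."""
    log.append(json.dumps({"op": "remove", "id": msg_id}))


def load_index(snapshot: List[str], log: List[str]) -> Dict[int, dict]:
    """Read the snapshot, then replay the logged changes in order."""
    index: Dict[int, dict] = {}
    for line in snapshot:
        entry = json.loads(line)
        index[entry["id"]] = entry
    for line in log:
        change = json.loads(line)
        if change["op"] == "put":
            index[change["entry"]["id"]] = change["entry"]
        else:
            index.pop(change["id"], None)
    return index


def checkpoint(snapshot: List[str], log: List[str]) -> Tuple[List[str], List[str]]:
    """Fold the log into a fresh snapshot and start an empty log (done on exit)."""
    index = load_index(snapshot, log)
    return [json.dumps(index[k]) for k in sorted(index)], []
```

Each change costs one short append instead of a full rewrite. The price is paid at load time, which must replay the log, and at exit, when the full snapshot is written.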
The messages file is a good example of an append-only file. New messages are simply added at the end, as are modified versions of old messages. The latter means that an obsolete version of the edited message remains (a hole in the file), but that's pretty rare, so who cares. A compact operation should please anyone who does...
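A small Python sketch of the append-only idea, including how an edit leaves a hole and how a compact pass could reclaim it. The (location, length) pairs play the role of the location and textLength kept in the index; the function names and the compaction strategy are assumptions for illustration, not Celeste's actual code.

```python
import io
from typing import BinaryIO, Dict, Tuple


def append_message(f: BinaryIO, text: str) -> Tuple[int, int]:
    """Add text at the end of the file; return its (location, length)."""
    data = text.encode("utf-8")
    f.seek(0, io.SEEK_END)
    location = f.tell()
    f.write(data)
    return location, len(data)


def read_message(f: BinaryIO, location: int, length: int) -> str:
    """Fetch one message using the location and length stored in the index."""
    f.seek(location)
    return f.read(length).decode("utf-8")


def compact(old: BinaryIO, new: BinaryIO,
            live: Dict[int, Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Copy only live messages into a new file, in their original order.

    live maps msgID -> (location, length) in the old file; the result maps
    the same msgIDs to their positions in the new file.
    """
    moved = {}
    for msg_id, (loc, length) in sorted(live.items(), key=lambda kv: kv[1][0]):
        moved[msg_id] = append_message(new, read_message(old, loc, length))
    return moved
```

Editing a message here means appending the new version and pointing the index entry at it. The old bytes stay in the file until a compact pass drops them.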
Weaknesses of the current scheme in my mind -
- Loading and closing Celeste take lots of time because the index file is big, and those operations are still monolithic.
- Fetching mail and various operations stall while the categories file is updated.
- That messages file is a monster, and one day it's gonna eat us all...
- For all this complexity, we still don't have fast searches or anything. The categories file is actually an inverted file index, but if we added a category for each indexed word it would be huge too.