Celeste's Mail Database
Last updated at 1:35 pm UTC on 16 January 2006
This is a short intro to the way Celeste stores its mail. See Celeste's Mail Database for ways it might be improved. Ultimately, read the code to see the nitty gritty – this document is intended to provide an overview.
MailDB is the front-end interface to a Celeste database (and it could certainly be accessed outside of Celeste!). Some of the key messages it responds to are:
- openOn: (class method) – open a mail database
Behind the scenes, there are three files, each with an associated subclass of MailFile (XXX check this name):
- messages file (class MessagesFile). This holds the contents of each message in the database.
- index file (class IndexFile). This holds some key information about each message in the file for quick access.
- categories file (class CategoriesFile). This keeps track of which messages are in which user-defined categories.
The messages file has an extension of ".messages" and is accessed via class MessagesFile.
Individual messages are accessed via the MailMessage class.
The format of the messages file is simple. Each message starts with the string "&&&&&start" or "&&&&&XXXXX", depending on whether the message is live or has been virtually deleted. This string is followed by an ASCII base-10 number, which is the message's ID. A message ends when the next &&&&& string is seen, or at the end of the file.
Currently, no provision is made to handle messages which include the string &&&&& in their body.
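As a rough illustration of the format just described, here is a minimal Python sketch of a scanner (Celeste itself is written in Squeak Smalltalk; this is not its code). It assumes that "&&&&&XXXXX" is the deleted marker, that the ID may be preceded by spaces or tabs, and that the message text begins right after the ID. The names scan_messages and StoredMessage are made up for the example.

```python
import re
from typing import NamedTuple

# Start-of-message marker followed by the ASCII base-10 message ID.
# ASSUMPTION: optional spaces or tabs may sit between the marker and the ID.
MARKER = re.compile(rb"&&&&&(start|XXXXX)[ \t]*(\d+)")


class StoredMessage(NamedTuple):
    msg_id: int
    deleted: bool    # ASSUMPTION: the XXXXX marker means "virtually deleted"
    position: int    # byte offset of the marker within the messages file
    body: bytes      # everything between the ID and the next &&&&& string


def scan_messages(data: bytes) -> list[StoredMessage]:
    """Split the raw contents of a messages file into individual messages."""
    messages = []
    for match in MARKER.finditer(data):
        # A message ends at the next "&&&&&" string or at end of file, so a
        # body that itself contains "&&&&&" is cut short here as well.
        end = data.find(b"&&&&&", match.end())
        if end == -1:
            end = len(data)
        messages.append(
            StoredMessage(
                msg_id=int(match.group(2)),
                deleted=match.group(1) == b"XXXXX",
                position=match.start(),
                body=data[match.end():end],
            )
        )
    return messages
```

Because each result carries the marker's byte offset, the same pass yields the file position that the index file records for each message.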
The index file has extension ".index" and is accessed via class IndexFile. The index file stores, for each message in the database:
- its message-id
- its From line
- its To line
- its CC line
- its subject line
- its date, expressed as a number of seconds since some epoch (someone want to fill this in?)
- the file position of the message within the messages file
(XXX I'm not 100% sure about this order -lex)
In code, an individual entry is accessed via class IndexFileEntry.
The format of the index file is simply each of these data items on a line of its own, one entry after another. For speed, the list is usually stored sorted by time.
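The layout can be sketched in Python as a reader that consumes seven lines per entry. The field order is taken from the list above, which is flagged there as uncertain, so treat it as an assumption; the sketch also assumes that the message-id, date and file position are written as decimal integers. IndexEntry and parse_index are hypothetical names, not part of Celeste.

```python
from dataclasses import dataclass

FIELDS_PER_ENTRY = 7  # ASSUMPTION: one line per item in the list above


@dataclass
class IndexEntry:
    message_id: int
    from_line: str
    to_line: str
    cc_line: str
    subject: str
    date_seconds: int    # seconds since an epoch the document leaves unspecified
    file_position: int   # offset of the message within the messages file


def parse_index(text: str) -> list[IndexEntry]:
    """Read index entries from text holding one data item per line."""
    lines = text.splitlines()
    if len(lines) % FIELDS_PER_ENTRY != 0:
        raise ValueError("index text does not hold a whole number of entries")
    entries = []
    for i in range(0, len(lines), FIELDS_PER_ENTRY):
        mid, frm, to, cc, subject, date, pos = lines[i:i + FIELDS_PER_ENTRY]
        entries.append(
            IndexEntry(int(mid), frm, to, cc, subject, int(date), int(pos))
        )
    return entries


def sorted_by_time(entries: list[IndexEntry]) -> list[IndexEntry]:
    """Return the entries in the time order the file is usually kept in."""
    return sorted(entries, key=lambda e: e.date_seconds)
```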
As an optimization, there is also a journal kept with the extension ".log". For small updates of the index file, the entire file is not rewritten, but updates are written to the log.
The categories file has extension ".categories" and is accessed via class CategoriesFile. It holds a list of user-defined categories, and a list of the message IDs that appear within each category.
The format is a sequence of category descriptions, each in this format:
- the name of the category, stored with nextStringPut:
- the number of elements in the category, stored as a 32-bit big-endian integer
- the message-ids of the messages in the category, stored as a sequence of 32-bit big-endian integers.
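A minimal Python sketch of a reader for this layout follows. The integer fields are decoded exactly as described (32-bit big-endian, assumed unsigned here). The name encoding is an assumption: the sketch treats nextStringPut: as writing a single length byte followed by the characters, and it rejects length bytes of 192 or more rather than guessing at a longer form. Check the Squeak stream source before relying on this. read_string and parse_categories are hypothetical names.

```python
import struct


def read_string(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a length-prefixed string and return it with the next offset.

    ASSUMPTION: short strings are stored as one length byte plus the bytes.
    """
    length = buf[pos]
    if length >= 192:
        raise ValueError("long-string form is not handled in this sketch")
    start = pos + 1
    return buf[start:start + length].decode("latin-1"), start + length


def parse_categories(buf: bytes) -> dict[str, list[int]]:
    """Decode a sequence of category descriptions into name -> message IDs."""
    categories = {}
    pos = 0
    while pos < len(buf):
        name, pos = read_string(buf, pos)
        # Element count: 32-bit big-endian integer.
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        # Message IDs: `count` consecutive 32-bit big-endian integers.
        ids = list(struct.unpack_from(f">{count}I", buf, pos))
        pos += 4 * count
        categories[name] = ids
    return categories
```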
General Design Notes
The message file is designed for robustness: no matter what else might go wrong, your messages will not be deleted. Even if the user requests that messages be deleted from the mail database (via the "empty trash" menu item), there is simply a note placed in the messages file that the message is virtually deleted.
The one exception is the compact operation. Compaction rewrites a message file, removing all deleted messages from it. Thus, compaction should be run with care! If you have the disk space for it, you may simply prefer to never compact your mail database!
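To make the effect of compaction concrete, here is a small Python sketch that works on the raw bytes of a messages file. It is not Celeste's implementation. It assumes that live messages begin with "&&&&&start" and that everything else up to the next &&&&& string belongs to a deleted message. The function name compact is made up for the example.

```python
def compact(data: bytes) -> bytes:
    """Return a copy of the messages-file contents without deleted messages."""
    kept = []
    pos = data.find(b"&&&&&")
    while pos != -1:
        # Each message runs from its marker to the next "&&&&&" or to EOF.
        nxt = data.find(b"&&&&&", pos + 5)
        end = nxt if nxt != -1 else len(data)
        chunk = data[pos:end]
        # ASSUMPTION: only chunks with the "start" marker are live messages.
        if chunk.startswith(b"&&&&&start"):
            kept.append(chunk)
        pos = nxt
    return b"".join(kept)
```

Since this is the one operation that really discards message text, a cautious caller might write the result to a new file and replace the original only after the write has succeeded.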
The index file and categories file, however, are rewritten at every save (modulo the .log optimization with the index file). This is dangerous, and in the case of the index file, it is fairly slow.
Note that the index file can be recovered completely by scanning the message file, and indeed the "compact" operation does recreate the index file from scratch. The categories file, however, is user-defined data that appears nowhere else.