MailDB Ideas



Last updated at 5:25 pm UTC on 8 May 2017

2017

2001

There was recently (early October 2001) a big discussion on the list about ideas for the MailDB. The big ideas are: using a binary format, making it possible to add new header fields, and defaulting to the messages file if a desired header is missing from the index. An issue under consideration is whether ImageSegments will work for the index file. Lex proposes storing the database one header field at a time, instead of one index entry at a time, to allow for really large bulk operations; maybe it helps, maybe not!
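To make the field-at-a-time idea and the fallback concrete, here is a minimal sketch in Python (not Squeak, and not the actual MailDB code). The class name `ColumnIndex` and the `load_raw_message` callable are assumptions made up for illustration. The index keeps one table per header field, and a lookup that misses parses the header out of the raw message instead.

```python
from email.parser import HeaderParser


class ColumnIndex:
    """Keeps one table per header field (msg id -> value),
    rather than one record per message."""

    def __init__(self, load_raw_message):
        # load_raw_message is a hypothetical callable: msg_id -> raw message text,
        # standing in for a read from the messages file.
        self._load_raw_message = load_raw_message
        self._columns = {}  # lowercased field name -> {msg_id: value}

    def add_field(self, name):
        """Add a new header field without touching the existing columns."""
        self._columns.setdefault(name.lower(), {})

    def put(self, msg_id, name, value):
        self._columns.setdefault(name.lower(), {})[msg_id] = value

    def get(self, msg_id, name):
        name = name.lower()
        column = self._columns.get(name, {})
        if msg_id in column:
            return column[msg_id]
        # Header not indexed: fall back to the messages file and parse it.
        headers = HeaderParser().parsestr(self._load_raw_message(msg_id))
        value = headers.get(name)
        if value is not None:
            self.put(msg_id, name, value)  # remember it for next time
        return value
```

A bulk operation over one field (say, every Subject) then touches a single table, which is the case the proposal is meant to help.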

A big point of contention is whether the index should be complete. A complete index allows searches over the entire database, but it is wasted if you never do that, and it takes up a lot more space and load/save time than a cache of 1000 or so entries.

Things I'd like to try to improve the DB:

Refactor: use a strategy to do actual writing/reading to disk, decoupling the format (esp. the index format) from everything else.
Use a binary format for the index.
Use a leaner variation of the index: only file offsets, or only offsets and date.
Try keeping the main index (not the log) compressed; it might save enough on I/O to be worth it.
Maybe the parsed-out index info is worth keeping cached on file? In that case, have the real index hold an offset to that too. Consider whether to keep all the index entries or just the MRU 50.
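The first few items above (a strategy for disk I/O, a binary index, a lean entry, optional compression) can be sketched together. This is an illustrative Python sketch, not the Celeste implementation. The names `BinaryFormat`, `CompressedFormat` and `IndexStore`, and the entry layout (file offset plus a Unix timestamp), are assumptions.

```python
import struct
import zlib

# Assumed lean entry: (file offset, date as Unix timestamp), two unsigned 64-bit ints.
ENTRY = struct.Struct("<QQ")


class BinaryFormat:
    """Strategy: fixed-size binary records, uncompressed."""

    def encode(self, entries):
        return b"".join(ENTRY.pack(offset, date) for offset, date in entries)

    def decode(self, data):
        return list(ENTRY.iter_unpack(data))


class CompressedFormat:
    """Strategy: wraps another format and compresses its bytes with zlib."""

    def __init__(self, inner):
        self.inner = inner

    def encode(self, entries):
        return zlib.compress(self.inner.encode(entries))

    def decode(self, data):
        return self.inner.decode(zlib.decompress(data))


class IndexStore:
    """Knows where the index lives; delegates the on-disk format to a strategy."""

    def __init__(self, path, fmt):
        self.path = path
        self.fmt = fmt

    def save(self, entries):
        with open(self.path, "wb") as f:
            f.write(self.fmt.encode(entries))

    def load(self):
        with open(self.path, "rb") as f:
            return self.fmt.decode(f.read())
```

Swapping `BinaryFormat()` for `CompressedFormat(BinaryFormat())` changes the file format without touching `IndexStore` or its callers, which is the decoupling the refactoring item asks for. Whether compression actually pays for itself in I/O would have to be measured.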

Celeste category changes should be logged like messages and the index.
Consider compressing the categories for speed, too.
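One way to read "logged like messages and the index" is an append-only change log that is replayed on top of the last saved snapshot. The Python sketch below assumes that reading. The JSON-lines record format and the function names `log_change` and `replay` are invented for illustration.

```python
import json


def log_change(log_file, op, category, msg_id):
    """Append one category change as a JSON line; op is 'add' or 'remove'."""
    record = {"op": op, "category": category, "msg": msg_id}
    log_file.write(json.dumps(record) + "\n")


def replay(log_file, snapshot=None):
    """Rebuild {category: set of msg ids} from a snapshot plus the change log."""
    categories = {name: set(ids) for name, ids in (snapshot or {}).items()}
    for line in log_file:
        change = json.loads(line)
        members = categories.setdefault(change["category"], set())
        if change["op"] == "add":
            members.add(change["msg"])
        else:
            members.discard(change["msg"])
    return categories
```

With this scheme a category change costs one short append instead of rewriting the whole category file; the log can be folded into a fresh snapshot whenever the file is consolidated.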

Read everything lazily - read only a list of categories when opening Celeste.

On entering a specific category, read its index entries. Naively, this would mean reading all the entries just to tell them apart, but with fixed-size index entries they could be kept in a direct-access file; just remap the msgIDs when consolidating the file. To display a category, you need to find the latest N messages that match Test. Latest can be handled by saving the entries sorted or by saving the date-time. Test would be handled by reading and parsing as much info as required.
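The fixed-size, direct-access idea can be sketched as follows, in Python rather than Squeak. The entry layout (message id, file offset, Unix timestamp) and the function names are assumptions, and the sketch takes the "saved sorted" option: entries are assumed to be stored oldest first, so the latest N are found by walking backwards from the end.

```python
import struct

# Assumed fixed-size entry: message id, file offset, date as Unix timestamp.
ENTRY = struct.Struct("<IQQ")


def read_entry(f, slot):
    """Seek straight to the slot-th entry; no scanning of earlier entries."""
    f.seek(slot * ENTRY.size)
    return ENTRY.unpack(f.read(ENTRY.size))


def latest_matching(f, n, test):
    """Return up to n entries that satisfy test, newest first.

    Assumes entries are stored sorted by date, oldest first.
    """
    f.seek(0, 2)  # jump to the end to learn the file size
    count = f.tell() // ENTRY.size
    found = []
    for slot in range(count - 1, -1, -1):
        entry = read_entry(f, slot)
        if test(entry):
            found.append(entry)
            if len(found) == n:
                break
    return found
```

Here `test` only sees the index entry. A real Test might need to read and parse part of the message at the stored offset, as the paragraph above notes.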

Cache message texts in memory - if a memory watcher complains, forget a few random messages.
The DB should be scalable.
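The message-text cache with random forgetting, mentioned above, might look like this minimal Python sketch. `MessageTextCache`, the `load_text` callable and the `on_low_memory` hook are hypothetical names; how a memory watcher would actually notify the cache is left open.

```python
import random


class MessageTextCache:
    """Keeps message texts in memory; drops random ones under memory pressure."""

    def __init__(self, load_text):
        # load_text is a hypothetical callable: msg_id -> text from the messages file.
        self._load_text = load_text
        self._texts = {}

    def get(self, msg_id):
        if msg_id not in self._texts:
            self._texts[msg_id] = self._load_text(msg_id)
        return self._texts[msg_id]

    def on_low_memory(self, how_many=10):
        """To be called by a memory watcher: forget a few random messages."""
        victims = random.sample(list(self._texts), min(how_many, len(self._texts)))
        for msg_id in victims:
            del self._texts[msg_id]

    def __len__(self):
        return len(self._texts)
```

Forgotten texts are simply reloaded on the next `get`, so eviction never loses data. Random eviction needs no bookkeeping, at the cost of occasionally dropping a message that is about to be read again.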