About MagmaCollections
Last updated at 7:21 am UTC on 29 March 2018
Overview
Some programs must provide fast access to very large collections of objects without consuming a lot of memory. The normal Smalltalk collections, such as Bag or OrderedCollection, are not suitable for this: the contiguous ByteArray records Magma uses to store and transport Smalltalk objects would be impractical for a large Smalltalk Collection, and such a collection would also carry a higher potential for contention. Magma can, however, maintain and quickly "search" large, flat structures.
Introducing MagmaCollection
Magma provides a new class for this large, flat kind of structure, called MagmaCollection, which offers the following features:
- Can contain millions of objects, limited only by the available storage on the device which holds the Magma repository files.
- Provides #size and absolute-position access (at: anInteger) making it suitable for scrolling lists.
- Rapid query support across multiple indexes.
- Support for "from-key" matching – finding the next higher key when an exact key is not known.
- Several common index types, Date, DateAndTime, Keyword, Integer, and UUID, are included. By making a new subclass and overriding a few methods, applications can define custom index types to meet domain-specific requirements.
- Key-order enumeration from any point.
- Reduced conflict: adds and removes from different sessions can occur simultaneously without conflicting.
- Supports batch operations via slowlyDo: [ ... ].
MagmaCollections behave like a Bag in that they can hold multiple instances of the same object, and can quickly answer occurrencesOf: anObject. After at least one index has been added (via addIndex:), a MagmaCollection can be queried for matching sub-collections.
Heterogeneity
MagmaCollections themselves are heterogeneous, but all objects in the MagmaCollection must respond to all of the index attribute selectors. For example, if you wanted a heterogeneous collection of People and Organizations, adding an index on #name would require instances of each of those classes to answer true to respondsTo: #name.
A convenience method lets you check whether an object can be added, that is, whether it responds to all of the index selectors.
myMagmaCollection canAdd: myObject
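A minimal sketch of guarding an add with canAdd:, assuming mySession is a connected MagmaSession and myMagmaCollection is already persistent. The Transcript message is illustrative only and is not part of Magma.

```smalltalk
"Assumes mySession, myMagmaCollection and myObject are already defined."
(myMagmaCollection canAdd: myObject)
    ifTrue: [ mySession commit: [ myMagmaCollection add: myObject ] ]
    ifFalse: [ Transcript show: 'Object does not respond to every index selector'; cr ]
```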
Creating a MagmaCollection
Creating a MagmaCollection is similar to creating many other kinds of objects.
MagmaCollection new
Despite its "size" and special nature, it is just another domain object. To make it persistent, simply reference it from another persistent object and commit. The special support files required to support the collection will be created automatically on the server.
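As a sketch of "reference it from another persistent object and commit", the following assumes mySession is a connected MagmaSession whose repository root is a Dictionary; the #books key is a made-up name for illustration.

```smalltalk
"Assumption: the repository root is a Dictionary reachable via 'mySession root'."
mySession commit: [
    mySession root
        at: #books
        put: MagmaCollection new ]
```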
Persistent nature
A MagmaCollection maintains only a "page" of objects at a time in the client image. In keeping with its reduced-conflict design, objects added to a MagmaCollection by other users become available upon the next page retrieval, which can occur many times between transaction boundaries. The objects read from the collection, however, will not themselves change state until a transaction boundary is crossed.
Adding and removing objects
Adding and removing objects follows the Collection API. To add an object:
mySession commit: [ myMagmaCollection add: myObject ]
To remove it:
mySession commit: [ myMagmaCollection remove: myObject ]
Indexes
Initially, the collection is not indexed. Without indexes, a MagmaCollection is limited in its ability to access the objects it references. You can test includes: and occurrencesOf: anObject, but to actually get at specific elements, you must add an index.
myMagmaCollection addIndex:
    (MaAsciiStringIndex
        attribute: #bookTitle
        keySize: 64)
Magma defines several working, basic index types, including MaAsciiStringIndex (useful for indexing proper nouns) and MaSearchStringIndex (a more forgiving, case-insensitive index). There are also index types for Dates, DateAndTimes, UUIDs, Integers, and more.
Depending on the keySize you specify, Magma's "String" indexes are sensitive to the first few characters:
index type | keySize (bits) | number of sensitive characters |
MaAsciiStringIndex | 64 | 9 |
MaAsciiStringIndex | 128 | 18 |
MaSearchStringIndex | 64 | 10 |
MaSearchStringIndex | 128 | 21 |
The built-in index types serve the purposes they were designed for, but if your program has special needs you will have to define additional index types. See Defining a new index type for more information.
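To illustrate what "sensitive to the first few characters" means in practice, here is a simplified model in plain Squeak that does not call Magma at all. It assumes that truncating a string to the number of sensitive characters approximates what the index can distinguish; the two book titles are made up.

```smalltalk
"Plain Squeak illustration; truncation stands in for the index's limited key precision."
| sensitiveCharacters titleA titleB |
sensitiveCharacters := 9. "MaAsciiStringIndex with keySize: 64, per the table above"
titleA := 'International Business'.
titleB := 'International Law'.
"Both titles share their first 9 characters, so under this model
 a 64-bit MaAsciiStringIndex could not tell them apart."
(titleA first: sensitiveCharacters) = (titleB first: sensitiveCharacters)
```

If titles in your collection commonly share long prefixes, a larger keySize is worth considering.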
Accessing elements with MagmaCollectionReader
A MagmaCollectionReader provides a "view" of the objects in a MagmaCollection. Readers are useful for quickly obtaining subsets of the collection based on query criteria.
myReader := aMagmaCollection where:
    [ :reader |
    reader
        read: #lastName
        from: 'Jackson'
        to: 'Muller' ]
This will answer a MagmaCollectionReader with all objects whose #lastName is >= 'Jackson' and <= 'Muller'. The reader knows its size and can access its elements by absolute integer position.
For more information, see Magma Queries.
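Because a reader answers #size and supports at: anInteger, it can drive a scrolling list. The sketch below assumes myReader was obtained from where: as shown above, and prints at most the first ten matches to the Transcript; the limit of ten is arbitrary.

```smalltalk
"Assumes myReader is a MagmaCollectionReader answered by where:."
Transcript show: 'Matches: ', myReader size printString; cr.
1 to: (myReader size min: 10) do: [ :index |
    Transcript show: (myReader at: index) printString; cr ]
```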
Optimizing read performance with pageSize:
Internally, the reader maintains only a "page" of objects from the collection in memory. When an application accesses outside the range of the page, the reader automatically retrieves a new page from the server. To optimize performance, you may customize the number of objects in memory at once with the #pageSize: attribute.
myReader pageSize: 50 "retrieve up to 50 objects at a time"
Batch operations
At some point, it may be necessary to enumerate an entire MagmaCollection. Because these collections can be very large, this can take a long time, so enumeration is normally part of a utility script rather than an end-user application.
Most batch scripts will be concerned with reaching every object in the collection, which requires the collection to be locked. Attempting to commit an add or remove to the collection while it's locked will result in a MagmaCommitConflictError being signaled.
Although MagmaCollection supports an API compatible with Collection (do:, select:, reject:, etc.), using the utilitarian #slowlyDo:commitEvery: message makes such full enumerations easy to locate with the "senders" operation of Smalltalk IDEs.
myMagmaCollection
    slowlyDo: [ :each | each doSomething ]
    commitEvery: 1000 "commit every 1000 objects"
Support Files
When a new MagmaCollection is created, or a new index is added, an additional file is automatically created on the server upon commit. The name of the file for the collection is its oid, followed by '.hdx' as the extension. hdx stands for "hash index," the file structure used to support these large collections. The hdx file for each added index is named by the selector and the oid of its collection. The Magma server maintains these files internally.
How they work
The key to MagmaCollections and their indexes is a file structure analogous to a Dictionary of Bags, implemented by the class MaHashIndex.
A MaHashIndex provides an interface to a file that:
- represents 0 as the lowest possible key.
- represents the highest possible key according to the keySize of the index (i.e., any of 16 to: 4096 by: 8).
- associates every key to a value. For Magma, the value is the oid (which identifies the object).
- can find any key or next-higher key in O(log(n)) time.
- provides enumeration from any absolute position, or from any key position, OR from any relative-position WITHIN a key-range.
- handles insertion and deletion of keys and the associated space-organization dynamically.
- allows fine-tuning record sizes to optimize for the different key-dispersions of various kinds of indexes.
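The "Dictionary of Bags" analogy can be sketched in plain Squeak. This is an in-memory model for intuition only, not how MaHashIndex is implemented on disk: the keys and oid values are made up, and the next-higher-key lookup here is a linear scan over sorted keys rather than the O(log(n)) search the file structure provides.

```smalltalk
"In-memory model of a 'Dictionary of Bags'; all keys and oids are hypothetical."
| index nextKey |
index := Dictionary new.
"Each integer key maps to a Bag of oids, so one key can hold several objects."
(index at: 100 ifAbsentPut: [ Bag new ]) add: 5001.
(index at: 100 ifAbsentPut: [ Bag new ]) add: 5002.
(index at: 200 ifAbsentPut: [ Bag new ]) add: 5003.
"From-key lookup: key 150 is absent, so find the next-higher key instead."
nextKey := index keys asSortedCollection
    detect: [ :key | key >= 150 ]
    ifNone: [ nil ].
index at: nextKey
```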
Index updating
Changing the indexed attribute of an object requires special consideration. Before the change, tell the Session object to #noteOldKeysFor: theObject whose indexed attribute is changing. Typically this can be handled in the setter for that attribute. So, if a collection has objects indexed by their #date, the setter must signal your MagmaSession to note the old key for that index before the value is changed.
date: aDate
    MagmaSessionRequest signalNoteOldKeysFor: self.
    MagmaSessionRequest signalCommit: [ date := aDate ]