Magma Queries
Last updated at 5:32 pm UTC on 8 January 2010
To access the objects in a MagmaCollection, send it #where:, which constructs a MagmaCollectionReader on that collection. MagmaCollectionReaders behave much like sequenceable collections: they offer #size, #at:, #do:, and #sortBy:.
Constructing the query
The #where: method constructs the MagmaCollectionReader. Its argument is a block that specifies which objects to read from the receiver collection:
myReader := aMagmaCollection where:
    [ : reader |
    (reader
        read: #date
        from: '3/1/2002' asDate
        to: Date today)
    & (reader
        read: #keywords
        at: #('car')) ]
The above is the long form of building a query. There is also a short form, which bears a remarkable resemblance to standard Smalltalk select: blocks:
aMagmaCollection where:
    [ : each |
    (each date from: '3/1/2002' asDate to: Date today)
    & (each keywords at: #('car')) ]
This syntax is easier to read and consistent with standard select: blocks, but it relies on #doesNotUnderstand:, one of Smalltalk's powerful dynamic features, to interpret the query. As a consequence, any message implemented by MagmaCollectionReader (and up through the hierarchy to Object) cannot be used in the query expression. In a standard image, the following messages are the query attributes most likely to be affected:
(Most-likely collisions from Object):
name
size
class
creationStamp
hash
value
(Most-likely collisions from MagmaCollectionReader):
expression
first
last
pageSize
Using any of these words (or any other message implemented by MagmaCollectionReader or its superclasses) as the name of an index requires the long form of querying. Otherwise the short form should be fine.
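As an illustration of the collision problem, here is a minimal sketch of a long-form query on an index whose name is also a message understood by Object. The collection people and its #name index are hypothetical; only #where: and #read:at: come from this page.

```smalltalk
"people is assumed to be a MagmaCollection with an index on #name.
 The short form (each name at: 'Smith') would not work here, because
 #name is already implemented by Object and never reaches #doesNotUnderstand:."
| smiths |
smiths := people where:
    [ : reader |
    reader
        read: #name
        at: 'Smith' ]
```

The long form passes the attribute as a symbol, so it does not matter which messages the reader already understands.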
Operators
The available operators are listed in the 'operators' category of MaClause:
- at: - select objects with some attribute equal to this value. Same as equals:.
- between:and: - select objects with some attribute between these two values, inclusive. Same as from:to:.
- equals: - select objects with some attribute equal to this value.
- from: - select objects with some attribute greater than or equal to this value.
- from:to: - select objects with some attribute between these two values, inclusive.
- from:upTo: - select objects with some attribute greater than or equal to the from value, and less than the to value.
- includesAllOf: - select objects where all values of this collection are included in the object's (e.g., keywords) attribute.
- includesAnyOf: - select objects where any value in this collection is included in the object's (e.g., keywords) attribute.
- includesAllPrefixes: - Same as includesAllOf: except allows keywords to be searched by aCollection of specified prefixes instead of requiring the client to guess entire keywords exactly.
- includesAnyPrefix: - Same as includesAnyOf: except allows keywords to be searched by aCollection of specified prefixes instead of requiring the client to guess entire keywords exactly.
- in: - select objects with some attribute equal to any one of the values in this collection. Same as includesAnyOf:.
- to: - select objects with some attribute less than or equal to this value.
- upTo: - select objects with some attribute less than this value.
- < - select objects with some attribute less than this value. Same as upTo:.
- <= - select objects with some attribute less than or equal to this value.
- > - select objects with some attribute greater than this value.
- >= - same as from:.
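A short sketch of how a few of these operators might be combined in the short form. The collection myCars and its #year, #date and #keywords indexes are assumptions made for illustration; the operators themselves are taken from the list above.

```smalltalk
"myCars is assumed to be a MagmaCollection indexed on #year, #date and #keywords."
| recentCars |
recentCars := myCars where:
    [ : each |
    (each year >= 2000)
    & (each date from: '3/1/2002' asDate upTo: Date today)
    & (each keywords includesAnyOf: #('car' 'truck')) ]
```

Because every condition is ANDed, this kind of query should remain eligible for the lazy evaluation described under Optimizing Performance.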
MagmaCollectionReader
MagmaCollectionReader offers a rich set of methods for accessing the objects inside its underlying collection. If possible, applications should try to use readers directly rather than convert them to Smalltalk collections. Great care was taken to make readers as practical as normal collections. A reader can, for example, be used directly in a scrolling list.
Unfortunately, MagmaCollections cannot be queried by unindexed attributes. To select on an unindexed attribute, you must first convert the collection or reader to a Smalltalk collection with one of the 'converting' methods.
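This page does not name the 'converting' methods, so the sketch below uses only #do:, which is listed above, to copy matching objects into an ordinary OrderedCollection. The collection myCars, its #year index and the unindexed attribute #color are hypothetical.

```smalltalk
"Narrow the candidates with an indexed attribute first, then filter
 client-side on the unindexed one. Every enumerated object is faulted
 into the client, so this is only reasonable for small result sets."
| candidates redCars |
candidates := myCars where: [ : each | each year > 2000 ].
redCars := OrderedCollection new.
candidates do:
    [ : eachCar |
    eachCar color = 'red' ifTrue: [ redCars add: eachCar ] ]
```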
Sorting
Magma will optimize the query to the tightest clauses automatically. If it can be optimized down to one clause, then the result will be sorted by the attribute of that clause and #isSorted will answer true.
You can easily determine what clause, if any, a MagmaCollectionReader is sorted by:
myReader sortIndex
will answer the index it is optimized to, otherwise nil. If sorting on a different attribute is needed then #sortBy: may be used:
myReader sortBy: #date
which will quickly answer a new reader, but one based on a new MagmaCollection that is being "loaded" on the server in a background process. In the meantime, this new reader may be interrogated for the results that have been sorted so far. Until #sortComplete, #fractionSorted may be used to report progress on the sort.
To block program progress until the sort is complete, use the past tense of sortBy:, #sortedBy:.
myReader
    sortedBy: #date
    makeDistinct: true
Sorting may be toggled ascending or descending with the #ascend or #descend messages.
Magma will create temporary files on the server to manage these transient sorted result sets. These files accumulate until the next compression.
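The background sort described above could be polled along these lines. This is a sketch only: it assumes that #sortComplete answers a Boolean and that #fractionSorted answers a printable number, neither of which is spelled out on this page.

```smalltalk
"myReader is assumed to be the result of an earlier #where: query."
| sorted |
sorted := myReader sortBy: #date.
[ sorted sortComplete ] whileFalse:
    [ Transcript show: 'sorted so far: ', sorted fractionSorted printString; cr.
    (Delay forSeconds: 1) wait ].
"From here on, sorted can be read in full date order."
```

When no progress display is needed, #sortedBy:makeDistinct: achieves the same result without the polling loop.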
Non-distinct Results
By default, an object will be included in a (MagmaCollectionReader) result once for each disjuncted (or'd) condition for which it qualifies. For example, given the following Car objects:
#year | #make | #model |
2006 | Toyota | Highlander |
1963 | Chevrolet | Corvette |
2007 | Chevrolet | Colorado |
and the following query:
myCars where:
    [ :eachCar |
    (eachCar year > 2000)
    | (eachCar make at: 'Toyota') ]
the results would be:
2006 | Toyota | Highlander |
2007 | Chevrolet | Colorado |
2006 | Toyota | Highlander |
This duplication is a feature, not a bug. Besides offering better performance, some domain models depend on knowing the "weight" or number of qualifications for each query result.
Nevertheless, a very common use for where: will be to present unique "search results". To force distinct results, use #sortBy:makeDistinct:. Unfortunately, eliminating duplicates requires a full enumeration of the result set and the creation of a new MagmaCollection containing its unique objects. So be sure to enumerate the result of this message, not the receiver. A good pattern is to always assume the result is a new reader, even though for fully optimizable queries (see Optimizing Performance) it will be the receiver.
The API requires specification of a sort attribute for consistency (you get back a reader, not a MagmaCollection) and simplicity (because the most efficient way to access the objects of this (or any) new result-set MagmaCollection is by way of a reader).
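The "always assume a new reader" pattern might look like the following sketch. It uses the blocking #sortedBy:makeDistinct: from the Sorting section so that the enumeration sees the complete result. The myCars collection and its indexes are the hypothetical ones from the example above.

```smalltalk
| matches distinct |
matches := myCars where:
    [ : eachCar |
    (eachCar year > 2000)
    | (eachCar make at: 'Toyota') ].
"Keep the answered reader in its own variable and enumerate that one,
 never the receiver, since the two may or may not be the same object."
distinct := matches
    sortedBy: #model
    makeDistinct: true.
distinct do: [ : eachCar | Transcript show: eachCar printString; cr ]
```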
A MagmaCollectionReader can also be created with the #where:distinct:sortBy:descending: convenience method.
reader := myCollection
    where:
        [ :eachCar |
        (eachCar year > 2000)
        | (eachCar make at: 'Toyota') ]
    distinct: true
    sortBy: #model
    descending: false
Beware: #where: and #where:distinct:sortBy:descending: can also be sent to a MagmaCollectionReader, so a collection can be queried recursively. However, the MagmaCollectionReader answered by #where:distinct:sortBy:descending: is associated with a newly created MagmaCollection that is indexed only on the sorting attribute, so subsequent queries on it work only with that attribute.
Optimizing Performance
Performance is optimized by not having to fault objects for evaluation into the client. Query expressions are executed on the server, leveraging MaHashIndexes to perform only integral arithmetic and comparisons.
The query algorithm tries to be as lazy as possible. A query made up solely of ANDed conditions answers a reader whose objects are sorted on the clause with the fewest results, for "free". Even then, nothing is retrieved until #at: is sent, which causes the first requested page of results to be returned.
But the following luxury features will prevent this laziness and incur a performance cost.
feature | cost |
using an OR clause | 1 |
sorting by a different attribute | 1+2 |
requiring distinct results | 1+2 |
Cost 1 is the enumeration of the entire result set on the server. This is still pretty fast.
Cost 2 is the creation of a new indexed MagmaCollection in the background and loading it with the result set. This is pretty slow.
Any program which can avoid the luxury features will reap pure lazy-retrieval of results at maximum speed.
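To make the cost table concrete, the sketch below contrasts a query that stays lazy with variants that trigger each cost. The myCars collection and its indexes are hypothetical, and the comments restate the table above rather than measured behaviour.

```smalltalk
| lazy ored resorted |
"All conditions ANDed: no luxury feature, pure lazy retrieval."
lazy := myCars where:
    [ : each | (each year > 2000) & (each make at: 'Toyota') ].
"An OR clause: cost 1, the result set is enumerated on the server."
ored := myCars where:
    [ : each | (each year > 2000) | (each make at: 'Toyota') ].
"Sorting by a different attribute: cost 1+2, a new MagmaCollection is loaded."
resorted := lazy sortBy: #model
```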
Managing Resources
To maintain server health, it is important to send #release to a MagmaCollectionReader when the application is done with it. This is especially important for readers based on the luxury queries (see Optimizing Performance).
myMagmaCollectionReader release
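One way to make sure #release is always sent, even if the enumeration fails, is to wrap the work in standard Smalltalk #ensure:. This is a sketch of that idiom rather than a pattern prescribed by Magma; myCars is again a hypothetical collection.

```smalltalk
| reader |
reader := myCars where: [ : each | each year > 2000 ].
[ reader do: [ : eachCar | Transcript show: eachCar printString; cr ] ]
    ensure: [ reader release ]
```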