Last updated at 1:10 pm UTC on 19 September 2012
GOODS is a distributed, language-neutral object database from Konstantin Knizhnik with a very liberal license. It is available at http://www.garret.ru/~knizhnik/goods.html, along with client interfaces for C++, Java, and Perl. The GOODS distribution includes documentation on configuring and maintaining a GOODS server (hot backups etc) and GOODS license information.
Avi Bryant developed a Squeak client for GOODS that allows either transparent storage of Smalltalk objects, or, given some Extra GOODS Type Information, compatibility with the GOODS Java interface storage conventions, for easy sharing of object data between Squeak and Java.
This client has been ported to VisualWorks Smalltalk and is now actively maintained for Pharo, Squeak, and VisualWorks by David Shaffer, with contributions from the community.
Install the GOODS database server appropriate for your platform (http://www.garret.ru/~knizhnik/goods.html). Install the Squeak GOODS client by evaluating:
or, for Gofer users:
VisualWorks users should load the GOODS bundle from the Cincom Public StORE Repository.
Configuring and starting the database server
Configuration of a GOODS server can be as simple as a two-line file indicating the TCP port the server should listen on. For example, the file test.cfg:
indicates that there will be a single storage server running on port 2020. Further configuration and tuning of a GOODS server is possible although usually not necessary. See the GOODS documentation for details. Start the GOODS server by entering:
(note that the .cfg file extension must be omitted here) in a shell. You will now be in the GOODS server console, from which you can monitor clients, perform backups, and carry out various administrative tasks. Entering "help" will give you some flavor of the commands available. Note: Exiting the console will terminate the server. This is probably fine for development, but in production environments I use the server.admin_telnet_port option, which makes the admin console available via TCP/IP. See the GOODS documentation for details.
The examples in the rest of this documentation assume that your server is running on port 2020.
These instructions assume you have goodsrv running on port 2020 as described above.
The only class you should have to deal with directly is KKDatabase. Create it with the onHost:port: class side method:
db := KKDatabase onHost: 'localhost' port: 2020.
When you first open a new database, you need to set the root object:
db root: Dictionary new.
db commit.
After that, you can always access that object through #root. For example, you can open a second session:
db2 := KKDatabase onHost: 'localhost' port: 2020.
GOODS uses "persistence by reachability" - any object that can be accessed from the root will get stored in the database. For example:
x := OrderedCollection new.
y := 'hello world'.
db2 root at: 'test' put: x.
"the collection referenced by x is now reachable from the root, so it will be stored on the next commit"
x add: y.
"now the string referenced by y is too"
Whenever you commit, your local changes are sent to the database, and any remote changes will be sent to your image. If you just want to update your image with the remote changes, use #refresh.
db2 commit.
db root. "won't include the collection added to db2"
db refresh.
db root. "now it will"
If two clients change the same object at the same time, you can get a commit conflict. This will raise a KKCommitFailure exception. Possible actions you might want to take are #refresh (which will throw away your conflicting changes, but keep any others), or #rollback (which will throw away all your changes since the last commit). You can then try making your changes and committing again:
(db2 root at: 'test') add: 5.
db2 commit.
[(db root at: 'test') add: 42.
db commit]
on: KKCommitFailure
do: [:ex | db rollback. ex retry]
A shorthand for this is:
db commitWithRetry: [(db root at: 'test') add: 42].
Another way to avoid conflicts is with object locking. See the "locking" protocol of KKDatabase for details: there are blocking and non-blocking methods for shared (read) and exclusive locks.
Common database (KKDatabase) operations
root – answers the root object in the database
root: newRoot – sets the root object for the database to newRoot. Generally only performed once.
commit – send all modified persistent objects to the GOODS server. May result in a KKCommitFailure if some other session has modified the same object(s). A successful commit is automatically followed by a refresh (see below).
commitWithRetry: aBlock – Start with a refreshed database (see #refresh). Evaluate aBlock. Commit and if the commit fails, rollback and repeat.
rollback – return all modified persistent objects to the state they were in when you last loaded them from the GOODS server. Rollback is automatically followed by a refresh (see below).
refresh – as concurrent database sessions commit modifications to objects, the GOODS server sends change notifications to any sessions that share these objects. The refresh operation instructs the client to load the latest version of all objects for which it has received a change notification. This is done to provide a "fresh" view of the database without performing a commit. If this session has modified, but not committed, any objects that other sessions have also modified, this session's changes to those objects will be lost. Normally refresh is only used for read-only sessions.
flush – over time the session caches can hold references to objects that may no longer be reachable from the database root or the user's application code. flush removes those objects from the caches by temporarily switching to only weak references to them, performing a garbage collection, and then restoring the strong references to the objects that remain.
flushAll – like flush but uses a more aggressive garbage collection.
The GOODS client is designed to provide fairly transparent persistence to your objects. If only one database connection, or "session", is maintained, then one can picture the GOODS client as simply ensuring that any changes made to any object reachable from the root object are preserved after a commit. Normally, however, many sessions are manipulating data in the GOODS database simultaneously. For example, in a Smalltalk application server (Seaside, Iliad etc), each user session is normally associated with a database session. This gives each user session its own consistent "view" of the persistent objects. Also, it might be possible that multiple Smalltalk images are connecting to the database simultaneously. When multiple overlapping (in time) sessions make changes to their view of objects in the database, there must be a mechanism to ensure that the changes are applied correctly (or at least the sessions are informed of conflicts when they occur). This section clarifies the programming model that one uses with the GOODS client and discusses models that this client supports for ensuring session changes are properly serialized.
GOODS = object storage
The primary job of the GOODS server is object storage and retrieval. Objects are given an identity via the object ID (OID). When a new object is stored, the client must request an unused OID for this object. When an existing object is stored, it overwrites the old version of the object with the same ID (see Concurrency discussions below). When an object is retrieved it is retrieved by OID (the client must know the OID of the object it wants to retrieve).
The root object is one of a few objects whose OID is the same for all databases. That is, the client can retrieve the root object because its OID is always 16r10000. All objects that will be fetched from the database will be discovered by traversing object references starting at this root object.
A GOODS session is a connection initiated by creation of a KKDatabase instance and terminated when this instance is garbage collected or explicitly sent the #logout message. The concept of a "session" exists on both the client and server.
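Because a session lasts until its KKDatabase instance is garbage collected or is sent #logout, it can help to end sessions explicitly. The following is a minimal sketch, assuming a server on localhost port 2020 and a root Dictionary that has already been set and committed; the 'visits' key is hypothetical.

```smalltalk
| db |
db := KKDatabase onHost: 'localhost' port: 2020.
[ "Work with persistent objects while the session is open."
  db root at: 'visits' put: (db root at: 'visits' ifAbsent: [0]) + 1.
  db commit ]
    ensure: [ db logout ]  "end the session even if the block above fails"
```

Using ensure: means the server-side session is released promptly rather than whenever the image happens to garbage collect the KKDatabase instance.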
Reifying an instance
Concurrency modes and conflict detection
The GOODS client supports several modes of interaction with the GOODS database server for detection and resolution of concurrency conflicts. First let me define what I mean by concurrency conflict. Suppose
Write barriers and change detection
GOODS client implementation notes
The remainder of the sections of this tutorial require some lower level details regarding GOODS and the GOODS client. These are for the reader who has begun developing an application with GOODS and discovered that, despite the transparency afforded by using GOODS, sometimes one needs to "think about the database."
Instance variables, immediate values and proxies
The GOODS client keeps a number of cache-like objects related to a session. Only a couple of these ever need to be considered by the user:
Key cache –
Object reification (turning an object stored in GOODS into a like-typed Smalltalk object)
This is a stub section TBD.
In response to requests for objects, GOODS produces object records. These object records include information about the class of the object and the values of its instance variables. The reification process involves building the instance and setting its instance variables based on the data in the object record. "Immediate" values (integers, characters and such; see implementers of #goodsIsImmediateValue) will be stored directly in the instance variables. Reference values will be reified as KKObjectProxy instances unless the referenced object already exists in the key cache.
Object life cycle in a session
This section discusses what happens to an object when it has been loaded into a session in a Smalltalk image. This information can be important for users experiencing image memory strains or performance problems during commit or rollback operations. Here are the basic steps in the object reification process:
- An object with a given OID is requested from the session (KKDatabase>>at: normally triggered through a KKProxy)
- The key cache is consulted and, if a reified instance with this OID already exists in this session, that instance will be returned
- The record cache is consulted and, if a record of an object with this OID already exists, that record will be reified into an object, stored in the key cache and returned.
- If neither of these conditions is met, the request is passed on to the GOODS server. GOODS responds with a description of that object's state as well as other "related" objects (GOODS speculates about what objects you might ask for next). These object descriptions, called records, are placed in the record cache keyed by OID. Of these objects, the one with the OID that you requested will be reified and placed into the key cache keyed by OID for future requests. Finally, the fully reified object will be returned as the response to your request.
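The lookup order in the steps above can be modelled in a few lines of workspace code. This is only a conceptual sketch, not the client's implementation: the two Dictionaries, the serverFetch and reify blocks, and the string "records" are all stand-ins invented for illustration.

```smalltalk
"Conceptual model of the lookup order only; this is not the client's code."
| keyCache recordCache serverFetch reify lookup serverCalls |
keyCache := Dictionary new.     "OID -> reified object"
recordCache := Dictionary new.  "OID -> record received from the server"
serverCalls := 0.

"Stand-in for the server: answers the requested record plus one 'related' record."
serverFetch := [:oid |
    serverCalls := serverCalls + 1.
    Dictionary new
        at: oid put: 'record ', oid printString;
        at: oid + 1 put: 'record ', (oid + 1) printString;
        yourself].

"Stand-in for reification: builds an object from a record."
reify := [:record | 'object from ', record].

lookup := [:oid |
    keyCache at: oid ifAbsent: [
        (recordCache includesKey: oid) ifFalse: [
            (serverFetch value: oid) keysAndValuesDo: [:k :v |
                recordCache at: k put: v]].
        keyCache at: oid put: (reify value: (recordCache at: oid))]].

lookup value: 16r10000.      "server round trip, then reified"
lookup value: 16r10000.      "answered from the key cache"
lookup value: 16r10000 + 1.  "reified from the record cache, no round trip"
serverCalls                  "evaluates to 1"
```

Three lookups cost a single server round trip here because the second hits the key cache and the third finds its record already delivered alongside the first.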
From the discussion above it should be clear that the requested object now lives in the key cache. In addition it is presumed that the application is holding a strong reference to it (momentarily, at least) directly or indirectly through another object. The key cache may also hold a strong reference but, depending on the Caching strategy, this reference may be weakened at any time. If the
Scan of object cache on commit
In my experience most performance problems in the Squeak/Pharo GOODS client are caused by scanning the object cache for dirty objects during a commit. Most users report this problem the first time they try to populate a GOODS database with a large number of objects deserialized from a text file or from another database. Note: This is not a problem in the VisualWorks implementation since VisualWorks supports "immutable" objects and the GOODS client uses this immutability support for change detection.
Since the Squeak VM has no support for tracking which objects have changed, during a commit the GOODS client makes a linear scan through the objects that it has loaded since the session began and compares each of these objects to a memento kept at the time the object was first loaded. Only objects that have changed are sent to GOODS to be committed to disk (see Concurrency modes for more details). If this cache is large the scan can take a long time. (Note: The VisualWorks client does not have this problem since it makes use of VM support for tracking object mutation.)
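To see why the cost grows with the size of the cache rather than with the number of changes, here is a small workspace model of memento-based change detection. It is a conceptual sketch only: the loaded collection, the mementos dictionary and the load block are hypothetical stand-ins, and the real client's data structures will differ.

```smalltalk
"Conceptual model only; the real client's data structures will differ."
| loaded mementos load a b dirty |
loaded := OrderedCollection new.      "every object loaded in this session"
mementos := IdentityDictionary new.   "object -> copy taken at load time"

load := [:obj |
    loaded add: obj.
    mementos at: obj put: obj copy.
    obj].

a := load value: (OrderedCollection with: 1 with: 2).
b := load value: (OrderedCollection with: 3).
a add: 99.   "modify one object after loading"

"The commit-time scan visits every loaded object, changed or not."
dirty := loaded select: [:obj | obj ~= (mementos at: obj)].
dirty size   "evaluates to 1; only a differs from its memento"
```

Only one object changed, yet the select: had to compare every loaded object against its memento, which is the linear scan described above.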
There are a number of tools to help you deal with this. First, keep your cache small by using frequent flush or flushAll calls, especially during bulk population of a database. Always flush right after a successful commit or rollback; flushing while you have uncommitted changes risks losing them. Second, batch your commits, especially when doing a bulk load operation. If anyone has example code for such bulk load operations, please feel free to add it here.
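As a starting point, a batched bulk load might look like the sketch below. It assumes a single writer (so no KKCommitFailure handling), a server on localhost port 2020, and a root Dictionary that already exists; the 'items' key, the batch size of 500 and the generated strings are hypothetical placeholders for your own data.

```smalltalk
| db items batchSize |
db := KKDatabase onHost: 'localhost' port: 2020.
batchSize := 500.  "hypothetical value; tune it for your data"
[ items := Dictionary new.
  db root at: 'items' put: items.
  db commit.
  1 to: 10000 do: [:i |
      items at: i put: 'item ', i printString.
      i \\ batchSize = 0 ifTrue: [
          db commit.    "send this batch to the server"
          db flush ] ]. "then shrink the caches, only after the commit"
  db commit.            "commit any remainder"
  db flush ]
    ensure: [ db logout ]
```

The workspace variable items keeps a strong reference to the target collection, so it survives each flush. A plain Dictionary is used only to keep the sketch short; very large collections have costs of their own, so a different container may suit real bulk loads better.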
Another common source of user anguish is pushing/pulling large collections to/from GOODS. Many collection objects make use of Arrays or are themselves indexed objects. When such an object is loaded (or its array is loaded), GOODS must send the client the OID of every object in the collection. This can take a long time for large collections. In Squeak/Pharo, when committing, the GOODS client must scan the entire collection to see if it has changed or if any of the objects it contains has changed. This scan can take a long time. On all platforms, if the collection has changed, it will be sent in its entirety to GOODS. This can take a long time and is also a frequent source of commit conflicts (when two processes modify this array in overlapping transactions).
The solution to this is to avoid array-like containers when you are storing large numbers of objects. Crack open your Data Structures textbook to find the right storage format for your needs. BTrees work well for keyed collections, for example, and an implementation is available on SqueakSource. Doubly linked lists work well for queue-like access to list structures (although they cause performance problems for linear searches). There are other list-like structures that are "segmented", so you can certainly find one to meet your needs.
One common sub-case here is storage of binary large objects (BLOBs) like images. Normally one can improve commit performance of these objects if they are treated as "value objects". To do this, subclass Array, ByteArray or whatever and implement goodsIsImmutable to return true. Now the commit code will ignore changes to this collection, in effect treating it like a value. This can greatly improve commit performance in this common case. (Note: The GOODS server includes support for incrementally loading BLOBs but this is not yet supported by the Smalltalk GOODS client.)
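A value-object BLOB class could be defined along these lines. This is a sketch under stated assumptions: the class name MyImageBlob and the category are hypothetical, the Squeak-style class definition message may need adjusting for your dialect or version, and goodsIsImmutable is assumed to be an instance-side method.

```smalltalk
"Hypothetical class name and category; adjust for your image."
ByteArray variableByteSubclass: #MyImageBlob
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'MyApp-Storage'.

"Assumption: goodsIsImmutable is looked up on the instance side."
MyImageBlob compile: 'goodsIsImmutable
    "Tell the GOODS client to treat instances as values and skip change detection."
    ^ true' classified: 'goods'.
```

Since the commit code then ignores changes to such an instance, in-place modifications will not be persisted; to change a stored BLOB, store a new instance in its place instead.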