Chunky Squeak
Last updated at 11:23 pm UTC on 20 February 2009
This is the initial draft of a technical proposal for a change in the memory system of the Squeak VM. The idea is that breaking up the image into coarse-grained "chunks" will take less effort than full modularization while giving us some of its advantages with minimal changes to current community practices. The goal is not to eliminate the need for full modularization, but to offer an incremental step that will actually make it easier to achieve. In particular, any of the proposals for New Modules should be fully compatible with what is described below.

Chunks


A typical use case assumes that a computer has several Squeak images on its disk, and sometimes even in its memory. These images contain many repeated bits, and it would be nice to factor them out into shared "chunks". This is similar in spirit to factoring many .changes files into a common .sources file to save disk space.

As run-time structures, these chunks would be similar to project (.pr) files and ImageSegments, but far less flexible. Each chunk would have a global identifier that is a hash of its contents, so you can never change a chunk, only create new ones. Each chunk also includes the identifier of one other chunk that must be loaded into memory before it; that identifier can be zero for chunks that should be loaded into a completely empty memory (we can call these "base chunks"). Note that this single "parent identifier" is sufficient to specify every bit of memory before the chunk is loaded, because the parent in turn names its own parent, all the way back to a base chunk. The load therefore always happens in a totally controlled environment (unlike ImageSegments, which must adapt to different conditions).
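A minimal sketch of such a chunk record, in Python rather than Squeak for brevity. The proposal does not name a hash function or a header layout, so SHA-256, the 32-byte all-zero parent identifier, and the names `Chunk`, `chunk_id` and `is_base` are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass

ID_SIZE = 32                  # assumption: SHA-256 digest length
BASE_PARENT = bytes(ID_SIZE)  # assumption: an all-zero parent id marks a base chunk


@dataclass(frozen=True)       # frozen: a chunk is never modified after creation
class Chunk:
    parent_id: bytes          # identifier of the chunk that must be loaded first
    payload: bytes            # description of how memory changes after loading

    @property
    def chunk_id(self) -> bytes:
        # The identifier is a hash of the contents (parent id plus payload),
        # so any change to either one yields a different chunk.
        return hashlib.sha256(self.parent_id + self.payload).digest()

    @property
    def is_base(self) -> bool:
        return self.parent_id == BASE_PARENT


base = Chunk(BASE_PARENT, b"kernel objects")
child = Chunk(base.chunk_id, b"project delta")
edited = Chunk(base.chunk_id, b"project delta, edited")
```

Here `child` and `edited` share a parent but have different identifiers, which is the sense in which a chunk can only be replaced by a new one and never changed.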

The bulk of the contents of a chunk indicates how memory should be changed after it is loaded. A rather simple way of doing that would be to encode the XOR of the memory before and after the chunk and then to compress the result using some standard algorithm. More efficient schemes can easily be devised.
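The XOR-and-compress idea can be sketched as follows. This is only an illustration: zlib stands in for "some standard algorithm", memory is modelled as a flat byte string, and the 8-byte length prefix and the function names `make_delta` and `apply_delta` are assumptions that do not come from the proposal.

```python
import zlib


def make_delta(before: bytes, after: bytes) -> bytes:
    """Encode `after` as a compressed XOR against `before`."""
    size = max(len(before), len(after))
    a = before.ljust(size, b"\0")   # pad the shorter side with zero bytes
    b = after.ljust(size, b"\0")
    xored = bytes(x ^ y for x, y in zip(a, b))
    # Unchanged regions XOR to zero, which compresses very well.
    return len(after).to_bytes(8, "big") + zlib.compress(xored)


def apply_delta(before: bytes, delta: bytes) -> bytes:
    """Rebuild the memory contents that `make_delta` was given as `after`."""
    after_len = int.from_bytes(delta[:8], "big")
    xored = zlib.decompress(delta[8:])
    a = before.ljust(len(xored), b"\0")
    return bytes(x ^ y for x, y in zip(a, xored))[:after_len]
```

For a base chunk, `before` is simply the empty byte string, so the delta degenerates to a compressed copy of the whole memory.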

chunky1.png

Though chunks are stored as files, Chunky Squeak should not take the names of these files into account. Only the identifier included in the first few bytes matters (it could also be recalculated on demand from the rest of the contents). Some mechanism to translate identifiers to local file names or URLs must exist, of course, but it shouldn't be hardwired into the design of Chunky Squeak.
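One possible translation mechanism, shown only as a sketch: scan a directory and index every file by the identifier stored in its first bytes, ignoring the file name entirely. The 32-byte identifier size and the name `index_chunks` are assumptions; a real resolver might equally consult URLs or a cache.

```python
from pathlib import Path
from typing import Dict

ID_SIZE = 32  # assumption: identifiers are 32 bytes long


def index_chunks(directory: Path) -> Dict[bytes, Path]:
    """Map chunk identifiers to files, using only each file's leading bytes."""
    index: Dict[bytes, Path] = {}
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            header = f.read(ID_SIZE)
        if len(header) == ID_SIZE:   # skip files too short to hold an identifier
            index[header] = path
    return index
```

Renaming or moving a chunk file changes nothing as far as the loader is concerned; only the index has to be rebuilt.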

Multiple Images in Memory


A chunk can be as small or as large as needed. In particular, a chunk can be exactly the same thing as a current image. In that case it would contain everything needed and have no parent chunk, so just loading it into an empty memory would make everything work as it does now. In the case of smaller chunks, you might have one that represents an end-user project; to load it you first have to load its parent, and the grandparent before that, all the way up to a base chunk. After loading, however, everything would work exactly as it does now. The exception is that "save image" would generate a new chunk, which would normally be quite small compared to a typical image file.
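The loading order described above can be sketched as a walk up the parent links followed by a reversal. The store is modelled here as a plain dictionary from identifier to a `(parent_id, payload)` pair; that representation, the all-zero base marker, and the name `load_order` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

BASE_PARENT = bytes(32)  # assumption: an all-zero parent id marks a base chunk


def load_order(store: Dict[bytes, Tuple[bytes, bytes]], chunk_id: bytes) -> List[bytes]:
    """Return the identifiers to load, base chunk first, requested chunk last."""
    chain = []
    current = chunk_id
    while current != BASE_PARENT:
        parent_id, _payload = store[current]  # KeyError means a chunk is missing
        chain.append(current)
        current = parent_id
    chain.reverse()
    return chain
```

A chunk that is a complete image has no parent, so its load order is just itself.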

A more interesting case is when more than one image is loaded into memory. In that case it might be common for a given chunk to be already present in memory when it is supposed to be loaded. If each image is loaded into its own address space, then page sharing can be used to save duplication. And "copy on write" can deal with the fact that later chunks might want to make (possibly incompatible) changes to a given page.
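The copy-on-write behaviour can be observed from user code with a private memory mapping. The sketch below uses Python's `mmap` module with `ACCESS_COPY`; it maps within a single process rather than across separate address spaces, so it only illustrates the semantics, not the VM design. The helper name `map_private` is hypothetical.

```python
import mmap


def map_private(path: str) -> mmap.mmap:
    """Map a chunk file copy-on-write: writes stay private to this mapping."""
    with open(path, "r+b") as f:
        # ACCESS_COPY gives a writable view whose changes are never written
        # back to the file or seen by other mappings of the same file.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


# Usage (with a hypothetical chunk file on disk):
#   image_a = map_private("base.chunk")
#   image_b = map_private("base.chunk")
#   image_a[0:4] = b"AAAA"   # image_b and the file still hold the original bytes
```

Untouched pages can be backed by the same physical memory for both mappings, while a page that one image writes to is copied for that image alone.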

chunky2.png

An alternative is to load all the chunks into a single address space and run the images as separate threads (as in the Hydra VM) rather than as separate tasks/processes. This will greatly reduce the switching overhead, but will require replacing the direct pointers that have been a traditional feature of Squeak with old-style object tables. These object tables will both allow the "shared but differently patched" parent chunks and compensate for the different object addresses in each run.
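A toy model of the object-table idea, under loud assumptions: objects are referred to by small integer indices instead of direct pointers, every image has its own table, and a table falls back to the shared parent-chunk contents unless that image has patched the entry. The class and method names are hypothetical and this is not how any existing Squeak VM lays out its memory.

```python
class ObjectTable:
    """Per-image table mapping object indices to object bodies."""

    def __init__(self, shared_bodies):
        self._shared = shared_bodies  # bodies from parent chunks, never mutated here
        self._patched = {}            # this image's private replacements

    def fetch(self, index):
        # References go through the table, so an image sees its own patch
        # when it has one and the shared body otherwise.
        if index in self._patched:
            return self._patched[index]
        return self._shared[index]

    def store(self, index, body):
        self._patched[index] = body


shared = {1: "Object", 2: "String", 3: "Morph"}
image_a = ObjectTable(shared)
image_b = ObjectTable(shared)
image_a.store(3, "Morph (patched by image A)")
```

After the `store`, image A sees its patched entry while image B and the shared parent data are unaffected, which is the "shared but differently patched" behaviour in miniature.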

This strict tree structure is very limiting. If you want to use together packages that exist in unrelated trunks, you have to use traditional alternatives (fileOut/fileIn, Monticello, project files, etc.) to move the needed code and objects around. An alternative is to develop a different style of using Squeak in which each piece can remain in its own image and yet be combined into a common result. This would need something like the far references in Islands or the wormholes in Spoon. One option would be for each image to "think" it had the system screen all to itself but then have its graphical output diverted to a master "gui image" that would be the only one really interacting with the user. Projects such as Nebraska probably already include most of the needed code.

Chunk Operations


The most basic chunk operation is "save image", as mentioned above. Many options are possible:

Other interesting operations might allow us to "reparent" a chunk or to split one into several. These would be needed when moving from the current image system to Chunky Squeak. In this case it would be important to have a good memory visualization tool.

Goals


Technical Problems