The Squeak Garbage Collector
Last updated at 10:54 pm UTC on 27 March 2017
Where do I find notes about the new Spur garbage collector?
The heap is the memory zone holding all objects. Because most objects have a short life-time (usually 99% of objects do not survive their first GC), the heap is split in two. In the low addresses is new space, which includes most objects recently allocated. In the upper addresses is old space, which includes long-lived objects. The GC is in most case performed only on new space as it is likely to free a lot of space very quickly. The GC is started on the whole heap only when old space grows by a significant amount.....
From an email sent by Mike Parker: (See also the "Tour of the Object Engine" by Tim Rowledge)
Dwight Hughes wrote:
For the generational GC used in Squeak, take a look at "A Real Time Garbage Collector Based On The Lifetimes of Objects" by Henry Lieberman and Carl Hewitt,Communications of the ACM, June 1983:
Forgive my ignorance on this, but I've happily been able to avoid getting to know GC up close: How is the GC of the above article different from Ungar's generational scavenger?
Lots of details that add up to a different algorithm. The L&H algorithm is based on Bakers ephemeral algorithm with the addition of generations. Both Baker and L&H use read-barriers to preserve the invariant that programs only see data in newspace. It also used forwarding pointers to ensure that references from older generations into newer generations could be relocated without mucking around inside of oldspace data. Both Baker and L&H algorithms are ephemeral, so collection and allocation were overlapped in such a way as to eliminate gc pauses entirely.
Ungar's system, while also generational, is based on Cheney's classical stop-and-copy algorithm. Like Cheney (and unlike Baker and L&H), it is a non-realtime stop-and copy algorithm. Because of this, it doesn't need the read-barrier or the specialized hardware.
Instead of using forwarding pointers to manage the forward inter generational references, it keeps a list of objects holding forward intergenerational references. These are used at eden GC's along with the other roots to copy eden to eden'.
There are lots of variations on this, including keeping the address of the reference around instead of the object holding the reference, keeping the a bit for each page holding forward references, etc. Ungar also experimented with untenuring objects as an alternative to major GCs, but that get's much trickier.
> And assuming that Squeak uses the above algorithm, why doesn't it use scavenging?
Squeak doesn't use the L&H algorithm.
It uses a generational mark-sweep-compact algorithm with only two generations (tenured and eden), and an advance-only wavefront using an Ungar-style write barrier to keep track of tenured objects holding inter-generational pointers.
Eden collections are triggered by either running out of eden space, or allocating more than a certain number of objects (currently 4000) since the last collection which helps hold down the time to GC eden. The tenuring is triggered by having more than 2000 surviving objects in eden, or by the cross-generational list getting full. When tenuring occurs, all objects in eden space are tenured at once, irregardless of age.
From skimming through the code, it looks like it's inspired by Zorn's occasionally-compacting allocator without the separate mark-bit arrays. However, the current implementation seems to always compact every minor collection.
All in all, Squeak's system isn't too bad. It's not as hip as an Ungar-style scavenger, but has the advantages of needing less virtual memory (no semispaces), as well as improving locality by maintaining the original spatial locality while compacting (a copying collector usually reduces locality a fair bit by arbitrarily rearranging memory), which matters a heck of a lot on modern architectures.
For some coding examples, see Practical wizardry, Garbage Collection and Semaphore, The Squeak Garbage Collector