Last updated at 12:35 pm UTC on 17 January 2006
I would like to propose a change to the core synchronization construct used in Squeak. I believe that we should switch from the Semaphore to the Mesa-style monitor with condition variables.
Note that I'm not just proposing that we add Mesa-style monitors to the existing bevy of synchronization constructs: I could do that without getting any consensus from the community, and indeed we have implemented Mesa monitors in terms of Semaphores. I'm proposing that we remove Semaphores entirely, and use Mesa-style monitors instead.
The basic reason for this proposal is that Semaphores are hard to use, and likely to be a source of synchronization bugs. The idea of a Semaphore with a time-out is not very well defined: what is supposed to be true after one has been released from such a Semaphore? Mesa-style monitors were developed by the folks at PARC and later at Digital SRC who developed Mesa, Modula-2+ and finally Modula-3, and have stood the test of time. They are more expressive than semaphores, since they tie together the protected variables and the conditions for which processes are waiting, and they open the way for both compile-time and run-time checks.
What is a Mesa-style monitor? Most of us are probably familiar with a Hoare-style monitor, in which a group of variables are protected from concurrent access by a monitor lock. This is like a Java object in which all of the public methods are declared to be synchronized. I'm not at this point proposing any language changes, so it would be up to the programmer to put the bodies of the synchronized methods inside a monitor critical: [ block ], as is presently done with semaphores.
Once inside a monitor, what does a process do when it finds that it needs some resource that is not presently available? It waits on a so-called condition variable, which is logically associated with the boolean condition for which it is waiting, such as data being present on a socket, a file being free, or a queue being non-empty or non-full. The difference between Hoare and Mesa-style monitors is in the way that condition variables are handled.
In both styles of monitor, waiting on a condition variable entails atomically releasing the monitor lock and suspending the current process. (This can easily be implemented by a new VM primitive, or by using a counter and a semaphore.) The main difference between the two styles lies in the semantics of a signal on a condition variable.

In the Hoare formulation, a signal can be performed only by a process that itself holds the monitor lock, and which can therefore guarantee that the condition for which the waiting process is waiting is now true. If there is no process waiting, the signal does nothing, but if there is a waiter (or several), the semantics of signal is to suspend the signaller and transfer control to the waiter (or one of the waiters).

In the Mesa formulation, signal is a hint: the signaller says that something may have changed, and the responsibility falls on the waiting process to check whether the condition for which it is waiting is satisfied, and if not, to wait again. This may seem like a step backwards, but in fact it significantly simplifies the implementation, because it is always correct to signal a condition — it's a bit like a "self changed" message in the Observer pattern. This means that the signaller does not have to hold the monitor lock, and that interrupt handlers and hardware can therefore signal condition variables directly. It is this property that makes it possible to use Mesa monitors as the _only_ synchronization construct.
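Python's threading.Condition follows the Mesa discipline, so it can serve as a runnable sketch of these semantics (the queue and the thread bodies are illustrative; note that Python, unlike a full Mesa monitor, does require the signaller to hold the lock when it notifies):

```python
import threading
from collections import deque

lock = threading.Lock()
nonempty = threading.Condition(lock)   # tied to the condition "queue is non-empty"
queue = deque()
results = []

def consumer():
    with lock:                         # enter the monitor
        while not queue:               # Mesa: re-check after every wakeup
            nonempty.wait()            # atomically releases the lock and suspends
        results.append(queue.popleft())

def producer():
    with lock:
        nonempty.notify()              # just a hint: harmless even though
                                       # the queue is still empty
    with lock:
        queue.append(42)
        nonempty.notify()              # now the condition really is true

c = threading.Thread(target=consumer)
c.start()
producer()
c.join()
print(results)                         # prints [42]
```

Whichever way the two processes interleave, the spurious first notify costs at most one extra trip around the consumer's while loop; it can never produce a wrong result.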
The tradeoff is that the condition must always be re-checked after a wait. The correct usage pattern is

    [ condition ] whileFalse: [ conditionVariable wait ]

and not

    condition ifFalse: [ conditionVariable wait ].
Chuan-Kai Lin (a student here at OGI) and I have developed an implementation of Monitors in terms of semaphores (which is sometimes a bit convoluted, to avoid race conditions). We have some simple tests, but the implementation needs further testing and, more importantly, careful scrutiny by many eyes, before I would be confident that we have avoided lost signals and race conditions. However, I would be much happier with an implementation that bypasses Semaphores altogether, and has the kernel primitives operate directly on condition variables. This would not only be more efficient (not an insignificant consideration when one is talking about interrupt handling), but also simpler and therefore less likely to contain hidden bugs.
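To give a flavour of such an implementation, here is a much-simplified sketch of a Mesa condition variable built from a counting semaphore and a waiter count. It is emphatically not the OGI code, and for simplicity its signal requires the monitor lock (dropping that requirement is part of what makes the real thing convoluted); it is tolerable only because Mesa semantics let waiters re-check their condition, so a stale permit causes an extra wakeup, never a wrong result:

```python
import threading

class MesaCondition:
    """Simplified sketch: a Mesa condition variable from a counting
    semaphore plus a waiter count (hypothetical, not the OGI code)."""

    def __init__(self, monitor_lock):
        self.lock = monitor_lock            # the monitor lock
        self.sem = threading.Semaphore(0)   # pending wakeups
        self.waiters = 0                    # protected by the monitor lock

    def wait(self):
        # caller must hold the monitor lock
        self.waiters += 1
        self.lock.release()     # a signal arriving between these two lines
        self.sem.acquire()      # is remembered by the counting semaphore,
        self.lock.acquire()     # so it cannot be lost

    def signal(self):
        # in this simplified sketch, the caller must hold the monitor lock
        if self.waiters > 0:
            self.waiters -= 1
            self.sem.release()

monitor = threading.Lock()
nonempty = MesaCondition(monitor)
queue = []
results = []

def consumer():
    with monitor:
        while not queue:        # Mesa discipline: always re-check
            nonempty.wait()
        results.append(queue.pop())

t = threading.Thread(target=consumer)
t.start()
with monitor:
    queue.append(42)
    nonempty.signal()
t.join()
print(results)                  # prints [42]
```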
My first attempts to achieve this involved subclassing Semaphore to create classes for monitor condition variables and monitor locks, carefully leaving the layout of the instance variables the same so that the existing Semaphore primitives would still work. Nevertheless, the primitives failed. Tim Rowledge tells me that this is because there is an explicit test of the class of the object (= Semaphore) in the implementation of the primitives.
So, in order to make progress on this, we need either to modify the implementation of the Semaphore primitives to remove this check, or to create some new (unchecked) primitives. I've never worked with kernel primitives before, and have no idea even where to start.
OK, I'm almost finished. The final fly in the ointment is what to do when a waiting process shouldn't wait any more. Perhaps it is waiting for data on a socket, and the socket was closed. Or perhaps it was waiting for user input, and the user just hit the interrupt key. How does one deal with this?
The "pure" answer is: in the usual way. You have to write your conditions so that they say "data available OR socket closed", etc. This is obviously correct, and often quite impractical.
The answer adopted by current Squeak code is to use timeouts on the wait. The main problem with this (apart from implementing it correctly and efficiently; see Julian Fitzell's SemaphoreWaitAndWaitTimeoutFix-jf from 11 Sept 2003) is that the semantics of the wait message are no longer clear. Even with the Hoare monitor, the meaning of being awoken is no longer that the condition for which you were waiting is satisfied: it might just be that a timer has gone off. So, the awoken process has to check what has happened explicitly — and presumably going back to wait again is not the desired behavior if a timeout has occurred and the condition is still false.
The Modula-3 libraries solve this problem by introducing alerts, which are sent to processes. The receiving process sees an "alerted" exception, or a flag set. If a process is waiting on a condition that might be false for a long time (like user input), the wait should be an "alertWait", which means that if the process is alerted, the wait will be terminated. Once the process is running again, inside the monitor, it will experience an alerted exception: it will not execute the normal code path. I'm not a great fan of exceptions, so in Squeak I would instead suggest a method like:
    aCondition waitIfAlerted: [ aBlock ]
which will terminate normally if the condition is signalled, and will evaluate aBlock if it is alerted.
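A hypothetical sketch of this protocol, with the alert delivered as a flag rather than an exception; the class, its methods, and the "interrupted" handler are all invented for illustration, not an existing Squeak or Modula-3 API:

```python
import threading

class AlertableCondition:
    """Hypothetical sketch of 'aCondition waitIfAlerted: [ aBlock ]':
    alert() wakes the waiters, and an alerted waiter runs the supplied
    handler instead of resuming the normal code path."""

    def __init__(self):
        self.cv = threading.Condition()
        self.alerted = False

    def signal(self):
        with self.cv:
            self.cv.notify_all()       # the usual hint

    def alert(self):
        with self.cv:
            self.alerted = True        # e.g. the user hit the interrupt key
            self.cv.notify_all()

    def wait_if_alerted(self, predicate, on_alert):
        with self.cv:
            while not predicate():
                if self.alerted:
                    self.alerted = False   # consume the alert
                    return on_alert()      # the [ aBlock ] path
                self.cv.wait()
            return None                    # normal path: condition now holds

ac = AlertableCondition()
outcome = []

def waiter():
    # waits on a condition that will never become true, like user input
    outcome.append(ac.wait_if_alerted(lambda: False, lambda: "interrupted"))

t = threading.Thread(target=waiter)
t.start()
ac.alert()       # terminate the wait from outside
t.join()
print(outcome)   # prints ['interrupted']
```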
So the question for discussion on the list is: is this cleanup worth doing? Is there a volunteer to work on the kernel primitives?
To help the discussion, I've assembled some resources on the Swiki: see Monitors. The prototype implementation is there, as are links to various documents that provide more background than there is room for here.