Last updated at 5:03 pm UTC on 29 October 2008
Abstraction is a key concept in OO programming. Its implementation in Squeak has raised some interesting issues. The Smalltalk idiom for indicating that a method (and therefore its class) is abstract is to implement it with the body:
^ self subclassResponsibility
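As a minimal sketch of the idiom and of the behaviour the discussion below turns on, the following workspace code assumes a Squeak image with the traditional semantics of #canUnderstand: and #respondsTo:. The class name AbstractShape, the selector #area and the category name are hypothetical and serve only as illustration.

```smalltalk
| shape |
"Hypothetical abstract class, for illustration only."
shape := Object subclass: #AbstractShape
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Example-Abstraction'.

"Declare #area abstract using the idiom."
shape compile: 'area
    ^ self subclassResponsibility'.

"Traditional semantics: a method for #area exists, so both answer true."
shape canUnderstand: #area.
shape new respondsTo: #area.

"Yet actually sending the message signals an error (evaluate separately)."
shape new area.
```

The gap between the last two steps, answering true and then failing when the message is sent, is what the proposed change was meant to close.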
Consider the following discussion involving a change to the semantics of canUnderstand and respondsTo to handle such methods in a special way.
On 12 December 2003, Nathanael Schaerli submitted a "fix" to "canUnderstand so that it deals with abstract methods (i.e., subclassResponsibility and shouldNotImplement) in the right way."
Peter van Rooijen posed the key concerns immediately: "Are you sure this is wise? How exactly did you decide what is "the right way"? How do other dialects implement #canUnderstand:? Are they also "wrong"? If you need the new semantics for something you are working on, why not add another method that does exactly what you want instead of modifying this old-timer?" A little bit later in the discussion he adds: "I think there is another way of looking at this that might be helpful. Let us look at #canUnderstand: as a companion of #doesNotUnderstand:. It doesn't seem so far-fetched to think that "Understand" in both selectors has a related meaning (even though one is on the class and the other on the instance, so there is some inconsistency there). If that is indeed a viewpoint we would like to take, then
(someObject class canUnderstand: #someSelector)
ifFalse: [someObject someSelector]
would be expected to run into a #doesNotUnderstand:, not a subclassResponsibility or a #shouldNotImplement or what have you. If accepted, this would be an argument against the proposed change."
In fact, it turned out, Richard A. O'Keefe had earlier proposed adding two messages: Object>>honestlyRespondsTo: and Behavior>>canHonestlyUnderstand:.
Stephane Ducasse supported Nathanael's proposal with this analysis: "If canUnderstand: #foo replies true when a class has a method
^ self subclassResponsibility
or shouldNotImplement then there is something really wrong in Smalltalk... because normally people think that they can call foo on the receiver, but this is not the case with those examples. This means that you will never be able to find code from VW or any other Smalltalk that breaks what Nathanael proposes, because otherwise the code itself breaks in the other dialects :) "
Bernhard Pieber also supported the proposal with this analysis: "My first reaction was that I am totally in favour of including your changes. I looked at all the senders of #respondsTo:. There are 125 of them in my image. I looked at about 20 of them and all of them had the following pattern:
(anObject respondsTo: #someSelector) ifTrue: [anObject someSelector]
For me this clearly indicates that a developer using #respondsTo: in this way assumes that (s)he can safely call the selector without getting a debugger. I consider the old behaviour to be an outright bug. And it is not problematic to change the semantic of buggy methods, quite on the contrary. I am sure the image will be more robust with the proposed change.
Then I looked at the senders of #canUnderstand:. There are only 14 of them in my image and there are two different usage patterns:
(anObject class canUnderstand: #someSelector) ifTrue: [anObject someSelector]
The second pattern looks like this:
(aClass canUnderstand: #someSelector) ifFalse: [
self compile: #someSelector]
The first pattern should be rewritten using #respondsTo: anyway. Then only the second pattern remains. And these should not use the new semantic IMO. Because then the marker methods might get recompiled and the marker information is lost. That would be a Very Bad Thing, don't you think so?
So my conclusion: Change the (currently buggy) semantics of #respondsTo:, rewrite the first usage pattern of #canUnderstand: with #respondsTo: and do not change the semantic of #canUnderstand:.
(Another interesting but totally different discussion: I do not like #respondsTo: very much in the first place. I am almost sure some senders could be redesigned so that they do not need it anymore.)"
Julian Fitzell makes the argument for enhancing rather than replacing: "I'm not sure I buy this change either. It seems awfully special-casey... I mean, the object does respond to that message - it does so by returning an error telling you that a subclass should have implemented the method in one case. If the subclass hasn't implemented the method, this is an error that we should see - we shouldn't just silently ignore that method. Obviously this depends on where #canUnderstand: is being called, but the appropriate behaviour to me seems to be to check whether the object responds to the message, not to consider what it is going to do when it responds to the message - that's a slippery slope that we can't go very far down unless we're going to somehow be able to come to a semantic understanding of all arbitrary code. I agree with Peter that this might be better as another interface; at the very least it demands some debate before inclusion."
Nathanael addressed, point by point, each of the questions raised so far, and then Richard A. O'Keefe (who had submitted the "enhance rather than change" approach) responded to many of Nathanael's answers. Here I present Nathanael's email with Richard's comments interleaved. I don't use quotes.
NS: [Regarding Richard's proposal of adding two messages, Object>>honestlyRespondsTo: and Behavior>>canHonestlyUnderstand:]
I like your suggestion. In fact, this is what I did in the Traits prototype implementation, except that I used slightly different names:
Even though this worked well and is definitely the most uncontroversial thing to do, I also never felt 100% happy with it. The reason is that the need for having a method named honestlyRespondsTo: or reallyRespondsTo: just gives me the feeling that something is wrong with this protocol (i.e., respondsTo:, honestlyRespondsTo:, canUnderstand:, canHonestlyUnderstand:) in the first place. It just doesn't seem very clean.
ROK: The way Squeak currently implements #respondsTo: and #canUnderstand: is compatible with the way other Smalltalks do it. To be sure, the operations have misleading names. If it were a matter of designing a new language, they'd be good names to avoid. To be sure, there is a fair bit of broken code that uses #respondsTo: when it should not. But the best response is to fix that code, not change the semantics of a basic operation.
NS: I can well see your argument regarding compatibility to Smalltalk-80 and its description in textbooks. If people want to stay compatible to Smalltalk-80 (even though Squeak is not really compatible anymore anyway), changing such basic operations is definitely not the right thing to do. I guess I didn't value this Smalltalk-80 compatibility concern high enough because it is not important to me.
NS: [Regarding how he decided this was the right way] I decided that this is the right way because it is the semantics that is expected by the vast majority of senders in the system.
ROK: There are 112 senders of #respondsTo: in Squeak 3.6g2; if it is true that "(the human authors of) the vast majority of senders (of respondsTo:) in the system" didn't understand the semantics of a basic operation which is described extremely clearly in several classic textbooks, this is extremely worrying. But still, adopting (and correcting, if necessary) the #honestlyRespondsTo: solution and fixing the broken code seems better than changing the semantics of a basic operation.
NS: As an example, consider the method ImageReadWriter>>close:
"close if you can"
(stream respondsTo: #close) ifTrue: [
stream closed ifFalse: [stream close]]
This method uses the message #respondsTo: (which is essentially the same as #canUnderstand:) in order to know whether the class of 'stream' actually offers the functionality #close and not whether the class of 'stream' declares an abstract or disabled method #close!
ROK: Any code that passes in an object where an operation required by the receiver is abstract DESERVES to break; in fact ought to break as loudly and as early as possible.
NS: Therefore, the code immediately breaks if it is used for a stream class that either declares the method #close to be abstract (i.e., by implementing it with the body 'self subclassResponsibility') or declares it to be not appropriate (i.e., by implementing it with the body 'self shouldNotImplement').
ROK: And that's exactly what it SHOULD do. If the operation is abstract, it means that class is an abstract class. There shouldn't be any instances. If you have an instance of an abstract class, what you have is an instance which doesn't know whether it supports a particular operation or not. Suppose, for example, we have a concrete stream object of a class which leaves #close abstract (self subclassResponsibility). Then there is no implementation of
(stream respondsTo: #close)
ifTrue: [stream closed ifFalse: [stream close]]
which can do the right thing. Why? Because stream doesn't know whether it responds to #close or not. Yes, calling #close in this case is wrong, BUT SO IS NOT CALLING #close.
NS: This is really problematic because especially declaring a method as abstract is an extremely important concept of OO programming and it is therefore not acceptable that Smalltalk does not offer a programmer a way to do so without breaking the system.
ROK: I agree 100% that abstract classes and abstract methods are important OO concepts. I expect anyone who uses them to have made an honest attempt to understand them. A class which has at least one abstract method is an abstract class. And one of the basic rules about abstract classes is DON'T CREATE INSTANCES OF ABSTRACT CLASSES.
If you think you have an application where this makes sense, what really makes sense (at least in Smalltalk) is to create an application-specific subclass of the abstract class where you have made your mind up about which operations will be supported and how. The real problem here is that Squeak doesn't actually know when a class is abstract, and doesn't notice when you create an instance of one.
Now, I do agree that if a class defines a selector using
^ self shouldNotImplement
then an instance of (a concrete subclass of) that class should not be regarded as responding to the selector in question. That's why I wrote #honestlyRespondsTo:. A class where you have made a conscious decision "No I do not support this method" is different from a class where you have decided not to decide yet.
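The distinction Richard draws here can be sketched as a workspace block. This is not his actual #honestlyRespondsTo: code, which is not reproduced on this page; it is a rough approximation that assumes Behavior>>lookupSelector: and CompiledMethod>>messages behave as in a stock Squeak image, and it crudely treats any method that sends #shouldNotImplement anywhere in its body as disabled. The block name is hypothetical.

```smalltalk
| honestlyRespondsTo |
"Hypothetical stand-in for Object>>honestlyRespondsTo:."
honestlyRespondsTo := [:anObject :aSelector | | method |
    "Find the method the object would actually run, or nil if there is none."
    method := anObject class lookupSelector: aSelector.
    "Respond 'honestly' only if a method exists and it is not a
     'self shouldNotImplement' marker. Abstract methods still count."
    method notNil
        and: [(method messages includes: #shouldNotImplement) not]].

"Example use with an arbitrary receiver and selector."
honestlyRespondsTo value: 3 value: #printString.
```

Note that a method declared with 'self subclassResponsibility' still counts as understood in this sketch, in line with Richard's view that an undecided method differs from a refused one.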
NS: I was running into exactly those problems [the system breaking because isSubclassResponsibility] when I wrote an analyzer that detects methods that are required in a class (e.g., are self-sent) and then automatically declares them as abstract by compiling a method with the body 'self subclassResponsibility'. However, this horribly crashed the whole image. Why? Because the current semantics of #canUnderstand: (and #respondsTo:) does not correspond to the semantics that the users of these selectors expect.
In fact, there were dozens of places deep in the core of the system (e.g., in the Morphic framework) that caused a runtime error just because I was being a good programmer and actually declared abstract methods as 'subclassResponsibility', which is unarguably a good style of programming. (It makes the code and the used design patterns (e.g., the template method pattern) much easier to understand and decreases the probability for errors in subclasses).
NS: I don't know [how do other dialects implement #canUnderstand:]. If they do it the same way as it has been done in Squeak, it is more than likely that also in these dialects, the semantics of #canUnderstand: and #respondsTo: does not correspond to what is expected by practically all the users of these methods.
ROK: Quite likely. This is why the better Smalltalk textbooks say "don't use #respondsTo:, it doesn't do what you think it does." The real best thing to do to Squeak is probably not to butcher #respondsTo: but to redesign much of the code that uses it. And of course the Lint checking part of RB should offer a warning about uses of #respondsTo:/#canUnderstand:. Maybe it does; I haven't checked.
NS: As a consequence, I assume that also these dialects do not allow a programmer to declare methods that should be implemented in subclasses as 'subclassResponsibility' without breaking other code.
ROK: I can't make sense of that. They are just like Smalltalk: you can define any method you want to be self subclassResponsibility, but if you do that, you then have an obligation NEVER TO CREATE A DIRECT INSTANCE OF SUCH A CLASS.
NS: [why modify the old-timer] Our goal is to have a kernel that allows a programmer to explicitly declare methods that should be implemented in subclasses (i.e., abstract methods) and methods that are not appropriate for a certain class without crashing the whole image.
ROK: We have that already. The problem is not declaring methods that should be implemented in subclasses, it is creating direct instances of abstract classes. The fix that is needed is some way for the system to know that a class is an abstract class and for it to forbid attempts to directly instantiate such classes. Suppose that #respondsTo: is changed so that it answers false when you ask about a selector for an abstract method. (I repeat, this should not be possible, because there shouldn't be any instances that have abstract methods.)
- If it says (true), then this will typically result in the method being called, which means there will be a run-time error.
- If it says (false), then this will typically result in the method NOT being called, which is also wrong, because almost all #subclassResponsibility messages are refined by real implementations, not by #shouldNotImplement.
If in some particular case, the right reaction to an abstract method is to ignore it, then in that case, the right way to define the method in the first place was as a method that does nothing, NOT as abstract method.
NS: Thus, we have three different choices:
a) Leave everything as it is. This means that declaring a method as abstract or inappropriate for a class has nasty side-effects that break other code or even crash the image. As a consequence, it is for example not possible for a programmer to consistently declare abstract methods. In fact, in the current version of Squeak I would even suggest a programmer never to declare 'subclassResponsibility' methods because it is safer.
ROK: This is misleadingly stated. There is no problem with declaring abstract methods. The problem is ignoring the gross error of creating a direct instance of an abstract class. (OK, OK, I'm a hacker too; it can be quite useful for testing to create such an instance and run some tests on it. However, this is precisely when you DON'T want #respondsTo: hiding calls to unwritten methods. If one of your test cases should be calling #close and that's not defined, you want to know about it. More accurately, you desperately need to know about it.)
NS: b) Change #canUnderstand: (and #respondsTo:) so that the semantics actually corresponds to what 99% of the current users expect when they call it.
ROK: As noted above, there is, and can be, NO fix to #respondsTo: which will always give the right answer for code which is so blatantly buggy as to provide an object with an abstract method to a receiver which has a use for calling that method.
This is not an argument against revising #respondsTo: to say "no" when presented with a #shouldNotImplement definition.
NS: c) Introduce new methods for #canUnderstand: and #respondsTo: and change practically all the users of #canUnderstand: and #respondsTo: so that they now use these new methods instead.
ROK: This is the best way to proceed. Why? Because code that has been converted to use the new methods is code which has been inspected to discover what the intended semantics actually is, and code that has not been so converted is code that hasn't been converted yet, so you want the run-time error as a way to tell you "you haven't looked at this one yet."
NS: For us, a) is not acceptable because we do not want to have a kernel that does not allow clean OO programming. When we had to choose between b) and c) we decided for b) just because #canUnderstand: and #respondsTo: are such old-timers!! This may sound paradoxical but it is in fact logical.
As the Squeak image shows, the vast majority of programmers are used to using these old-timer methods when they actually should use the new methods. Therefore, when going for alternative c), it would be just a question of time until there is new code that inappropriately uses these methods again! This is not the case in b), because we make the semantics of these methods consistent with what the majority of Smalltalkers expected for decades.
ROK: I do not believe that there is any other OO community that would regard creating direct instances of abstract classes as "clean OO programming". All the statically checked OO languages I'm familiar with reject such programs at compile time.
In my (still unfinished, probably never to be finished) Smalltalk compiler, I created new messages abstractSubclass:... for declaring abstract classes, and didn't allow "self subclassResponsibility" in concrete classes. One reason was the dispatch method I was using; numbering only the concrete classes helped. But I certainly found it helpful to mark the distinction: it simplified my code. Squeak could tell abstract classes from concrete classes without ceasing to be dynamic; it could be as simple as having a "number of subclassResponsibility methods" variable in a class.
NS: We neither need nor want to understand the semantics of arbitrary code. We just want to offer the programmer the means to declare a method as abstract or not appropriate in a way that is compatible and understandable by the meta-protocol of the language.
The crux of the problem is that Smalltalk does not have language features for these two things. Instead, Smalltalk suggests an idiom, which is declaring abstract methods by implementing them with the body 'self subclassResponsibility' and declaring inappropriate methods by implementing them with the body 'self shouldNotImplement'.
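Richard's closing suggestion, that a class could know how many abstract methods it has, might be approximated without adding a variable to classes at all. The workspace sketch below computes the information on demand instead of storing a counter, so it is a different technique from the one he describes. It assumes Behavior>>allSelectors, Behavior>>lookupSelector: and CompiledMethod>>messages behave as in a stock Squeak image; the block name is hypothetical.

```smalltalk
| abstractSelectorsOf |
"Hypothetical helper: answer the selectors whose effective method in
 aClass still sends #subclassResponsibility, i.e. has not been overridden."
abstractSelectorsOf := [:aClass |
    aClass allSelectors select: [:sel |
        (aClass lookupSelector: sel) messages
            includes: #subclassResponsibility]].

"A class would count as concrete in this sense when the result is empty."
(abstractSelectorsOf value: Collection) isEmpty.
```

Because the lookup goes through the inheritance chain, a subclass that overrides every abstract method comes out as concrete, which a plain per-class counter would have to track separately.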
And even though this may seem like a simple and good solution at a first glance, it introduces a lot of problems because this way of declaring abstract methods is not understood by the meta-protocol of the Smalltalk language. As a consequence, all the meta-functionality of Smalltalk (i.e., methods such as #canUnderstand: and #respondsTo:) thinks that such abstract or disabled methods are regular methods that are intended/expected to be called. But this is the absolute opposite of what the programmer wants to express and it therefore leads to paradoxical behavior when such meta-functionality is used.
Come to think of it: Isn't it paradoxical if a programmer wants to explicitly state that a method is not implemented in a class (e.g., by using 'subclassResponsibility' or 'shouldNotImplement'), and as a consequence, all the meta-functionality of Squeak thinks that it actually is implemented?
And this is precisely what we want to fix: We want to make sure that the meta-functionality of Squeak actually understands the special meaning of these methods so that the programmer has a way of consistently declaring abstract and inappropriate methods. This is all, and it does not require understanding arbitrary code. In fact, it just requires understanding two idioms (i.e., 'subclassResponsibility' and 'shouldNotImplement').
NS: [Regarding the concept of #canUnderstand: as a companion of #doesNotUnderstand: with related meanings] On a low and technical level, you are right. It is indeed true that Squeak 'understands' a message that has been declared as abstract or disabled by the programmer by implementing a method with the body 'shouldNotImplement' or 'subclassResponsibility'. However, this is a consequence of the unfortunate fact that the Smalltalk language is too poor to express declaring abstract or disabled methods in any other way than actually implementing them! Which is completely paradoxical to begin with!
On a higher and conceptual level, writing a method with the body 'shouldNotImplement' or 'subclassResponsibility' is the way a programmer declares a method as not appropriate or available in a certain class. And therefore, it makes total sense if such a method is then 'not understood' by the class. In fact, this is the way it should be because it is precisely what the programmer intended to achieve.
Since I prefer naming methods according to their conceptual and higher-level meaning rather than their lower-level implementation (e.g., I prefer #addMethod: rather than #insertMethodIntoMethodDictionary:), I prefer the higher-level view. This is particularly the case since we are dealing with a higher-level language like Smalltalk and not with C or C++.
Also, I argue that this higher-level view is the only consistent view anyway. Just consider that Smalltalkers implement a method with the body 'self shouldNotImplement' to declare that the same method should not be implemented! Looking at it on a technical level, this does not make any sense, because it is necessary to implement a method in order to express that it actually should not be implemented!
It is only when we look at this on a higher level that it makes sense. On this level, we just consider the intention of the programmer, and this is that he wants to declare that a method is not appropriate for a certain class. Whether we do this by associating a designated method body to the selector in the method dictionary, or whether we use a syntactic construct such as in Eiffel (i.e., the 'remove:' clause) does not matter.
The only thing that matters is that the programmer knows how to declare this and that the system behaves in a way that is consistent with the conceptual intention of the programmer. Looking at the existing users of #canUnderstand: and #respondsTo:, this was obviously not the case in Squeak. And that's why we suggested this fix.
NS: [Regarding the concept of not instantiating abstract classes] I see that my example was not very well chosen (I just browsed the senders of #respondsTo: and took a more or less arbitrary one of them). However, I don't think that dealing with abstract classes in Smalltalk is as simple as 'do not create instances of them'. First we need to ask ourselves the question when a Smalltalk class is abstract. Is it abstract when it defines or inherits a method with the body 'self subclassResponsibility' or is it abstract if it somehow sends messages to 'self' but does not implement these methods?
If you take the latter definition, then all the classes in Squeak are abstract! However, this would mean that the rule of never instantiating abstract classes would lead to a system without any instances and therefore also without any classes (since classes are instances of metaclasses, which are also abstract).
Since I cannot imagine that this is really what you meant, I suspect that you mean the former definition. But then I'm wondering what you think is the difference between the following two scenarios:
- Scenario 1: Class A implements a template method pattern with a template method #open: that calls 3 hook methods #beforeOpen:, #doOpen:, and #afterOpen: that are implemented as 'self subclassResponsibility'. Class B is a subclass of A, but does not implement any of the hook methods.
- Scenario 2: Class A' is the same as A but does not explicitly declare the hook methods #beforeOpen:, #doOpen:, and #afterOpen:. Class B' is a subclass of A' with the same methods as B (i.e., it also does not implement any of the hook methods).
Well, according to the former definition, the class B is abstract whereas B' is not. Using the rule "never create instances of abstract classes", this would mean that you should not create an instance of B whereas creating instances of B' is okay.
Hmm, I don't agree to this. I really can't see why it would be any worse to create instances of B than to create instances of B'. It seems that using an instance of B is as buggy as using one of B'. Therefore, the classes B and B' seem "equally abstract" to me.
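The two scenarios can be sketched in a workspace as follows, assuming a Squeak image with the traditional semantics of #respondsTo:. Because a class name cannot contain an apostrophe, A' and B' are rendered with a Prime suffix; all four class names and the category are hypothetical, and the bodies of the template method are only a guess at what Nathanael had in mind.

```smalltalk
| a b aPrime bPrime |
"Scenario 1: A declares the three hooks as abstract; B overrides none."
a := Object subclass: #ScenarioA
    instanceVariableNames: '' classVariableNames: ''
    poolDictionaries: '' category: 'Example-Abstraction'.
a compile: 'open: x
    self beforeOpen: x; doOpen: x; afterOpen: x'.
#('beforeOpen: x' 'doOpen: x' 'afterOpen: x') do: [:header |
    a compile: header, '
    ^ self subclassResponsibility'].
b := a subclass: #ScenarioB
    instanceVariableNames: '' classVariableNames: ''
    poolDictionaries: '' category: 'Example-Abstraction'.

"Scenario 2: same template method, but the hooks are left undeclared."
aPrime := Object subclass: #ScenarioAPrime
    instanceVariableNames: '' classVariableNames: ''
    poolDictionaries: '' category: 'Example-Abstraction'.
aPrime compile: 'open: x
    self beforeOpen: x; doOpen: x; afterOpen: x'.
bPrime := aPrime subclass: #ScenarioBPrime
    instanceVariableNames: '' classVariableNames: ''
    poolDictionaries: '' category: 'Example-Abstraction'.

"Traditional semantics: only the explicit declaration is visible."
b new respondsTo: #doOpen:.          "true"
bPrime new respondsTo: #doOpen:.     "false"

"Either way the template method cannot complete (evaluate separately)."
b new open: 1.         "error raised by subclassResponsibility"
bPrime new open: 1.    "doesNotUnderstand: #beforeOpen:"
```

Both instances fail as soon as #open: is sent, which is the sense in which Nathanael calls the two classes equally abstract; they differ only in what #respondsTo: reports about the hooks.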
And this is exactly the point I wanted to make. In Squeak, we have this sort of abstract classes (i.e., classes that issue self-sends to some methods but do not implement them) everywhere and programmers make instances of them all the time. And nevertheless, these instances work fine because the implicitly abstract methods and all the methods that call such implicitly abstract methods are never called.
However, this immediately falls down as soon as someone tries to be a good programmer and actually declares such methods as 'self subclassResponsibility' rather than just leaving them undeclared and silently hoping that these methods are never called. And the single reason for this is the fact that the meta-functionality #canUnderstand: and #respondsTo: cannot distinguish such methods that are declared as abstract.
As I mentioned in an earlier post, I was trying to be a good programmer and to explicitly declare all such hook methods. In fact, I even wrote a quite sophisticated realtime analyzer that detects such undeclared abstract methods and automatically declares them as 'required'. The reason for doing this was the belief that being able to see all these required methods is much better than just having to guess them. (Finally, someone can still ignore them if he just doesn't care). The problem is that doing this immediately crashed the whole image, because there are several places (e.g., in the Morph hierarchy) where #canUnderstand: or #respondsTo: are then fooled because they think that the methods that are now explicitly declared as abstract (rather than being implicitly abstract as before) are actually designed to be called.
Now, it is important to understand that this is not my code and the reason for proposing my fix is not that I plan to write such code in the future ;-). Much more, I'm the first to agree that a lot of this code should not have been written like that in the first place. Nevertheless, it is a matter of fact that this code has been written and that the programmers writing this code used the meta-functionality #canUnderstand: and #respondsTo: in a way that does not allow other programmers to explicitly declare the respective methods as abstract.
And because instances of "abstract classes" (whether explicitly declared by implementing methods as 'self subclassResponsibility' or by implicitly self-sending messages that are simply not implemented) are a reality in Squeak, I think that saying that "instances of abstract classes should never be created" is generally a wise statement but does not bring us much closer to solving this problem in reality.
Now, if the majority of Squeakers prefer solving this problem like you (and I) did earlier, i.e., by introducing a new protocol #honestlyRespondsTo: and #canHonestlyUnderstand: into the official kernel and then just replacing the majority of all senders of #respondsTo: and #canUnderstand: so that they call the new protocol instead, I don't want to stop them. But there are two things about this that bother me:
First, when we need to change the majority of senders of a method so that they send another method instead, it seems to me that it may be better to change the semantics of the original method, and then just change the few cases where the original semantics was necessary. In particular, this would decrease the probability that people use the wrong kind of method in the future. Second, it seems to me that the need for having two protocols with the quite unclear names canUnderstand: and respondsTo: vs. canHonestlyUnderstand: and honestlyRespondsTo: is a sign that something is wrong in the first place.
Thus, I just think that before we officially introduce this new protocol into the kernel, it would be worth thinking what's really the right thing to do, and this is precisely the purpose of this discussion.
Lothar Schenk challenged the idea that declaring abstract methods as 'subclassResponsibility' is unarguably a good style of programming. This led to the following exchange with Hannes Hirzel.
LS: Why is it 'unarguably' good style to fill abstract methods with the line "self subclassResponsibility"? In the example of #close for a general stream, is "self subclassResponsibility" an abstract formulation of the semantics of the close operation? I don't think so. What if #close just contained a comment "My subclass should implement this - do nothing"? Arguably, the trouble you have with the semantics of #respondsTo: in dealing with such cases is a consequence of putting something in these abstract methods which doesn't really (semantically) belong there. Instances of the class actually respond to #close, but the response is not meaningful in the given semantic context.
HH: Answer: Because this is an idiomatic convention in the Smalltalker subculture to define a method as abstract; see Kent Beck's book about Smalltalk idioms.
LS: I know that. But, "because that's the way we do it (always did it)" or "because Kent Beck says so" is not an argument.
HH: Nathanael's point is that he wants the system to have a way of distinguishing between abstract and non-abstract methods. That's what he means when he writes the following
- "And even though this may seem like a simple and good solution at a first glance, it introduces a lot of problems because this way of declaring abstract methods is not understood by the meta-protocol of the Smalltalk language. As a consequence, all the meta-functionality of Smalltalk (i.e., methods such as #canUnderstand: and #respondsTo:) thinks that such abstract or disabled methods are regular methods that are intended/expected to be called."
LS: Is there an intrinsic property of a method which can be used to distinguish between an abstract and a non-abstract method?
HH: Even if Smalltalk is a relatively good language it does not mean that it is the only one and that it couldn't be cleaned up or enhanced somewhat.
Between 1970 and 1980 every two years a new version of Smalltalk was developed. Now we have 2003. Why not have a little clean-up?
In general OO software engineering circles the distinction between abstract and non-abstract methods is considered to be a valuable distinction.
LS: I'm not arguing against the usefulness of this distinction. I am, however, arguing that abstract methods and their nonabstract counterparts should express the same basic semantic content.
So, is "self subclassResponsibility" a formulation of the abstract semantics of a given method, e.g. the closing of a stream?