A Preliminary Specification
Last updated at 2:19 pm UTC on 27 January 2003
So here is my sketch of an ORB; you could actually call this lightweight RMI (remote method invocation), as the idea is similar to the Java solution:
Each image has an ORB which can contact ORBs of other images using a direct socket connection. An ORB is identified by a host name and port number pair. Each ORB keeps track of all objects which are known by other images. These are the owned objects. It also keeps track of all proxies (or stub or shadow or remote objects) which exist for other images' objects. (Let's ignore distributed garbage collection for now.)
Each object is uniquely identified by its UID, which is composed of an object ID (OID, assigned by the owning ORB) and an image ID (IID) that ensures that UIDs are really unique. (There are different ways to ensure that different ORBs pick unique IIDs; we will ignore this detail for now.)
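A minimal sketch of such identifiers, written in Python rather than Squeak purely for illustration. The names `UID`, `ORB` and `register`, and the use of a plain counter for object IDs, are assumptions; the text leaves the actual numbering scheme open.

```python
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class UID:
    iid: int  # image ID: distinguishes ORBs from each other
    oid: int  # object ID: assigned by the owning ORB


class ORB:
    def __init__(self, iid):
        self.iid = iid
        self._next_oid = itertools.count(1)  # assumed scheme: simple counter
        self.owned = {}   # oid -> object known to other images
        self._uids = {}   # id(object) -> UID, so an object is registered once

    def register(self, obj):
        """Return the UID for obj, assigning a fresh OID on first use."""
        key = id(obj)  # safe as a key because self.owned keeps obj alive
        if key not in self._uids:
            uid = UID(self.iid, next(self._next_oid))
            self._uids[key] = uid
            self.owned[uid.oid] = obj
        return self._uids[key]
```

Two ORBs may hand out the same OID, but the resulting UIDs still differ because the IIDs differ.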
A proxy knows its UID and the image's ORB. It is implemented as an object without a superclass which understands just the bare minimum of messages and forwards all unknown messages via its ORB to the real object. This should be completely transparent to the user.
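The forwarding idea can be sketched in Python, where `__getattr__` plays roughly the role that `doesNotUnderstand:` would play in Squeak. `Proxy`, `LoopbackORB` and `remote_call` are hypothetical names, and the loopback ORB performs the call locally instead of going over a socket.

```python
class Proxy:
    """Stand-in for a remote object; forwards unknown messages to its ORB."""

    def __init__(self, uid, orb):
        self._uid = uid
        self._orb = orb

    def __getattr__(self, selector):
        # Only reached when normal lookup fails, i.e. for "unknown messages".
        if selector.startswith("_"):
            raise AttributeError(selector)

        def forward(*args):
            return self._orb.remote_call(self._uid, selector, args)

        return forward


class LoopbackORB:
    """Hypothetical ORB that resolves the UID locally (no network involved)."""

    def __init__(self):
        self.objects = {}

    def remote_call(self, uid, selector, args):
        return getattr(self.objects[uid], selector)(*args)


orb = LoopbackORB()
orb.objects[(2, 7)] = "hello"   # pretend object with IID=2, OID=7
proxy = Proxy((2, 7), orb)
result = proxy.upper()          # forwarded via the ORB; result is "HELLO"
```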
An image is called a client if its ORB calls another image's ORB, which is then called the server. A UID is used by the client to identify the receiver object of a remote method invocation and to locate its server ORB.
Example: The client (say IID=1) wants to send #yourself to a proxy that represents an object owned by the server (IID=2). Let's assume this is the first time we want to connect to the server. The client will look up the host name and port pair for IID=2. It will open a socket and the server will hopefully accept the request.
The client will identify itself by sending its IID. Client and server will then exchange hostname and port pairs (the reason for this is explained in Sun's RMI spec). The client can now send a remote call (binary encoded in some way) that identifies the object, the selector and all arguments (which could be proxies of course). If the client wants to send normal objects, it will replace them by newly created proxies first.
The server will look up the receiver object using the OID and will then perform the selector. Argument data are copied by value for "primitive" data like numbers, strings or symbols. Proxies are transferred as UIDs. If the IID is the same as that of the receiving ORB, they are replaced by references to the original objects. Otherwise, the ORB checks its known proxies or creates new proxy objects.
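A sketch of the server-side lookup just described, in Python. It assumes that an incoming proxy reference has already been decoded into a tuple `("uid", iid, oid)`; that representation and the names `ServerORB`, `resolve` and `perform` are illustrative only.

```python
class Proxy:
    def __init__(self, uid):
        self.uid = uid


class ServerORB:
    def __init__(self, iid):
        self.iid = iid
        self.owned = {}    # oid -> local object
        self.proxies = {}  # (iid, oid) -> Proxy for another image's object

    def resolve(self, arg):
        """Turn a decoded argument into the object the method should see."""
        if isinstance(arg, tuple) and len(arg) == 3 and arg[0] == "uid":
            _, iid, oid = arg
            if iid == self.iid:
                return self.owned[oid]        # our own object: use the original
            key = (iid, oid)
            if key not in self.proxies:       # reuse a known proxy if possible
                self.proxies[key] = Proxy(key)
            return self.proxies[key]
        return arg                            # primitive: copied by value

    def perform(self, oid, selector, args):
        receiver = self.owned[oid]
        return getattr(receiver, selector)(*[self.resolve(a) for a in args])
```

Sending the same remote UID twice yields the same proxy object, so identity comparisons on the server behave as expected.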
It could happen that the server receives a proxy with an unknown IID. If this happens, it will request more information from the client - which should know these - and add a new entry to an IID list; just in case the server wants to contact the proxy's ORB in the future. (If a well known port is known, this can be omitted. The host name is already known from the TCP/IP connection. This however restricts the system to one ORB per computer.)
Otherwise, the result of the method invocation is encoded and sent back to the client. The client should have a timeout in case the server goes down. The server should handle each connection to another ORB in its own process to ensure that a dead client can't block the whole ORB.
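The timeout and the one-process-per-connection idea might look roughly like this in Python, with threads standing in for Squeak processes. The line-based echo "protocol" is only a placeholder for the real call encoding, and `CallHandler`, `ThreadedORBServer`, `call` and `demo` are hypothetical names.

```python
import socket
import socketserver
import threading


class CallHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Runs in its own thread per connection, so a stalled client
        # only blocks this thread and not the whole ORB.
        for line in self.rfile:
            self.wfile.write(line.upper())  # placeholder for "perform and reply"


class ThreadedORBServer(socketserver.ThreadingTCPServer):
    daemon_threads = True
    allow_reuse_address = True


def call(address, payload, timeout=2.0):
    """Send one request; a socket timeout is raised if the server stays silent."""
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(payload + b"\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection")
            reply += chunk
        return reply[:-1]


def demo():
    server = ThreadedORBServer(("127.0.0.1", 0), CallHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return call(server.server_address, b"yourself")
    finally:
        server.shutdown()
        server.server_close()
```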
For the binary encoding we need a simple encoding for all primitive types and arrays. We also need an encoding for proxies and remote calls. This encoding is much simpler than any existing BOSS or ReferenceStream system. (Optionally, complex objects could be copied by value, but that's an extension.)
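To give an idea of how small such an encoding could be, here is a sketch in Python using one tag byte per value. The tag letters, the fixed-width big-endian integers and the representation of a remote call as an array of receiver UID, selector and arguments are all assumptions, not part of the specification above.

```python
import struct
from typing import NamedTuple


class UID(NamedTuple):
    iid: int
    oid: int


def encode(value) -> bytes:
    if isinstance(value, UID):            # proxy reference; check before tuple
        return b"P" + struct.pack(">II", value.iid, value.oid)
    if isinstance(value, int):
        return b"I" + struct.pack(">q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"S" + struct.pack(">I", len(data)) + data
    if isinstance(value, (list, tuple)):  # array of encoded elements
        body = b"".join(encode(v) for v in value)
        return b"A" + struct.pack(">I", len(value)) + body
    raise TypeError(f"cannot encode {type(value).__name__}")


def decode(buf: bytes, pos: int = 0):
    """Return (value, position after the value)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"P":
        iid, oid = struct.unpack_from(">II", buf, pos)
        return UID(iid, oid), pos + 8
    if tag == b"I":
        (number,) = struct.unpack_from(">q", buf, pos)
        return number, pos + 8
    if tag == b"S":
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if tag == b"A":
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown tag {tag!r}")


# A remote call as an array: receiver UID, selector, argument array.
message = encode([UID(2, 7), "yourself", []])
```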
Stefan Matthias Aust
Comment added by Stephan Houben.
I was thinking about this kind of thing the other day.
Obviously, you don't want every object to be passed "by reference", i.e. using the scheme described above. E.g. a simple SmallInteger would probably always be passed "by value".
But what about giving any object the possibility to be moved to another Squeak image? Let's say, ORB A notices that some object X is only referenced by ORB B, and not by any of its own objects anymore. Then it makes sense to move X to B.
Now what does this buy you apart from extra implementation complexity? Well, the cool thing is that this solves the problem of distributed garbage collection. Let's say that object X on ORB A is referenced by object Y on ORB B, and vice versa, but there aren't any further references. Now, we give the rule that when an object isn't referenced anymore by any local objects, but is still referenced by remote objects, then the object is moved to some ORB which contains objects that still reference it, with the provision that the object may only be moved to an ORB with an IID higher than the sending ORB's IID. If A has a higher IID than B, then B will move object Y to A, and lo and behold! On the next garbage collection A will discover that both X and Y are garbage.
Obviously, the ORB with the highest IID might end up doing all the garbage collection work of all the others, and the network might get filled with transmissions of all those soon-to-be-collected objects, but I guess that depends on how common inter-ORB cyclic structures are.
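A toy, single-process simulation of this rule in Python. It makes several simplifying assumptions: ORBs are represented only by their numeric IDs, migration and collection happen in synchronous rounds, an object always moves to the highest-numbered eligible ORB, and remote references count as roots for the local collector. The function name `step` and its data layout are made up for the sketch.

```python
def step(location, refs, roots):
    """One round: migrate objects per the rule, then run a local GC per ORB.

    location: object name -> ORB ID
    refs:     object name -> set of referenced object names
    roots:    names of objects held by some local root in their own image
    """
    # Migration: not held locally, but referenced from a higher-numbered ORB.
    for obj in sorted(location):
        home = location[obj]
        referrers = [o for o in location if o != obj and obj in refs[o]]
        locally_held = obj in roots or any(location[o] == home for o in referrers)
        targets = [location[o] for o in referrers if location[o] > home]
        if not locally_held and targets:
            location[obj] = max(targets)

    # Local mark and sweep: roots are local roots plus remotely referenced objects.
    for orb in sorted(set(location.values())):
        here = {o for o, where in location.items() if where == orb}
        stack = [o for o in here
                 if o in roots
                 or any(o in refs[p] for p, where in location.items() if where != orb)]
        marked = set()
        while stack:
            o = stack.pop()
            if o not in marked:
                marked.add(o)
                stack.extend(r for r in refs[o] if r in here)
        for dead in here - marked:
            del location[dead]
            del refs[dead]


# X on ORB 2 and Y on ORB 1 reference only each other.
location = {"X": 2, "Y": 1}
refs = {"X": {"Y"}, "Y": {"X"}}
step(location, refs, roots=set())
# Y has migrated to ORB 2, and the local GC there has collected both.
```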
Mmm, perhaps I should write a simulation for this.
But you cannot always migrate an object from one image to another, because of the security implications.
Guaranteed unique universal IDs are possible. They are called GUIDs, I think, and there is a published algorithm for generating them. It produces a 128-bit (16-byte) value guaranteed unique for all space and time, and uses some unique value provided by the host computer (usually the Ethernet MAC address) as a seed. So as long as you use different MACs, you're okay.
Frankly, I think this should be the method used. It guarantees uniqueness, and would even provide a way to hold references across a network and to serialize them to disk (i.e. substitute the GUID for the object's memory address when writing to disk, or in any other serialization).
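As an illustration, Python's standard `uuid` module implements this kind of identifier: `uuid.uuid1()` combines a timestamp with the host's node ID, which is normally the MAC address (Python falls back to a random node value if none can be determined). The `Registry` class below is a hypothetical sketch of substituting such an ID for an object reference when serializing.

```python
import uuid


class Registry:
    """Maps objects to GUIDs and back, so references can be written as text."""

    def __init__(self):
        self._by_guid = {}
        self._by_id = {}

    def guid_for(self, obj):
        key = id(obj)  # safe as a key because _by_guid keeps obj alive
        if key not in self._by_id:
            guid = uuid.uuid1()  # version 1: time plus node (usually the MAC)
            self._by_id[key] = guid
            self._by_guid[guid] = obj
        return self._by_id[key]

    def lookup(self, guid):
        return self._by_guid[guid]


registry = Registry()
account = {"owner": "alice"}
guid = registry.guid_for(account)

on_disk = str(guid)                                # what would be written out
restored = registry.lookup(uuid.UUID(on_disk))     # the same object again
```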