Last updated at 2:13 am UTC on 26 January 2020
This is very, very obsolete but left here for historical reasons. As of 2007-ish we have the Cog VM family of virtual machines, and as of today (2015) we have the new Spur image format and Cog Spur VMs.
It looks like this project is now discussed on a different page: (obsolete) Version 4.
The VI4 project is not complete yet. Its bytecodes and stack implementation have to be reconsidered. Block closures have been separated out into their own project. See (obsolete) VI4. -ajh, 14 April 2003
Version: 27 October 2002 (beta)
by Anthony Hannan
SqueakVI4 is a new image format and VM that executes bytecodes 50% faster and message sends 110% faster than Squeak3.2 on a 1.8 GHz Pentium 4 (measured with '0 tinyBenchmarks'), yielding an overall speed improvement of about 25% (macroBenchmark #6).
If you're only interested in Closures, please see (obsolete) VI4. The closure compiler works with the standard Squeak image and VM.
VI4 includes many low-level changes (see Description for details):
- New Smalltalk compiler
- New bytecodes
- New Interpreter
- New Context format
- New Block format
- New CompiledMethod format
- Modified Process
- Refactored Exceptions
- New Smalltalk simulator
- New SystemTracer
- Added Continuations
- InterpreterSimulator fixes (so full morphic images can be run)
Unless your package relies on internals of the classes affected by the changes above, it should work in VI4.
Updates
File-in the following changesets immediately after downloading or converting.
- VI4PostKillObsolete-ajh.cs.gz - 4 Nov 2002
- VI4ObsoleteMap-ajh.cs.gz - 5 Nov 2002
- VI4SyntaxErrorFix-ajh.cs - 26 Nov 2002
Image and Changes
SqueakVI4image.zip - 3.2-4956 image and changes. Don't forget to file-in Updates above.
Virtual Machines
SqueakVI4vm-win.zip - Windows VM 3.2.3
Squeak3.0Beta2-VI4alpha1.app.sit - Mac OS X, based on 3.4.0b2
If you don't see your platform listed in Virtual Machines above, build your own VM using interp.c.zip, then please upload it to this page.
SqueakVI4cs.zip - VI4 changesets. Follow the instructions in the enclosed VI4-INSTALL.text to convert a 3.2 image into a VI4 image and VM. Then file-in Updates above.
List bugs and enhancement requests here. Please include your initials so we can talk about them if necessary.
- Jitter has to be rewritten. Ian is already working on a new version for 3.2 called J5. Hopefully, he will be able to adapt it to VI4 without too much problem. I am willing to change bytecodes or whatever to meet his needs. -ajh
- There are still some bugs in the Decompiler, but most of the time it works, plus it is rarely used if you have the image linked to sources. I will fix these bugs eventually. -ajh
- ImageSegments and DataStreams still have to be tested and tweaked to handle the new method and context formats. -ajh
The following description has been copied from the aboutVI4 method comment in the image:
VI4 stands for Virtual Image 4.0. VI4 is a forked version of Squeak that has a different, incompatible image format from those of Squeak 3.X and earlier. The intention is for VI4 to become Squeak 4.0 when it and the community are ready. Until then, VI4 will stay up to date with the latest stable Squeak 3.X version, currently this is Squeak 3.2. However, further updates can be installed using 'update code from server' as usual. Updates specific to VI4 will be announced on the Squeak mailing list and posted on the VI4 website above.
Image Format Changes
The image format describes how objects are represented in memory, and how certain core objects are expected to be structured in the image. In VI4, so far, only the structure of some core objects has changed; these changes are described below.
Note, the changed core classes have been renamed with a '2' suffix, so the old (obsolete) class and the new class can exist together. Eventually, the obsolete class will be removed and the new classes will have their '2' suffix removed.
New Context Structure
Execution state is now kept as frames on a stack (one stack per process) instead of in a linked list of context objects. The frame layout is similar to traditional activation records: a new method is called by leaving its receiver and args on the stack, pushing the return method and bytecode index, then resuming execution using the new method and its first bytecode index. A method returns by popping its frame temps, popping and restoring the return method and bytecode index, popping the args, replacing the receiver with the result, then resuming. See New Block Structure below for how remote return and unwind are handled.
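The call/return protocol above can be modeled in a few lines. This is a minimal Python sketch, not VI4 code. The FrameStack class and its call/ret methods are hypothetical. The stack is a plain list, and the callee's temp and arg counts are passed in explicitly; in VI4, the return bytecode carries numTemps itself.

```python
class FrameStack:
    """Toy model of a VI4-style per-process frame stack (hypothetical API)."""

    def __init__(self, method=None, ip=0):
        self.stack = []       # one flat stack per process
        self.method = method  # currently executing method
        self.ip = ip          # bytecode index within self.method

    def call(self, new_method):
        # Receiver and args are already on the stack; save the return point.
        self.stack.append(self.method)
        self.stack.append(self.ip)
        self.method, self.ip = new_method, 0

    def ret(self, result, num_temps, num_args):
        # Pop the callee's frame temps (there may be none).
        del self.stack[len(self.stack) - num_temps:]
        # Pop and restore the return method and bytecode index.
        self.ip = self.stack.pop()
        self.method = self.stack.pop()
        # Pop the args, then replace the receiver with the result.
        del self.stack[len(self.stack) - num_args:]
        self.stack[-1] = result
```

No frame pointer is kept. Everything the return needs sits at a known distance below the top of the stack once the temp count is known.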
The stack is an indexable object, a FiniteStack, with a special compact class index recognized by the VM as having variable size, with a max capacity. The capacity is stored in the size bits of the header, while the size is stored in the first fixed field of the object. This special treatment of FiniteStacks (like MethodContexts and BlockContexts previously) is an optimization so fields don't have to be nilled out when the stack shrinks. The garbage collector won't look in fields beyond the 'size' of the stack.
Before every method activation, the stack capacity is checked against the method's required frame size. If the stack capacity would be exceeded, the stack is copied to a new stack with a larger capacity, and the process and VM registers are updated to point to the new stack. Process2 is the only object that has a pointer to the stack, and it never hands it out. During a full GC, when a process stack is traced, its capacity is shrunk if its size is much smaller than its capacity: the object is truncated and the space after it is freed. The amount to grow the stack by when copying to a new stack, and the amount of unused space required before shrinking, are stored as instance variables in the stack object so they can be fine-tuned; they are set to defaults upon creation.
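Below is a toy model of a FiniteStack that follows the size/capacity split and the grow/shrink policy described above. The names (grow_by, shrink_slack, ensure_room, shrink_if_sparse) are hypothetical. The growth formula (required size plus grow_by) is an assumption, since the text does not give one.

```python
class FiniteStack:
    """Toy FiniteStack: fixed capacity, variable size (hypothetical names)."""

    def __init__(self, capacity, grow_by=64, shrink_slack=256):
        self.slots = [None] * capacity    # capacity lives in the header in VI4
        self.size = 0                     # first fixed field in VI4
        self.grow_by = grow_by            # tunable growth increment
        self.shrink_slack = shrink_slack  # unused space needed before shrinking

    @property
    def capacity(self):
        return len(self.slots)

    def push(self, value):
        self.slots[self.size] = value
        self.size += 1

    def pop(self):
        # The slot is not nilled out; the GC ignores it because it is beyond size.
        self.size -= 1
        return self.slots[self.size]

    def live_slots(self):
        # The only fields the garbage collector would trace.
        return self.slots[:self.size]

    def ensure_room(self, frame_size):
        """Return self, or a larger copy if the next frame would not fit."""
        if self.size + frame_size <= self.capacity:
            return self
        bigger = FiniteStack(self.size + frame_size + self.grow_by,
                             self.grow_by, self.shrink_slack)
        bigger.slots[:self.size] = self.slots[:self.size]
        bigger.size = self.size
        return bigger

    def shrink_if_sparse(self):
        # Done during a full GC: truncate when the unused tail is large.
        if self.capacity - self.size > self.shrink_slack:
            del self.slots[self.size:]
```

ensure_room may return a different object, so the owner must rebind its reference (stack = stack.ensure_room(n)). This mirrors how the process and VM registers are updated to point to the new stack.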
In the image, stack frames are referenced through MethodContext2 'proxy' objects. They behave like traditional MethodContexts, except that their state is kept in the stack frame and referenced via their frame index (similar to PseudoContexts with Jitter). Only one context proxy can exist per frame; a flag is set in the frame to indicate that a proxy exists for it. The proxies are chained together in a linked list (like traditional contexts), which is held in the process's lastActiveContext field. Only 'active' contexts are included in this list. During normal execution this would only include unwind, exception-handler, and block-return contexts; however, when the process is being inspected in a debugger, the list would include contexts for all frames. The VM checks the context flag on every return and, if it is set, removes the top active context.
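The one-proxy-per-frame rule and the flag check on return can be sketched as follows. This is a hypothetical Python model. In VI4 the flag lives in the frame on the FiniteStack, and the proxy refers to its frame by index rather than by object reference.

```python
class Frame:
    def __init__(self, name):
        self.name = name
        self.has_proxy = False  # flag kept in the frame itself


class MethodContextProxy:
    def __init__(self, frame, next_ctx):
        self.frame = frame    # VI4 would store the frame index instead
        self.next = next_ctx  # next older active context


class Process:
    def __init__(self):
        self.frames = []                # oldest first
        self.last_active_context = None

    def this_context(self):
        frame = self.frames[-1]
        if frame.has_proxy:
            # Proxies for any newer frames were removed on return,
            # so the top frame's proxy must be the head of the chain.
            return self.last_active_context
        frame.has_proxy = True
        self.last_active_context = MethodContextProxy(frame, self.last_active_context)
        return self.last_active_context

    def return_from_top(self):
        frame = self.frames.pop()
        if frame.has_proxy:  # the check made on every return
            self.last_active_context = self.last_active_context.next

    def active_contexts(self):
        ctx, names = self.last_active_context, []
        while ctx is not None:
            names.append(ctx.frame.name)
            ctx = ctx.next
        return names
```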
There is a new subclass of Process2 called SuspendedProcess. A new process starts out as a SuspendedProcess and changes to a Process2 upon resume. When a Process2 is suspended, its class is changed back to SuspendedProcess. The intention is to separate stack manipulation methods, which are applied only to suspended processes, from running or waiting (on the Processor queue or a Semaphore) processes, which should not be manipulated. With this separation, only a few primitive stack queries (such as the receiver of a frame) are needed in Process2.
This allows us to internalize the process stack in the VM if we want. All stack access methods in Process2 have been given an optional primitive. The VM just has to implement these primitives and convert the FiniteStack object into an internal structure upon primitiveResume. SuspendedProcess does much more to the stack including changing it during Smalltalk simulation, but it still never hands it out. The VM could implement primitives for this as well, but currently I'm assuming it will just convert the internal stack structure back into a FiniteStack object upon primitiveSuspend. This way SuspendedProcesses can be manipulated easily in Smalltalk. Note, only suspend, not wait, will cause a process to be converted.
Upon image snapshot all processes are converted to suspended processes before writing out the image file, then converted back to resumed processes after writing is finished. When an image is loaded, the active process, processes queued to be run, and processes waiting on a Semaphore are converted to resumed processes.
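The class switch between Process2 and SuspendedProcess can be imitated in Python by reassigning __class__. This is a hedged sketch. The tuple that stands in for the VM's internal stack is an assumption, and so are the methods top, replace_top, and snapshot. None of them is VI4's actual protocol.

```python
class Process2:
    """Resumed process: its stack is opaque (internal to the VM)."""

    def top(self):
        # One of the few primitive stack queries a resumed process offers.
        return self._internal[-1]

    def suspend(self):
        # Like primitiveSuspend: export the internal stack as a real object.
        self.stack = list(self._internal)
        del self._internal
        self.__class__ = SuspendedProcess


class SuspendedProcess(Process2):
    """Suspended process: the only place stack manipulation is allowed."""

    def __init__(self, frames=()):
        self.stack = list(frames)

    def top(self):
        return self.stack[-1]

    def replace_top(self, value):
        # Example of a manipulation only suspended processes support.
        self.stack[-1] = value

    def resume(self):
        # Like primitiveResume: convert the stack into an internal structure.
        self._internal = tuple(self.stack)
        del self.stack
        self.__class__ = Process2


def snapshot(processes, write_image):
    """Suspend every resumed process while the image is written out."""
    running = [p for p in processes if type(p) is Process2]
    for p in running:
        p.suspend()
    write_image(processes)
    for p in running:
        p.resume()
```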
New Block Structure
Blocks are now separate from their context and their home context. A block is a closure containing the temp values it references from its home context and its own method. If a temp can change after being copied into a block closure, then that temp is first put into a SharedTempHolder, and both the closure(s) and the home context reference the temp value indirectly through the holder. A block is created by pushing its method and referenced temp values, then calling the createBlock bytecode, which pops the pushed values into a new BlockClosure object. Block methods are held in the literals of the home method. If a block does not reference any outside temps, then the block is created at compile time and held in the literals. 'self' is treated like any other temp: if a block refers to it or its instance variables, it will be copied into the block closure. If a block does a remote return (^), the home context's unique ContextTag is copied into the block closure as well. The tag is used to find the home context upon remoteReturn.
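Here is a minimal Python sketch of the createBlock step and of the two ways a temp can be captured. Captured values live in the closure's indexable slots, which are a list here. In this model SharedTempHolder has a single contents field. The block "methods" are plain Python functions standing in for compiled block methods. home_sum models a home method like '| sum | sum := 0. #(1 2 3) do: [:x | sum := sum + x]. ^sum', and home_scaler models one that answers '[:x | x * k]'. All names other than SharedTempHolder and BlockClosure are hypothetical.

```python
class SharedTempHolder:
    def __init__(self, value):
        self.contents = value


class BlockClosure:
    def __init__(self, method, captured):
        self.method = method
        self.captured = list(captured)  # the closure's indexable slots

    def value(self, *args):
        # The block's own method runs with the captured slots available.
        return self.method(self.captured, *args)


def create_block(stack, closure_size):
    """createBlock: pop closure_size free vars and the method below them."""
    free_vars = stack[len(stack) - closure_size:]
    del stack[len(stack) - closure_size:]
    method = stack.pop()
    block = BlockClosure(method, free_vars)
    stack.append(block)
    return block


def sum_block_method(captured, x):
    holder = captured[0]  # extra indirection through the holder
    holder.contents = holder.contents + x


def home_sum(items):
    total = SharedTempHolder(0)  # 'sum' is written in the block and read later
    stack = [sum_block_method, total]
    block = create_block(stack, 1)
    for x in items:
        block.value(x)
    return total.contents


def scale_block_method(captured, x):
    return x * captured[0]  # 'k' never changes, so it is copied by value


def home_scaler(k):
    stack = [scale_block_method, k]
    return create_block(stack, 1)
```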
A BlockClosure is executed like any other object, except that the method invoked for #value... is unique for each instance. Hence, a block and a regular method have exactly the same frame structure and the same type of context proxy if one is created (MethodContext2). Captured temps that were copied from its home context are held in the block's indexable slots, but the block's method accesses them like instance variables. Values that are wrapped in a SharedTempHolder, and instance vars of a captured 'self', require an extra level of indirection (new bytecodes).
A remote return is performed by searching the process's lastActiveContext linked list for the one that contains the block's captured contextTag. This shared tag is used instead of a direct reference from the block to its home context, so block returns will still work in copied processes. A copied process will have new contexts but will hold the same contextTags. Once the home context is found the stack is popped down to its sender. However, if an unwind (ensure:/ifCurtailed:) context is found before it, then the unwind frame is resumed with a new method (execute:then:return:) and 2 more args (the home context and the return value). This new method executes the unwind block then sends return: to the home context, which calls primitiveRemoteReturn which continues the popping/unwind cycle until the home context is reached.
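The search-and-unwind loop can be sketched as below. The sketch flattens the VI4 cycle into a single loop; in VI4 the unwind frame is resumed with execute:then:return:, which sends return: and re-enters primitiveRemoteReturn. Ctx, ContextTag, and remote_return are hypothetical names. Raising an error when the home context is missing is an assumption.

```python
class ContextTag:
    """Identity token shared by a context and all of its copies."""


class Ctx:
    def __init__(self, name, tag=None, unwind=None):
        self.name = name
        self.tag = tag        # only needed for contexts that are block homes
        self.unwind = unwind  # ensure:/ifCurtailed: block, if any


def remote_return(active, home_tag, value):
    """Pop active contexts (oldest first) down to the home's sender."""
    while active:
        ctx = active.pop()
        if ctx.tag is home_tag:
            return value  # execution continues in the home context's sender
        if ctx.unwind is not None:
            ctx.unwind()  # VI4 does this via execute:then:return:
    raise RuntimeError("home context not found; cannot return")
```

The block's tag is compared by identity. A copied process therefore still finds its home, even though every Ctx object in the copy is new.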
New CompiledMethod Structure
CompiledMethod2 is now a regular pointer object, eliminating the special compiled method object format (as per Tim Rowledge). One of its instVars is 'bytecodes' which points to a ByteArray containing the method's bytecodes. Literals are held in the method's indexable fields. The 'header' bits format has changed, removing numLiterals, combining the primitive bits, expanding frameSize to include its actual size (instead of just small or large), and removing numTemps.
Named temporaries are treated the same as other temporary stack values, i.e. they are allocated when needed instead of at the beginning. Execution starts with nothing in the frame's stack. PushNil is executed if a temp needs to be allocated, but normally no pushNil is needed because the first value being stored into a named temp is already on the stack. The named temp is just implicitly assigned that stack position, requiring no store (and no pushNil). Only when the first assignment happens in an arg expression of a message are the pushNil (before the message) and the store (within the message) required.
VI4 has a totally new bytecode set. The bytecodes are designed for a fast stack interpreter that only keeps track of the stackPointer, instructionPointer, and currentMethod (no frame pointer). Bytecodes refer to values on the stack by offset from the current stackPointer register. The abstract bytecodes are ([ ] means optional):
copyUp: offsetFromTop [slot: slotIndex [slot: secondSlotIndex]]
Push value in stack at offsetFromTop [or its field at slotIndex [or its field's field at secondSlotIndex]]. The basic form is used for push temp and receiver. Extra indirections are used for instVar of receiver, contents of shared temp holder, contents of shared temp holder held in a closure, and inst var of self held in a closure.
copyDown: offsetFromTop [slot: slotIndex [slot: secondSlotIndex]]
Same as above except store instead of push.
wrapInTempHolder
If the compiler sees that a temp given to a block is changed and used later, it will issue this bytecode after the temp is first assigned. It will cause the temp to be wrapped in a SharedTempHolder. This value holder will remain in the stack frame and be given to blocks that use it. The home method and block methods will know to access the temp indirectly.
createBlock: closureSize
Create a block of size closureSize, pop the top closureSize free vars and the method below them into it, then push the block.
The returnHomeFlag is expected to be on top with the return value below it. Find the matching homeContext then call primitiveRemoteReturn.
Pop numTemps, pop and restore sender method and bytecode ip, check flag and if true pop process's lastActiveContext, pop args, then replace receiver with result.
Check flag and if true push process's lastActiveContext. If false, create context proxy for current frame, add it to lastActiveContext chain, and push it.
sendLiteral: selectorLitIndex [superOf: behaviorLitIndex] numArgs: numArgs
common send bytecodes
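The copyUp:/copyDown: forms above can be modeled on a Python list, with objects represented as lists of slots. In this sketch offset 0 is the top of the stack. copyDown: is assumed to store the top value without popping it. The text leaves both details open.

```python
def copy_up(stack, offset, *slots):
    """Push stack[-1 - offset], optionally following up to two slot indices."""
    if len(slots) > 2:
        raise ValueError("at most two levels of indirection")
    value = stack[-1 - offset]
    for index in slots:
        value = value[index]  # e.g. instVar of receiver, holder contents
    stack.append(value)


def copy_down(stack, offset, *slots):
    """Store the top of stack into stack[-1 - offset] or a field reached from it."""
    if len(slots) > 2:
        raise ValueError("at most two levels of indirection")
    value = stack[-1]
    if not slots:
        stack[-1 - offset] = value  # plain temp store
        return
    target = stack[-1 - offset]
    for index in slots[:-1]:
        target = target[index]
    target[slots[-1]] = value
```

For example, one slot index reaches an instVar of the receiver or the contents of a holder in the frame. Two indices reach the contents of a holder captured in a closure, or an instVar of a captured 'self'.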
The actual bytecodes are just compact encodings of the ones described above (see BytecodeWriteStream initializeBytecodeTable). Over 90 bytecodes remain unused. This leaves room for new bytecodes in the future without requiring another image format change.
The following are changes to classes that do not affect the image format, but support the changes above.
New Smalltalk Compiler
The back end of the Smalltalk-to-bytecodes compiler has been totally rewritten, and all of the compiler classes have been put into their own system category called 'Compiler-'. Scanner is exactly the same, Parser2 is basically the same, the ParseNode2s are different but similar to the originals, and the emit code is totally different. They emit to an InstructionBuilder by sending it abstract bytecode messages similar to the ones above (see the InstructionBuilder comment and its 'instructions' method category).
But before the parse nodes can emit temp vars, var usage has to be analyzed to determine which vars need to be wrapped in SharedTempHolders (see calcVarUsage). A FiniteAutomaton is used to advance the useState of each var depending on whether it is read or written and whether the access is in a closure or not (see VarUse). Those that reach the isIndirect state will generate wrapInTempHolder after the temp's first assignment.
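The text names only the isIndirect state of VarUse, so the automaton below is a hypothetical reconstruction. It has the states local, captured, and indirect, and it is driven by (read/write, inside-a-closure) events in source order. It ignores loops, where source order differs from execution order.

```python
# Hypothetical states and events; only 'indirect' (isIndirect) is named in the text.
# Key: (state, access kind, access happens inside a closure?) -> next state.
TRANSITIONS = {
    ("local", "read", False): "local",
    ("local", "write", False): "local",
    ("local", "read", True): "captured",      # copied into a closure
    ("local", "write", True): "indirect",     # written inside a closure
    ("captured", "read", False): "captured",
    ("captured", "read", True): "captured",
    ("captured", "write", False): "indirect",  # changed after being copied
    ("captured", "write", True): "indirect",
}


def var_use_state(accesses):
    """Return the final state for a var given its accesses in source order."""
    state = "local"
    for kind, in_closure in accesses:
        if state == "indirect":
            break  # absorbing state: the var needs a SharedTempHolder
        state = TRANSITIONS[(state, kind, in_closure)]
    return state
```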
The InstructionBuilder builds BytecodeInstruction objects and groups them by basic block (InstructionSequence). A basic block is a sequence of instructions that are always executed in order; only the last instruction jumps or returns. Some optimizations are done to the instructions (see simplify), like collapsing constant conditional jumps generated by inlined and:/or: message nodes. Bytecodes are then generated by sending an abstract bytecode message from each instruction to a BytecodeWriteStream, which encodes these messages into bytes on its byte array.
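The following sketches one simplification the text mentions: a conditional jump whose condition is a pushed boolean constant is folded into an unconditional jump. The BasicBlock layout and the instruction tuples are assumptions made for illustration. They are not the InstructionSequence API.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """Instructions always run in order; only the terminator may branch."""
    label: str
    body: list = field(default_factory=list)  # e.g. ("push", 3)
    # ("return",), ("jump", L), or ("jumpIfFalse", label_if_false, label_if_true)
    terminator: tuple = ("return",)


def fold_constant_branches(blocks):
    """Replace push-constant + jumpIfFalse with an unconditional jump."""
    for b in blocks:
        if (b.terminator[0] == "jumpIfFalse" and b.body
                and b.body[-1][0] == "push" and isinstance(b.body[-1][1], bool)):
            const = b.body.pop()[1]
            _, if_false, if_true = b.terminator
            b.terminator = ("jump", if_true if const else if_false)
    return blocks
```

Such constants typically appear after inlining, for example when 'x and: [true]' leaves a literal true directly in front of the branch.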
Care was taken to keep the parse tree and the InstructionBuilder simple so other languages could potentially be compiled to bytecodes, including different versions of our own Smalltalk language. The programmer can choose to parse to the existing parse nodes or build his own parse nodes and emit to the InstructionBuilder. The parse tree was kept simple and easy to build by: avoiding back pointers; keeping all blocks the same whether inlined or not; keeping all messages the same whether inlined or not; and keeping all temp vars the same whether captured in a closure or not. Rather, these distinctions are calculated during emit and not stored.
Of course the BytecodeDecompiler is new. It interprets bytecodes and builds the parse tree directly (no intermediate representation like the InstructionBuilder). However, there are still a few bugs, which likely won't get in your way since the decompiler is rarely used. Even when 'decompiling' a CompiledMethod2, it first attempts to find its source code and, if found, just parses it to produce its parse tree (see CompiledMethod2 blockNode).
New Smalltalk Simulator
BytecodeSimulator will execute a SuspendedProcess in Smalltalk. This code used to be in ContextPart and its superclass InstructionStream but has been factored out into BytecodeSimulator and its superclass BytecodeDecoder. Top-level step methods are still in MethodContext2; see its 'stepping' protocol. The complete... stepping methods actually resume execution of the process in the VM but insert an unwind block to suspend it when the context returns (analogous to quickStep).
InstructionStream2 is used to collect bytecode Message objects similar to the abstract bytecodes described above. The message objects are used to print the bytecode representation of a CompiledMethod2.
Sending callCC (call-with-current-continuation, à la Scheme) to a one-argument block copies the sender's entire context (the entire stack), wraps it in a Continuation, and passes it to the block as its argument. The block can save the continuation and return a value for the current stack. The continuation can be executed at any time, and many times, later by sending value: to it. Each execution copies the saved stack and executes it with the argument returned to it.
If an exception resides in a process stack while referring to earlier contexts, the process is prevented from being copied, i.e. a continuation cannot be created (see MethodContext cannotCopy and its senders). This is because a single exception would be shared between two processes and could have its handlerContext changed from two different processes. Using context tags wouldn't help, unless the handlerContext was not an instance variable but instead a context (temp) variable.
In general, any time you are holding onto a context in your own process, you should prevent the process from being copied by sending cannotCopy to the lowest context that is being held. Once that context is popped, it should be OK to copy the process again. The preferred solution is to use ContextTags, which copied contexts share, but they are not always reasonable to use.
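The copy-the-whole-stack approach to continuations, together with the cannotCopy guard, can be sketched as follows. Python cannot actually resume a saved stack. The model therefore only reinstalls a fresh copy of the saved frames and answers the argument. Frame, Process, Continuation, and call_cc are hypothetical names, and modeling cannotCopy as a per-frame flag is an assumption.

```python
import copy


class Frame:
    def __init__(self, name):
        self.name = name
        self.cannot_copy = False  # set by sending cannotCopy to this context


class Process:
    def __init__(self, names):
        self.frames = [Frame(n) for n in names]  # oldest first

    def copyable(self):
        # Popping the marked frame makes the process copyable again.
        return not any(f.cannot_copy for f in self.frames)

    def names(self):
        return [f.name for f in self.frames]


class Continuation:
    def __init__(self, process):
        self.process = process
        self.saved = [copy.copy(f) for f in process.frames]  # whole-stack copy

    def value(self, arg):
        # Each invocation installs a fresh copy, so it can be reused many times.
        self.process.frames = [copy.copy(f) for f in self.saved]
        return arg


def call_cc(process, block):
    if not process.copyable():
        raise RuntimeError("a held context prevents copying this process")
    return block(Continuation(process))
```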
SystemTracer2 is a new SystemTracer that should replace the original. Object conversion is done independent of tracing, making it easier to perform the conversions without messing up the tracer. This does require more space, but the simplicity, I think, is worth it. Also, the code is factored so one just has to create a subclass of SystemTracer2 and override certain conversion methods to create his own converted image (see VI4SystemTracer).
The InterpreterSimulator has been fixed so you can now interpret a regular full-size MVC or Morphic image. There is one quirk, however, a morphic project sometimes comes up all black. (It comes up fine on one computer and comes up black on another. Both are Windows 98. Maybe it has something to do with the graphics hardware.)
C Generator Enhancements
I've changed TMethod inlining (see inlineCaseStatementBranches2In:localizingVars:) so it no longer coalesces bytecode temps into shared temps (t1, t2, etc) for the interpret routine. This creates many more temps for the interpret routine, but a smart C compiler should move many of these to registers when it sees they are only used for a short period of time in a specific bytecode case. John McIntosh tested this change on the 3.2 VM and found it improved bytecode speed by about 5%.
New Context Inquiry Mechanism
'thisContext inquire message' will search the sender chain for the first receiver that understands message and send it to him. The receiver may choose to return a result or pass the inquiry by signaling the InquiryPass notification. If he returns a result, the result will be the result of the inquiry message. If he passes the inquiry, the search continues with his senders. If the search exhausts without finding a responsible receiver, an InquiryFail notification is raised, the default of which is to do nothing and return nil as the result of the inquiry message.
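The inquiry search can be modeled as a walk over the sender chain. This is a hedged Python sketch. A Python exception stands in for the InquiryPass notification. The InquiryFail default (answer nil) is modeled as returning None. The receiver classes and the current_font selector are hypothetical examples.

```python
class InquiryPass(Exception):
    """Raised by a receiver that declines to answer the inquiry."""


def inquire(sender_chain, selector, *args):
    """sender_chain lists receivers from the nearest sender outward."""
    for receiver in sender_chain:
        method = getattr(receiver, selector, None)
        if method is None:
            continue  # does not understand the message; keep searching
        try:
            return method(*args)
        except InquiryPass:
            continue  # passed; the search continues with its senders
    return None  # InquiryFail's default action: answer nil


# Hypothetical receivers for illustration.
class Other:
    pass


class Editor:
    def current_font(self):
        raise InquiryPass


class Window:
    def current_font(self):
        return "Helvetica"
```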
Closure Test Cases
Thanks to Boris Gaertner and Rob Withers for writing test cases for block closures. See BlockClosuresTestCase (of course they all pass).
Enhanced User Interrupt (Alt-.)
If the user interrupt handler does not find any process except the idle process, then it starts a new UI process. This way, if you kill or suspend the UI process, you can get one back by pressing Alt-. (Cmd-. on the Mac).
Other possible image format changes have been discussed on the mailing list but have not been included yet (because people have been waiting for this release). Tim Rowledge appears to be heading up this effort.
You should see a general speed improvement of about 25%. Try running macroBenchmarks #6 in this image versus a 3.2 image (the other macroBenchmarks are not comparable because they involve compiler, context, or interpreter code, which have changed). Enjoy!