COGMethodStructure
Last updated at 4:05 am UTC on 7 June 2015
A machine code method in COG has a very defined structure, as you might expect. But it's not quite as you might expect since it starts before the beginning...
The basics
The definition of cog methods is provided by classes such as CogMethod, CogBlockMethod and Cogit (in the VMMaker.oscog package ) that work with the classes that support full simulation of the system within Squeak and with the Slang translation classes. Some minor Smalltalk inelegances are required to let Slang produce working C code. Life is hard.
A method consists of
- a header that describes the assorted flags and sizes needed in this method
- an abort and entry code prelude that handles send failures, stack check overflows and class checking
- optional primitive code
- optional frame building code
- actual method code
- optional block header
- related block code
- related dispatch
- metadata
Header
Let's look at an example header in the simulator -
1800
objhdr: 8000000A000035
nArgs: 1 type: 2
blksiz: 2F8
method: F4C570
mthhdr: 80011
selctr: 7530A0=#filePositionFromSourcePointer:
blkentry: 0
stackCheckOffset: 88/1888
cmRefersToYoung: no
Broken up a bit for your elucidation -
objhdr: 8000000A000035 | this is a 64bit chunk carefully crafted to convince the garbage collector that this is innocuous; all the cog methods should have the same value |
nArgs: 1 type: 2 | the number of arguments this method expects and the type code - 1-> free chunk, 2-> method, 3 -> block method, 4 -> closed PIC, 5 -> open PIC |
blksiz: 2F8 | size of the method in bytes, including the header and metadata. The size is rounded up to an 8 byte boundary |
method: F4C570 | oop of the Smalltalk CompiledMethod this is derived from |
mthhdr: 80011 | the header of the original method? |
selctr: 7530A0=#filePositionFromSourcePointer: | the oop of the selector for this method |
blkentry: 0 | offset to the block dispatch code at the end of the method; in this case there are no blocks to handle and so a 0 offset is used |
stackCheckOffset: 88/1888 | byte offset to the instruction after the stack overflow check code-snippet in the prelude. More later |
cmRefersToYoung: no | there is a list, and this method is not on it |
When translated into the murky waters of C, this becomes
typedef struct {
sqLong objectHeader;
unsigned cmNumArgs : 8;
unsigned cmType : 3;
unsigned cmRefersToYoung : 1;
unsigned cpicHasMNUCase : 1;
unsigned cmUsageCount : 3;
unsigned cmUsesPenultimateLit : 1;
unsigned cbUsesInstVars : 1;
unsigned cmUnusedFlags : 2;
unsigned stackCheckOffset : 12;
unsigned short blockSize;
unsigned short blockEntryOffset;
sqInt methodObject;
sqInt methodHeader;
sqInt selector;
} CogMethod;
Depending upon your C compiler there may well be some number of bytes after the real content of the C struct in order to meet platform alignment requirements. On ARM for example, it seems to enforce 8byte alignment and thus the sizeof(CogMethod) in C-land is 32 rather than the 28 used in sim-land.
Abort
The first code is the abort code that handles either a stack overflow (detected later in the code) or an inline cache miss. Conceptually this is
stackOverflow: Move 0 to ReceiverResultRegister
inlineCacheMiss: Call abortTrampolineFor nArgs
In ARM code this becomes -
0000181c: mov r7, #0
00001820: push {lr}
00001824: bl 0x00000700 = 16r700 = ceMethodAbort1Args
(Note the explicit push of 'lr' here; unlike the x86 an ARM doesn't push the return address from a call onto a stack. Much fun was had in working out how and when to push the ARM link register during the development of the ARMCog.)
Entry
Following this we have some entry code that varies a little depending upon the memory manager (Spur or the older V3 system) and whether or not we are running a NewSpeak system. Looking at a Spur Squeak system we get -
immediateTest: And 1 with TempReg
Jump to jumpCompare
entry: And tagMask with ReceiverResultReg, result into TempReg
Jump if not 0 to immediateTest
Load lower 32bits of object header into TempReg
And classIndexMask with TempReg
jumpCompare: Compare ClassReg with TempReg
Jump if not equal to inlineCacheMiss (see above)
noCheckEntry:
Again, in ARM code
00001828: ands r0, r0, #1
0000182c: b 0x00001844
entry:
00001830: ands r0, r7, #3
00001834: bne 0x00001828
00001838: ldr r0, [r7]
0000183c: mvn ip, #0
00001840: ands r0, r0, ip, lsr #10
00001844: cmp r0, r8
00001848: bne 0x00001820
noCheckEntry:
0000184c:
(Note the interesting way we make 16r3fffff - moving NOT(0) into a register and shifting by 10 places right. There's a lot of that sort of thing going on to make ARM code efficient.)
This curious looking snippet of code provides a way that we can test the low 2 bits of an OOP for immediate tags (SmallInteger & Character are immediate objects in Spur) and then further check for SmallInteger as opposed to Character if required - and yet not worry about it otherwise.
We will end up with 0, 1 or the class bits of the object header in TempReg at the end of this chunk.
The 'noCheckEntry' label is used once the cog method is linked up to a caller that 'knows' the class is correct.
Primitives
This particular method doesn't reference a primitive so we'll skip the primitive handling code; it handles the calling of primitives by various means including calling plugin-prims and compiled-in-place assembler.
Frame building
Not all methods need to build a full stack frame but this case does and so we push
- the receiver
- any arguments
- the return address
- the frame pointer
- the method label
- nil into the context slo
- the receiver again
- nil into any temporary variable slots
and finally deal with the stack overflow check. In pidgin code this runs something like this -
push ReceiverResultReg
push Arg0Reg
pushAr1Reg (if used)
push return address
push FPReg
push methodLabel
push nil
push ReceiverResultReg
push nil (usually several times)
load stackLimit
compare stackLimit with SPReg
jump if lower to stackOverflow (see above)
... which for ARM looks like this
noCheckEntry:
0000184c: push {r7}
00001850: push {r4}
00001854: push {lr}
00001858: push {fp}
0000185c: mov fp, sp
00001860: sub ip, pc, #104 ; 0x68
00001864: push {ip}
IsAbsPCReference:
00001868: mov r6, #64, 18 ; 0x100000
0000186c: push {r6}
00001870: push {r7}
00001874: push {r6}
00001878: push {r6}
0000187c: ldr r0, [sl, #2048] ; 0x800 = 16r80000 = 'stackLimit'
00001880: cmp sp, r0
00001884: bcc 0x0000181c
Now, if you've been paying attention you will note that the stackOverflow case uses the same code as the inlineCacheMiss except that we force 0 into the RecevierResultRegister. This is a sneaky (And somewhat x86 biased, boo-hiss) way to let the abortTrampoline decide which subsidiary routine to branch to. It's not inconceivable that we may devise a simpler way to do this for the ARM.