Squeak
  links to this page:    
View this PageEdit this PageUploads to this PageHistory of this PageTop of the SwikiRecent ChangesSearch the SwikiHelp Guide
COGMethodStructure
Last updated at 4:05 am UTC on 7 June 2015
A machine code method in COG has a very defined structure, as you might expect. But it's not quite as you might expect since it starts before the beginning...

The basics

The definition of cog methods is provided by classes such as CogMethod, CogBlockMethod and Cogit (in the VMMaker.oscog package ) that work with the classes that support full simulation of the system within Squeak and with the Slang translation classes. Some minor Smalltalk inelegances are required to let Slang produce working C code. Life is hard.

A method consists of

Header

Let's look at an example header in the simulator -
1800
	objhdr: 8000000A000035
	nArgs: 1	type: 2
	blksiz: 2F8
	method: F4C570
	mthhdr: 80011
	selctr: 7530A0=#filePositionFromSourcePointer:
	blkentry: 0
	stackCheckOffset: 88/1888
	cmRefersToYoung: no

Broken up a bit for your elucidation -
objhdr: 8000000A000035 this is a 64bit chunk carefully crafted to convince the garbage collector that this is innocuous; all the cog methods should have the same value
nArgs: 1 type: 2 the number of arguments this method expects and the type code - 1-> free chunk, 2-> method, 3 -> block method, 4 -> closed PIC, 5 -> open PIC
blksiz: 2F8 size of the method in bytes, including the header and metadata. The size is rounded up to an 8 byte boundary
method: F4C570 oop of the Smalltalk CompiledMethod this is derived from
mthhdr: 80011 the header of the original method?
selctr: 7530A0=#filePositionFromSourcePointer: the oop of the selector for this method
blkentry: 0 offset to the block dispatch code at the end of the method; in this case there are no blocks to handle and so a 0 offset is used
stackCheckOffset: 88/1888 byte offset to the instruction after the stack overflow check code-snippet in the prelude. More later
cmRefersToYoung: no there is a list, and this method is not on it
When translated into the murky waters of C, this becomes
typedef struct {
	sqLong	objectHeader;
	unsigned		cmNumArgs : 8;
	unsigned		cmType : 3;
	unsigned		cmRefersToYoung : 1;
	unsigned		cpicHasMNUCase : 1;
	unsigned		cmUsageCount : 3;
	unsigned		cmUsesPenultimateLit : 1;
	unsigned		cbUsesInstVars : 1;
	unsigned		cmUnusedFlags : 2;
	unsigned		stackCheckOffset : 12;
	unsigned short	blockSize;
	unsigned short	blockEntryOffset;
	sqInt	methodObject;
	sqInt	methodHeader;
	sqInt	selector;
 } CogMethod;

Depending upon your C compiler there may well be some number of bytes after the real content of the C struct in order to meet platform alignment requirements. On ARM for example, it seems to enforce 8byte alignment and thus the sizeof(CogMethod) in C-land is 32 rather than the 28 used in sim-land.

Abort

The first code is the abort code that handles either a stack overflow (detected later in the code) or an inline cache miss. Conceptually this is
 
stackOverflow: Move 0 to ReceiverResultRegister
inlineCacheMiss: Call abortTrampolineFor nArgs

In ARM code this becomes -
0000181c: mov	r7, #0
00001820: push	{lr}
00001824: bl	0x00000700 = 16r700 = ceMethodAbort1Args

(Note the explicit push of 'lr' here; unlike the x86 an ARM doesn't push the return address from a call onto a stack. Much fun was had in working out how and when to push the ARM link register during the development of the ARMCog.)

Entry

Following this we have some entry code that varies a little depending upon the memory manager (Spur or the older V3 system) and whether or not we are running a NewSpeak system. Looking at a Spur Squeak system we get -
immediateTest: And 1 with TempReg
		Jump to jumpCompare
entry:	And tagMask with ReceiverResultReg, result into TempReg
		Jump if not 0 to immediateTest
		Load lower 32bits of object header into TempReg
		And classIndexMask with TempReg
jumpCompare: Compare ClassReg with TempReg
		Jump if not equal to inlineCacheMiss (see above)
noCheckEntry:

Again, in ARM code
00001828: ands	r0, r0, #1
0000182c: b	0x00001844
entry:
00001830: ands	r0, r7, #3
00001834: bne	0x00001828
00001838: ldr	r0, [r7]
0000183c: mvn	ip, #0
00001840: ands	r0, r0, ip, lsr #10
00001844: cmp	r0, r8
00001848: bne	0x00001820
noCheckEntry:
0000184c:

(Note the interesting way we make 16r3fffff - moving NOT(0) into a register and shifting by 10 places right. There's a lot of that sort of thing going on to make ARM code efficient.)
This curious looking snippet of code provides a way that we can test the low 2 bits of an OOP for immediate tags (SmallInteger & Character are immediate objects in Spur) and then further check for SmallInteger as opposed to Character if required - and yet not worry about it otherwise.
We will end up with 0, 1 or the class bits of the object header in TempReg at the end of this chunk.
The 'noCheckEntry' label is used once the cog method is linked up to a caller that 'knows' the class is correct.

Primitives

This particular method doesn't reference a primitive so we'll skip the primitive handling code; it handles the calling of primitives by various means including calling plugin-prims and compiled-in-place assembler.

Frame building

Not all methods need to build a full stack frame but this case does and so we push
and finally deal with the stack overflow check. In pidgin code this runs something like this -
push ReceiverResultReg
push Arg0Reg
pushAr1Reg (if used)
push return address
push FPReg
push methodLabel
push nil
push ReceiverResultReg
push nil (usually several times)
load stackLimit
compare stackLimit with SPReg
jump if lower to stackOverflow (see above)

... which for ARM looks like this
noCheckEntry:
0000184c: push	{r7}
00001850: push	{r4}
00001854: push	{lr}
00001858: push	{fp}
0000185c: mov	fp, sp
00001860: sub	ip, pc, #104	; 0x68
00001864: push	{ip}
IsAbsPCReference:
00001868: mov	r6, #64, 18	; 0x100000
0000186c: push	{r6}
00001870: push	{r7}
00001874: push	{r6}
00001878: push	{r6}
0000187c: ldr	r0, [sl, #2048]	; 0x800 = 16r80000 = 'stackLimit'
00001880: cmp	sp, r0
00001884: bcc	0x0000181c


Now, if you've been paying attention you will note that the stackOverflow case uses the same code as the inlineCacheMiss except that we force 0 into the RecevierResultRegister. This is a sneaky (And somewhat x86 biased, boo-hiss) way to let the abortTrampoline decide which subsidiary routine to branch to. It's not inconceivable that we may devise a simpler way to do this for the ARM.