Squeak
  links to this page:    
View this PageEdit this PageUploads to this PageHistory of this PageTop of the SwikiRecent ChangesSearch the SwikiHelp Guide
COGClosedPICStructure
Last updated at 11:24 pm UTC on 3 November 2015

The beginning of some explanation of a Cog closed PIC

When a cogged method has a send that gets used for more than one target class of object we need a way to prevent constant yo-yoing between two or more method pointers getting (re)written into the send slot. What happens is that a Closed Polymorphic Inline Cache - a CPIC - is built and the original method send location is patched to indirect via this pseudo-method.

In sorta-Smalltalk it would look like-
cpicXXXXWithInitialTag: tag0 receiver: rx
	literals := Array new:11. "filled by magic"
	rx classTag = tag0 ifTrue:[ opReg := self loadLiteral: 1.
		self jumpTo: target0].
	1 to: 5 do: [:i|
		rx classTag = (self loadLiteral: (2 * i) 
			ifTrue:[ opReg := self loadLiteral: (2 * i + 1).
				^self jumpTo: self target: i]].
	self loadAdddressOfThisPIC.
	self jumpTo: picMissTrampoline


CPICs look very much like cog methods for the first few words (the header) and initial entry instructions. The same method header structure is used, albeit with some parts abused for different purposes, and the code section starts with very similar abort code.
picMNUAbort:Move pic abort discriminator value (0 by default) to ClassReg
picInterpretAbort:Call picAbortTrampoline

Just as with a cog method we follow this with the usual immediate test -
 immediateTest: And 1 with TempReg
		Jump to jumpCompare

... and entry code, with the same slightly twisted tag checking dance to get the class id into a temp register.
entry:	And tagMask with ReceiverResultReg, result into TempReg
		Jump if not 0 to immediateTest
		Load lower 32bits of object header into TempReg
		And classIndexMask with TempReg
jumpCompare: Compare ClassReg with TempReg
		Jump if not equal to endCPICCase0

If the ClassReg (set in the calling cog method) matches TempReg we load a tag value literal and jump to the cog method that would have been the target initially, back when we could use a simple inline cache.
If we don't have a match, we continue to the next case, were we compare the class tag in TempReg to another literal encoded within the CPIC, and if it matches we can jump to a second target. If that doesn't match, we repeat.
		Move first magic word-constant to SendNumArgsReg
		Jump to first target cog method
endCPICCase0: Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the second tag matched, jump to second target
endCPICCase1:Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the second tag matched, jump to third target
...
endCPICCase5: Load CPIC address into ClassReg
		Jump to pic miss trampoline

This test and jump sequence can continue for a total of six entries in the current design. If we run out of choices we set ClassReg to the base address of the CPIC and take a jump to a pic miss trampoline where Other Things can be done to solve the problem.

CPIC original implementation

Eliot's first implementation of CPICs made use of the Cog compilation machinery to build them as needed. Memory sufficient for a full 6 case CPIC was allocated and then the abort, entry, first case, second case and fail code compiled into that chunk of memory.
Each time another case had to be added the compilation machinery would fire up to compile the next case onto the end of the existing code, and a new copy of the fail code (since the earlier copy would have been over-written by the new case code).
This worked very well for the x86 cpu Cog but as soon as we started work on the ARM Cog there were some problems. The literals mentioned above for the class tags etc can be generated into an inline instruction form for x86 - i.e.
1260: cmpl $0xbabe1f17, %eax
	1265: movl $0x0bada552, %ebx

but the ARM ISA cannot cleanly do that. We can use a sequence of MOV and ORR instructions but that tends to result in a lot of four-instruction sequences that need to be tracked and updated during compactions etc. A better approach is to store the literal values somewhere local but out of the execution stream and then load them using pc relative loads. We see this like so -
ldr	ip, [pc, #84]	; 0x000017c0 16rBABE1F16

The detail of where the literals get stored is quite important. Unfortunately when first implementing CPICs with out of line literals we produced code that put them after each case's conditional branch to the target - and thus if the condition was not met and the branch not taken we tried to execute the literal values! Oddly this works on most ARM cpus since the literals are mostly only lower-20 bit values that execute as innocuous AND's of an unused register. Certain ARMs and the simulator consider it a gross offence and so we needed to change the code layout.
By way of illustration
00001774: ldr	ip, [pc, #8]	; 0x000017c8 16rBABE1F17
00001778: cmp	r0, ip
0000177c: ldr	r6, [pc, #4]	; 0x000017cc 16rBADA552
00001780: beq	0x000a0da8
16rBABE1F17
16rBADA552
... more code
should be fairly self-evidently a problem.

Revised CPIC implementation

A very simple change could have been to add explicit jumps over the pairs of literals in each case. It would be a little slower and make the PICs a bit bigger.
However, we noticed that with the pattern of the code being very regular after the first case test there might be an opportunity to avoid compiling anew for each new CPIC and extension case. In the early stages of starting the system we have to build a prototype CPIC in order to record the size needed and establish some important offsets into the structure. Originally this was discarded immediately but now we save it and merely have to memcpy() that archetype to our newly allocated location in order to have the skeleton of a prepared CPIC in place.
To customise it for actual use we have to update
Since the prototype CPIC had all the cases filled in with dummy literals and targets we don't want to have to execute the not yet needed cases, nor to have wasteful conditional skipping code, so we take advantage of the conditional branch (See "Jump if not equal to endCPICCase0" above) used to jump over the first case to jump to the last case. That lets us simply continue to the miss handler when required. Now to add a new case we need only adjust the tag, operand and target of the 'spare' case just above that last one, and update the jump to take us to this new case. Various CPIC scanning methods needed small updates to handled this backwards growth but surprisingly little changed.
With the CPIC code fixed in size we had an obvious place to store the literals for the ARM, at the end of the code. With the final instruction being an unconditional jump to the miss handler we have no problems with strange un-instructions getting into the execution stream.

So now our initial two-case CPICs look like this -
		abort & entry flim-flam ...
		Compare ClassReg with TempReg
		Jump if not equal to cPICCase1
		Move first magic word-constant to SendNumArgsReg
		Jump to first target cog method
cPICCase5:Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the sixth tag matched, jump to sixth target
cPICCase4:Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the fifth tag matched, jump to fifth target
cPICCase3:Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the fourth tag matched, jump to fourth target
cPICCase2:Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the third tag matched, jump to third target
cPICCase1: Compare TempReg to second tag
		Move a different magic word-constant into SendNumArgsReg
		If the second tag matched, jump to second target
endCPICCase1: Load CPIC address into ClassReg
		Jump to pic miss trampoline
Literal List
...

When we extend a two-case CPIC to a third case the "Jump if not equal to cPICCase1" becomes "Jump if not equal to cPICCase2" and so on.

More:

http://www.mirandabanda.org/cogblog/category/cog/page/5/ and particularly the ‘Inline Caching’ section
Cog blog in general