COGClosedPICStructure
Last updated at 11:24 pm UTC on 3 November 2015
The beginning of some explanation of a Cog closed PIC
When a cogged method has a send that gets used for more than one target class of object we need a way to prevent constant yo-yoing between two or more method pointers getting (re)written into the send slot. What happens is that a Closed Polymorphic Inline Cache - a CPIC - is built and the original method send location is patched to indirect via this pseudo-method.
In sorta-Smalltalk it would look like-
cpicXXXXWithInitialTag: tag0 receiver: rx
literals := Array new:11. "filled by magic"
rx classTag = tag0 ifTrue:[ opReg := self loadLiteral: 1.
self jumpTo: target0].
1 to: 5 do: [:i|
rx classTag = (self loadLiteral: (2 * i)
ifTrue:[ opReg := self loadLiteral: (2 * i + 1).
^self jumpTo: self target: i]].
self loadAdddressOfThisPIC.
self jumpTo: picMissTrampoline
CPICs look very much like cog methods for the first few words (the header) and initial entry instructions. The same method header structure is used, albeit with some parts abused for different purposes, and the code section starts with very similar abort code.
picMNUAbort:Move pic abort discriminator value (0 by default) to ClassReg
picInterpretAbort:Call picAbortTrampoline
Just as with a cog method we follow this with the usual immediate test -
immediateTest: And 1 with TempReg
Jump to jumpCompare
... and entry code, with the same slightly twisted tag checking dance to get the class id into a temp register.
entry: And tagMask with ReceiverResultReg, result into TempReg
Jump if not 0 to immediateTest
Load lower 32bits of object header into TempReg
And classIndexMask with TempReg
jumpCompare: Compare ClassReg with TempReg
Jump if not equal to endCPICCase0
If the ClassReg (set in the calling cog method) matches TempReg we load a tag value literal and jump to the cog method that would have been the target initially, back when we could use a simple inline cache.
If we don't have a match, we continue to the next case, were we compare the class tag in TempReg to another literal encoded within the CPIC, and if it matches we can jump to a second target. If that doesn't match, we repeat.
Move first magic word-constant to SendNumArgsReg
Jump to first target cog method
endCPICCase0: Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the second tag matched, jump to second target
endCPICCase1:Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the second tag matched, jump to third target
...
endCPICCase5: Load CPIC address into ClassReg
Jump to pic miss trampoline
This test and jump sequence can continue for a total of six entries in the current design. If we run out of choices we set ClassReg to the base address of the CPIC and take a jump to a pic miss trampoline where Other Things can be done to solve the problem.
CPIC original implementation
Eliot's first implementation of CPICs made use of the Cog compilation machinery to build them as needed. Memory sufficient for a full 6 case CPIC was allocated and then the abort, entry, first case, second case and fail code compiled into that chunk of memory.
Each time another case had to be added the compilation machinery would fire up to compile the next case onto the end of the existing code, and a new copy of the fail code (since the earlier copy would have been over-written by the new case code).
This worked very well for the x86 cpu Cog but as soon as we started work on the ARM Cog there were some problems. The literals mentioned above for the class tags etc can be generated into an inline instruction form for x86 - i.e.
1260: cmpl $0xbabe1f17, %eax
1265: movl $0x0bada552, %ebx
but the ARM ISA cannot cleanly do that. We can use a sequence of MOV and ORR instructions but that tends to result in a lot of four-instruction sequences that need to be tracked and updated during compactions etc. A better approach is to store the literal values somewhere local but out of the execution stream and then load them using pc relative loads. We see this like so -
ldr ip, [pc, #84] ; 0x000017c0 16rBABE1F16
The detail of where the literals get stored is quite important. Unfortunately when first implementing CPICs with out of line literals we produced code that put them after each case's conditional branch to the target - and thus if the condition was not met and the branch not taken we tried to execute the literal values! Oddly this works on most ARM cpus since the literals are mostly only lower-20 bit values that execute as innocuous AND's of an unused register. Certain ARMs and the simulator consider it a gross offence and so we needed to change the code layout.
By way of illustration
00001774: ldr ip, [pc, #8] ; 0x000017c8 16rBABE1F17
00001778: cmp r0, ip
0000177c: ldr r6, [pc, #4] ; 0x000017cc 16rBADA552
00001780: beq 0x000a0da8
16rBABE1F17
16rBADA552
... more code
should be fairly self-evidently a problem.
Revised CPIC implementation
A very simple change could have been to add explicit jumps over the pairs of literals in each case. It would be a little slower and make the PICs a bit bigger.
However, we noticed that with the pattern of the code being very regular after the first case test there might be an opportunity to avoid compiling anew for each new CPIC and extension case. In the early stages of starting the system we have to build a prototype CPIC in order to record the size needed and establish some important offsets into the structure. Originally this was discarded immediately but now we save it and merely have to memcpy() that archetype to our newly allocated location in order to have the skeleton of a prepared CPIC in place.
To customise it for actual use we have to update
- the target of the abort code
- the target of the first case
- the class tag, operand and target of the second case
- the loading of the CPICs own base address
- and the target of the final miss handler
Since the prototype CPIC had all the cases filled in with dummy literals and targets we don't want to have to execute the not yet needed cases, nor to have wasteful conditional skipping code, so we take advantage of the conditional branch (See "Jump if not equal to endCPICCase0" above) used to jump over the first case to jump to the last case. That lets us simply continue to the miss handler when required. Now to add a new case we need only adjust the tag, operand and target of the 'spare' case just above that last one, and update the jump to take us to this new case. Various CPIC scanning methods needed small updates to handled this backwards growth but surprisingly little changed.
With the CPIC code fixed in size we had an obvious place to store the literals for the ARM, at the end of the code. With the final instruction being an unconditional jump to the miss handler we have no problems with strange un-instructions getting into the execution stream.
So now our initial two-case CPICs look like this -
abort & entry flim-flam ...
Compare ClassReg with TempReg
Jump if not equal to cPICCase1
Move first magic word-constant to SendNumArgsReg
Jump to first target cog method
cPICCase5:Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the sixth tag matched, jump to sixth target
cPICCase4:Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the fifth tag matched, jump to fifth target
cPICCase3:Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the fourth tag matched, jump to fourth target
cPICCase2:Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the third tag matched, jump to third target
cPICCase1: Compare TempReg to second tag
Move a different magic word-constant into SendNumArgsReg
If the second tag matched, jump to second target
endCPICCase1: Load CPIC address into ClassReg
Jump to pic miss trampoline
Literal List
...
When we extend a two-case CPIC to a third case the "Jump if not equal to cPICCase1" becomes "Jump if not equal to cPICCase2" and so on.
More:
http://www.mirandabanda.org/cogblog/category/cog/page/5/ and particularly the ‘Inline Caching’ section
Cog blog in general