Last updated at 4:18 pm UTC on 3 June 2006
Note: Andreas has indicated that as of June 2006, this isn't the way things are done. For more details, you'll have to see the source.
[From a message by Andreas Raab, on 12 Feb 2000]
I'll have to give some background first so you understand why this is actually necessary before I'll go on and explain what exactly we're going to do.
As Dan mentioned, this story goes all the way back to Tim's original LEBB (little-endian BitBlt) design. There are many reasons for wanting an LEBB version of BitBlt, but my primary reason is 3D graphics acceleration. However, one of the unique features of Squeak is the embedded nature of 3D: it doesn't matter where your 3D object is; it can be behind, in front of, or in between 2D objects, all of which are ultimately rendered by BitBlt.
Even though I really wanted to get 3D hardware acceleration going, I didn't want to give up this flexibility. In terms of low-level graphics this means that we (i.e., BitBlt) need to be able to render to the very same surface that's used for graphics acceleration. [ObBesides: This feature is present at least on Windows, Mac, and Linux machines with the 'right' kind of graphics hardware underneath.]
You may ask, what's the problem here?! Obviously BitBlt can render to a variety of Forms (1, 2, 4, 8, 16, 32 bits deep), so why do we need a different BitBlt?! The first part of the answer is that the current BitBlt always assumes that pixels are aligned MSB first. That way we have a unique (i.e., bit-identical) representation across all platforms, but it comes at a high cost for those platforms that are not MSB based. As an example, assume the following forms:
Form extent: 4@1 depth: 8
| pixel1 | pixel2 | pixel3 | pixel4 |
Form extent: 2@1 depth: 16
| pixel1 | pixel2 |
The above is what BitBlt sees logically and is what an MSB system sees physically (i.e., the memory layout). For an LSB system, however, the physical layout in memory looks like
Form extent: 4@1 depth: 8
| pixel4 | pixel3 | pixel2 | pixel1 |
Form extent: 2@1 depth: 16
| pixel2 | pixel1 |
Thus, if these pixels are drawn as is (i.e., directly from memory to the display) you'd get the pixels at the wrong positions on the screen. Any LSB VM therefore translates the contents of Display before copying it onto the screen, byte- or word-reversing as necessary. This has a number of implications: it requires an extra pass over display memory (which, in the case of 1024x768x16, touches about 1.5MB of data for Display>>forceToScreen), and it means that you can't use any OS surfaces directly (such as on PDAs, where one might have access to the display bits directly), since the pixel positions need to be fixed up. The same happens when mixing 2D and 3D, in which case the 3D display would use the 'native pixel positions' whereas the 2D part from BitBlt would use 'Squeak pixel positions' (I'm using these terms for lack of a better distinction).
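To make the reversal pass concrete, here is a small sketch (in Python, not the VM's actual C code) of the fix-up an LSB VM has to perform: four 8-bit pixels packed MSB-first into a 32-bit word, and the byte reversal that turns the canonical layout into the native LSB layout. Names and values are illustrative only.

```python
def pack_msb(pixels):
    """Pack four 8-bit pixels into one 32-bit word, MSB (pixel1) first.
    This is Squeak's canonical layout."""
    w = 0
    for p in pixels:
        w = (w << 8) | p
    return w

def byte_reverse(word):
    """Reverse the four bytes of a 32-bit word - the per-word fix-up an
    LSB VM performs before copying Display bits to the screen."""
    return ((word & 0x000000FF) << 24 |
            (word & 0x0000FF00) << 8  |
            (word & 0x00FF0000) >> 8  |
            (word & 0xFF000000) >> 24)

# Four 8-bit pixels as in the example Form above:
pixels = [0x11, 0x22, 0x33, 0x44]     # pixel1..pixel4

canonical = pack_msb(pixels)           # what BitBlt sees: 0x11223344
native = byte_reverse(canonical)       # what LSB memory holds: 0x44332211
```

For 16-bit depth the same idea applies with halfwords instead of bytes; either way it is an extra pass over the whole display memory.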
The second part of the answer is color spaces. While RGBA is the most common color space, there are zillions of different flavours of RGBA representations. Take, for instance, 16-bit RGBA mode; I have seen at least the following variants of it:
- 5x5x5 RGB
- 5x5x5 BGR
- 1x5x5x5 ARGB
- 5x5x5x1 BGRA
- 1x5x5x5 ABGR
- 5x5x5x1 RGBA
- 5x6x5 RGB
- 4x4x4x4 RGBA [including all of the (A)RGB(A)/(A)BGR(A) permutations]

Therefore, if we want to talk to the OS in terms of the native color representation, we need to be very flexible in the actual color space representation.
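To see how little these variants differ mechanically, here is an illustrative Python sketch decoding two of the 16-bit layouts above. The exact bit positions are assumptions based on the common conventions for these names; each layout is just a different set of shifts and masks.

```python
def decode_555_rgb(p):
    """5x5x5 RGB, assumed layout xRRRRRGGGGGBBBBB (top bit unused)."""
    return ((p >> 10) & 0x1F, (p >> 5) & 0x1F, p & 0x1F)

def decode_565_rgb(p):
    """5x6x5 RGB, assumed layout RRRRRGGGGGGBBBBB (spare bit to green)."""
    return ((p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F)

# The same 16-bit pattern means different component values in each
# layout, which is exactly why the conversion machinery has to know
# which one the OS uses.
```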
Okay, now that it's clear why we need to deal with those two aspects (endianness and color spaces), here is the actual approach.
Endianness was relatively simple (since I didn't have to do it ;-) Tim had done 95% of this already in his LEBB implementation and I've basically just copied his stuff. There is one question which is not yet completely answered, and that is how to represent different endianness at the image level. We will introduce a class Pixmap which represents a platform-dependent array of pixels (in contrast to Bitmap, which will stay in canonical MSB format) so that a distinction between native and canonical representation is possible. There are a small number of problems to solve, but I'm quite positive that we can figure those out.
[ObWarning: Be warned. If you are performing operations on the bits of a Form directly, you might be in for a couple of surprises in the future. The plan is to provide canonical accessors to the contents of a Bitmap/Pixmap that will always return the pixel value in canonical order, but there may be some yet-to-be-discovered unpleasant surprises (e.g., using 'aForm bits basicAt: index' is a Very Bad Idea).]
Color conversion will be handled by a generalized version of color maps. In contrast to the current implementation, which treats color maps as lookup tables only and performs implicit color reduction and expansion (e.g., for 16 <-> 32 bit representations), the new color maps consist of two parts: an indexable part, describing a lookup table, and a fixed part, describing the color space conversion. The fixed part is actually four shifts and four masks which can be used to expand, contract, or exchange color components individually.
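A minimal sketch of what the fixed part of such a map can do, assuming a hypothetical representation as one (mask, shift) pair per component (this is my own illustration, not the actual BitBlt data layout):

```python
def apply_fixed_map(pixel, masks, shifts):
    """Apply the 'fixed part' of a color map: for each component,
    masks[i] selects its bits in the source pixel and shifts[i] moves
    them to their destination position (negative = shift right)."""
    out = 0
    for mask, shift in zip(masks, shifts):
        bits = pixel & mask
        out |= (bits << shift) if shift >= 0 else (bits >> -shift)
    return out

# Example: exchange R and B in a 32-bit xRGB pixel (RGB -> BGR),
# purely by choosing the right masks and shifts.
masks  = [0x00FF0000, 0x0000FF00, 0x000000FF]   # R, G, B fields
shifts = [-16,        0,          16]            # R -> B slot, B -> R slot
```

The same mechanism with different masks and shifts expands or contracts components, e.g. widening 5-bit fields to 8-bit positions.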
In addition to the generalized color map there are two further maps: an (optional) source map and an (optional) destination map. Together with the general colorMap, those three maps define the most general pixel mapping operation in BitBlt as:
- Fetch a source pixel, map it by source map (incl color space conversion and table lookup)
- Fetch a destination pixel, map it by dest map (incl color space conversion and table lookup)
- Combine these two mapped pixels using a combination rule
- Map the resulting pixel using the color map (incl color space conversion and table lookup)
- Store the resulting pixel in the destination
As you can guess, this generalized loop also carries quite a speed penalty. Because of this, the source and destination maps are optional, and BitBlt can operate perfectly well with just a colorMap, or even without any map whatsoever.
The primary difference between having those additional maps or not is the operational mode. If BitBlt has no map, or just a single colorMap, it will operate on pixel words, trying to run the entire operation as fast as possible (e.g., in eight-bit depth it will process four pixels at once). In contrast, if all the maps are given, it will operate on single pixels. The assumption here is that if source, dest, and color map are all given, the source and dest maps generate a canonical 32-bit ARGB color value which is then inversely mapped by the colorMap into the required destination depth (note that this mapping is not required; it's quite possible to map into other color spaces/depths, performing certain per-pixel operations). This, for instance, allows alpha blends in arbitrary depths.
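A sketch of why that canonical intermediate enables alpha blends in arbitrary depths: both pixels are expanded to 8 bits per component, blended there, and the result contracted back to 16-bit. The expansion by bit replication is one common choice I'm assuming here; it is illustrative, not the actual BitBlt code.

```python
def expand_555(p):
    """16-bit 5x5x5 -> (r, g, b) at 8 bits each, by bit replication."""
    r, g, b = (p >> 10) & 0x1F, (p >> 5) & 0x1F, p & 0x1F
    return tuple((c << 3) | (c >> 2) for c in (r, g, b))

def contract_555(rgb):
    """(r, g, b) at 8 bits each -> 16-bit 5x5x5."""
    r, g, b = (c >> 3 for c in rgb)
    return (r << 10) | (g << 5) | b

def blend_555(src, dst, alpha):
    """Blend two 16-bit 5x5x5 pixels with an 8-bit alpha, via the
    canonical 8-bit-per-component space."""
    s, d = expand_555(src), expand_555(dst)
    mixed = tuple((sc * alpha + dc * (255 - alpha)) // 255
                  for sc, dc in zip(s, d))
    return contract_555(mixed)
```

The blend itself never needs to know the destination depth; only the expand/contract maps at either end do.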
[ObBesides: The fixed parts of the color maps are really just four shifts and masks. It's quite possible to do some really weird stuff with that; I just haven't found any good application for it ;-)]
[ObBesides2: Those of you who are really deeply into BitBlt will note that there are some modes (in particular Form>>paint) which cannot trivially be kept the way they are when color space conversions come into play. And you are right. One of the major modifications in BitBlt is the generalization of Form>>paint into general source and destination keying, but that is going too far for now.]
[ObBesides3: The issue of the relation between forms and palettes is not yet completely clear. I am thinking about giving every form a palette so that it's trivial to know about and perform any color conversion as necessary but there are a couple of tricky issues yet to be figured out.]
Okay, what's the outcome of all of this?! We'll be able to do very generic color space conversion within BitBlt, which would, for instance, allow us to render to a 4x4x4x4 RGBA texture which could then in turn be used for any 3D operation. Kinda cool.
Also, it will avoid the additional pass for reversing bytes or words during display updates - an effect which is likely to speed up display updates in general on slower machines (like PDAs).
Also, it will enable us to use OS surfaces directly, thus avoiding the double memory allocation from Squeak's Display and any extra OS bitmap (currently happening on Unix and WinCE machines).
And finally, it will allow us to talk to the underlying OS graphics subsystem in the most efficient manner possible. Depending on your hardware, we may directly talk to the VRAM of your graphics card or exploit the parallel execution of memory-to-card blts while Squeak does something else in between. Lots of cool stuff.
A note on speed
For those of you who are concerned about the speed implications of the above: my preliminary measurements have shown that the new generalized BitBlt is about 10-20% slower in the complex cases (which is pretty good considering the recent speedups achieved in the current BitBlt implementation). And then, there is always the chance to hook in a JitBlt implementation - do I need to mention that the hooks are already in place?! ;-)
I think it's necessary to give proper acknowledgement to Tim Rowledge. I'd been hesitating to do all the color conversion stuff, but after 'The Treaty of Denver' (as we called it) Tim was so persistent that I finally realized I just needed to do it (ya all know what a pain in the neck he can be if he wants to ;-)