Module Files discussion
Last updated at 2:35 pm UTC on 16 January 2006
A changes file for each module
> If I load ThingLab,
> I want to be able to browse the code (I'm thinking .changes file), but if I
> unload it, I don't want to be stuck with all that in my base .changes file.
I agree about the .changes file, but this is a matter where I'd be happy to hand the design to you. I know very little about the source file array and the source tags in compiled methods, but I can imagine a disadvantage in keeping a changes file open for every module, or even for many modules. (A semaphore-controlled SourceReader that opens and closes the different files lazily, one at a time or with a small cache of a few open files, could be fast enough.)
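A minimal sketch of such a lazily-opening reader, written in Python for brevity rather than Smalltalk. `SourceReader`, `read`, and `max_open` are hypothetical names; a single lock stands in for the semaphore, and a small least-recently-used cache bounds how many module files are open at once.

```python
import threading
from collections import OrderedDict


class SourceReader:
    """Hypothetical sketch: one lock serializes all source reads, and at most
    `max_open` files stay open at once (least recently used is closed first)."""

    def __init__(self, max_open: int = 4):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self._lock = threading.Lock()   # plays the role of the semaphore
        self._max_open = max_open
        self._open = OrderedDict()      # path -> open binary file, oldest first

    def _handle(self, path):
        # Caller must hold self._lock.
        f = self._open.get(path)
        if f is None:
            f = open(path, "rb")        # opened lazily, on first use
            self._open[path] = f
            if len(self._open) > self._max_open:
                _, oldest = self._open.popitem(last=False)
                oldest.close()          # evict the least recently used file
        else:
            self._open.move_to_end(path)  # mark as most recently used
        return f

    def read(self, path, offset: int, length: int) -> bytes:
        """Return `length` bytes of source text starting at `offset`."""
        with self._lock:
            f = self._handle(path)
            f.seek(offset)
            return f.read(length)

    def open_count(self) -> int:
        with self._lock:
            return len(self._open)

    def close_all(self) -> None:
        with self._lock:
            while self._open:
                _, f = self._open.popitem()
                f.close()
```

With `max_open=1` this degenerates to the "one at a time" variant; a slightly larger cache avoids reopening files when browsing jumps between a few modules.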
I've thought that we should maintain a local copy of the remote repository in the local image folder, mirroring the remote structure, and put a copy of more or less every downloaded file there.
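One way to keep such a mirror, sketched in Python under assumptions: files are addressed by URL, the mirror lives under a hypothetical `cache_root` in the image folder, and `download` is any callable that fetches the bytes. `local_mirror_path` and `fetch_with_mirror` are illustrative names. Each file is fetched at most once and afterwards read from the local copy.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_mirror_path(url: str, cache_root: Path) -> Path:
    """Map a repository URL onto a path under the local image folder that
    mirrors the remote directory structure (host/dir/.../file)."""
    parts = urlparse(url)
    remote = PurePosixPath(parts.path)
    # Refuse '..' so a URL cannot escape the cache directory.
    if ".." in remote.parts:
        raise ValueError(f"unsafe path in {url!r}")
    relative = remote.relative_to("/") if remote.is_absolute() else remote
    return cache_root / parts.netloc / Path(*relative.parts)


def fetch_with_mirror(url: str, cache_root: Path, download) -> bytes:
    """Return the file's bytes, downloading only if no local copy exists."""
    local = local_mirror_path(url, cache_root)
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(download(url))
    return local.read_bytes()
```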
I've been thinking more about file formats. Here are my thoughts so far; perhaps you have already thought further about it...
I imagine a few files associated with each module:
- Module.seg - This is basically an exported imageSegment that can be loaded quickly.
- Module.sources - This is the source code for the base version of the module, analogous to SqueakVN.sources
- Module.changes - This is the source code for any changes made to this module since the base version, whether they were made in the repository or locally.
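The three files could be bundled per module roughly as follows. This Python sketch is hypothetical throughout: `ModuleFiles` and `file_for` are illustrative names, and numbering the files 1 and 2 assumes that each module's source pointers would reuse the convention of Squeak's `SourceFiles` array, where index 1 is the sources file and index 2 the changes file.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModuleFiles:
    """Hypothetical bundle of the per-module files listed above."""
    directory: Path
    name: str  # e.g. "KernelObjects_1.0"; the naming scheme is still open

    def _path(self, extension: str) -> Path:
        return self.directory / f"{self.name}{extension}"

    @property
    def seg(self) -> Path:      # optional, fast-loading image segment
        return self._path(".seg")

    @property
    def sources(self) -> Path:  # source of the base version
        return self._path(".sources")

    @property
    def changes(self) -> Path:  # everything changed since the base version
        return self._path(".changes")

    def file_for(self, file_index: int) -> Path:
        """Resolve the file index of a per-module source pointer, assuming
        it is numbered like Squeak's SourceFiles (1 = .sources, 2 = .changes)."""
        if file_index == 1:
            return self.sources
        if file_index == 2:
            return self.changes
        raise ValueError(f"no source file with index {file_index}")
```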
Hey, guess what, it's just like the Squeak image, but broken down into modules.
The .seg file is purely an optimization. Such a file could be loaded 20-50 times faster than compiling the source, and it can be loaded into an image that has no compiler, too. I got fairly far along (without modules) making this work so that you could load, e.g., 3D into a small Squeak if you tried to load a Wonderland project. However, I would skip this completely for the first month.
The .sources file includes not only the source code, but everything needed to install the package properly. Plus, probably, some header info about version, imports and exports. Finally, it would be nice if this could be compressed. Maybe it gets decompressed on first use, or maybe we write a little interface so that source code browsing works on .gz files (not that hard, I think. A?).
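Browsing source inside a .gz file mostly means being able to seek to a method's position and read one chunk. A rough Python sketch, assuming the usual chunk format (a chunk ends at a single `!`, and `!!` stands for a literal `!`) and assuming Latin-1 text; `open_source_file` and `read_chunk` are hypothetical names. Python's `gzip` module can seek in a compressed file, but only by decompressing up to the target (rewinding for backward seeks), so a real reader would probably cache.

```python
import gzip


def open_source_file(path: str):
    """Open a module source file for binary reading; names ending in .gz
    are decompressed transparently (seeking works, but is slow)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_chunk(f, offset: int) -> str:
    """Read one chunk starting at `offset`: the chunk ends at a single '!',
    and '!!' inside it stands for a literal '!'."""
    f.seek(offset)
    out = bytearray()
    while True:
        c = f.read(1)
        if not c:                # end of file also ends the last chunk
            break
        if c == b"!":
            nxt = f.read(1)
            if nxt != b"!":      # a lone '!' terminates the chunk
                break
        out += c                 # for '!!', exactly one '!' is kept
    return out.decode("latin-1")  # assumed encoding
```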
The .changes file is pretty much what you would expect.
Note that if you move a class from one module to another, you've got to copy its source code as well. This could be done lazily, but anything that gets detached from the image needs the module source code to be complete.
> Hey, guess what, it's just like the Squeak image, but broken down into modules.
This is all good: nice and simple. We might want to choose the "Module" part of the names carefully. I'd go for "KernelObjects_1.0.type", i.e. the full path and version, but we must be careful about the Mac's 31-character limit: count 8 characters for the type (".sources"), cut off the middle of the name, and use something like Objects_1.0014.type as a last resort for really long names. It's a method you write once and then it's done.
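The naming rule could be a single function like this Python sketch. `module_file_base` is a hypothetical name; it assumes the path components are simply concatenated (as in "KernelObjects") and that the version is always kept in full. It implements only the cut-off-the-middle step; the last-resort scheme for really long names is left open.

```python
from typing import Sequence

MAC_NAME_LIMIT = 31   # classic Mac OS file name limit
TYPE_ALLOWANCE = 8    # room kept for the longest type, ".sources"


def module_file_base(path: Sequence[str], version: str) -> str:
    """Build e.g. "KernelObjects_1.0" from ["Kernel", "Objects"] and "1.0".
    If the result would not fit, characters are dropped from the middle of
    the path part; its start and end and the full version are kept."""
    budget = MAC_NAME_LIMIT - TYPE_ALLOWANCE   # 23 characters for the base
    name = "".join(path)
    suffix = "_" + version
    room = budget - len(suffix)                # characters left for the path
    if len(name) > room:
        if room < 2:
            raise ValueError("version string leaves no room for the name")
        head = (room + 1) // 2
        tail = room - head                     # >= 1 because room >= 2
        name = name[:head] + name[len(name) - tail:]
    return name + suffix
```

For example, `["Multimedia", "ThreeDimensional", "Wonderland"]` at version "2.1" becomes "Multimediaonderland_2.1", which leaves exactly 8 characters for ".sources".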
> The .seg file is purely an optimization.
> However I would skip this completely for the first month.
> The .sources file includes not only the source code, but everything needed to
> install the package properly. Plus, probably, some header info about version,
> imports and exports.
I have thought that the module definition, basically the information held in the module object, should be in a separate small file. That way you can resolve all module dependencies, locate all module sources, and set up the module objects before installing any code. Doing things in well-defined steps means that setup errors, such as declaration conflicts or a module that cannot be found, show up before any code is filed in.
The sources file would then, I guess, be just source code, as it is now.
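The staged setup could start by resolving dependencies from the small definition files alone. A Python sketch under assumptions: each definition file has already been read into a hypothetical `{module: [imported modules]}` table, and `install_order` and `ModuleSetupError` are illustrative names. Missing modules and import cycles are reported before a single chunk is filed in.

```python
from typing import Dict, List


class ModuleSetupError(Exception):
    """Raised during setup, before any code is filed in."""


def install_order(definitions: Dict[str, List[str]], wanted: str) -> List[str]:
    """Return the order in which to file in sources so that every module
    comes after the modules it imports (a depth-first topological sort)."""
    order: List[str] = []
    state: Dict[str, str] = {}  # name -> "visiting" | "done"

    def visit(name: str, chain: List[str]) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ModuleSetupError("import cycle: " + " -> ".join(chain + [name]))
        if name not in definitions:
            raise ModuleSetupError(f"module not found: {name}")
        state[name] = "visiting"
        for imported in definitions[name]:
            visit(imported, chain + [name])
        state[name] = "done"
        order.append(name)

    visit(wanted, [])
    return order
```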
> Finally, it would be nice if this could be compressed.
> Maybe it gets decompressed on first use, or maybe we write a little interface
> so that source code browsing works on .gz files (not that hard, I think. A?).
It could be uncompressed before being stored locally.
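A sketch of that step in Python, assuming downloaded files keep a `.gz` suffix on top of their normal name; `store_uncompressed` is a hypothetical helper. Once the plain file is in the local repository, the ordinary seek-based source reading works unchanged.

```python
import gzip
import shutil
from pathlib import Path


def store_uncompressed(gz_path: Path, local_dir: Path) -> Path:
    """Turn a downloaded "X.sources.gz" into a plain "X.sources" locally."""
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    target = local_dir / gz_path.stem   # .stem drops only the final ".gz"
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)    # stream, so big files are fine
    return target
```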
> The .changes file is pretty much what you would expect.
> Note that if you move a class from one module to another, you've got to copy
> its source code as well. This could be done lazily, but anything that gets
> detached from the image needs the module source code to be complete. hg
> Thinking about information, it might make more sense to abbreviate the type,
> like .src, .chg, .seg, so that we have as much as possible (this would be 4
> more) for the more important parts of the name.
Good idea. It might be worth making this uniform with the main sources and changes files right away, so that the system recognizes the old, long extensions but uses the new ones. If we want to tell the old, big files apart from the module files, I think it shouldn't be by whether the extension is full or abbreviated, or am I being too picky now?
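A small Python sketch of recognizing the old long extensions while writing the new short ones. Mapping ".sources" to ".src" and ".changes" to ".chg" follows the abbreviation proposed above; `name_for_writing` and `find_existing` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Abbreviated types as proposed above; the long forms are still recognized.
SHORT_TYPE = {".sources": ".src", ".changes": ".chg"}


def name_for_writing(base: str, long_type: str) -> str:
    """New files always get the abbreviated type (.seg is already short)."""
    return base + SHORT_TYPE.get(long_type, long_type)


def find_existing(directory: Path, base: str, long_type: str) -> Optional[Path]:
    """Prefer the abbreviated name, but still accept an old, long one."""
    for ext in (SHORT_TYPE.get(long_type, long_type), long_type):
        candidate = directory / (base + ext)
        if candidate.exists():
            return candidate
    return None
```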
Meta-tags in files
We may want to safeguard these files from being used in the wrong way by mistake. I like the idea of putting a meta-tag at the beginning, like:
- (... path and version ...)
It shouldn't have any side effects apart from verification by the fileIn mechanism, and ideally it should cause an exception if Squeak somehow tries to use the file as an ordinary source or changes file (harder?). A literal array would be good: the reader could file in the first chunk, and if that isn't an array, bail out. Allowing a message send there would let a hacker put in an evil expression that wrecks the system when it is evaluated.
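One way to check such a header without evaluating anything, sketched in Python. The header layout `#(module '...' version '...' kind ...)` is an assumption for illustration, as are `parse_header`, `verify_header`, and `BadHeader`; nested arrays and number literals are deliberately not supported. Because the header is parsed as data rather than filed in, anything that is not a flat literal array is rejected instead of run.

```python
import re

# Quoted string ('' escapes a quote) or a bare word such as a symbol.
_TOKEN = re.compile(r"'((?:[^']|'')*)'|([A-Za-z][A-Za-z0-9_.:]*)")


class BadHeader(Exception):
    pass


def parse_header(chunk: str) -> dict:
    """Read the first chunk as a flat literal array WITHOUT evaluating it."""
    text = chunk.strip()
    if not (text.startswith("#(") and text.endswith(")")):
        raise BadHeader("first chunk is not a literal array")
    body = text[2:-1]
    tokens, pos = [], 0
    while pos < len(body):
        if body[pos].isspace():
            pos += 1
            continue
        m = _TOKEN.match(body, pos)
        if m is None:
            raise BadHeader(f"unexpected text at {body[pos:pos + 10]!r}")
        if m.group(1) is not None:
            tokens.append(m.group(1).replace("''", "'"))
        else:
            tokens.append(m.group(2))
        pos = m.end()
    if len(tokens) % 2:
        raise BadHeader("header must hold key/value pairs")
    return dict(zip(tokens[::2], tokens[1::2]))


def verify_header(chunk: str, module: str, version: str, kind: str) -> None:
    """Raise BadHeader unless the header names the expected module file."""
    header = parse_header(chunk)
    expected = {"module": module, "version": version, "kind": kind}
    for key, value in expected.items():
        if header.get(key) != value:
            raise BadHeader(f"{key}: expected {value!r}, found {header.get(key)!r}")
```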
I like the idea that people should be able to fool around with the contents of their own repository without breaking things badly, and to understand it by looking at what is there.
A similar safety check would be even more desirable in .seg files, since they are much harder to figure out or debug "manually" (i.e. by looking at their contents) if something goes wrong. But I don't understand the format well enough to say how to do it. hg
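A sketch of what such a check could look like, assuming the .seg file layout is ours to define: a fixed magic tag plus a short plain-text header naming the module and version, placed before the exported segment bytes. `SEG_MAGIC`, `wrap_segment`, and `unwrap_segment` are hypothetical and not part of Squeak's actual ImageSegment format. Keeping the header in plain text also means a person can see what a file is just by dumping its first bytes.

```python
import struct

SEG_MAGIC = b"SQMODSEG"  # hypothetical 8-byte tag, not an existing format


def wrap_segment(segment: bytes, module: str, version: str) -> bytes:
    """Prefix exported segment bytes with the tag and a readable header."""
    header = f"{module} {version}".encode("utf-8")
    # Header length as a 2-byte big-endian unsigned integer.
    return SEG_MAGIC + struct.pack(">H", len(header)) + header + segment


def unwrap_segment(data: bytes, module: str, version: str) -> bytes:
    """Check tag and header before any loader sees the segment bytes."""
    if not data.startswith(SEG_MAGIC):
        raise ValueError("not a module segment file")
    start = len(SEG_MAGIC)
    if len(data) < start + 2:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from(">H", data, start)
    header_end = start + 2 + length
    if len(data) < header_end:
        raise ValueError("truncated header")
    found = data[start + 2:header_end].decode("utf-8")
    expected = f"{module} {version}"
    if found != expected:
        raise ValueError(f"segment is for {found!r}, expected {expected!r}")
    return data[header_end:]
```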