FAQ: FileOut Code Format
Last updated at 3:53 pm UTC on 2 July 2018
A "file out" is a stream of octets used to store Smalltalk source code or objects into files or to transport them from one Smalltalk system to another. One can fileout source code, code patches or even morphs.
These files typically end with the suffix ".st". When "filed in" the expressions in the file are actually evaluated using the Compiler - this means that the file can also contain arbitrary code that will be executed when filed in. The FileList tool allows one to preview the contents or browse them before filing them in.
The other dominant format for distributing Squeak source code in files is ChangeSet, which uses the same file format but with a .sources or .cs extension. The source code of a Squeak image main release is stored separately in file with .sources extension file using this format. Any code updates to this release is preserved in a corresponding file with .changes extension. For instance, the source code of Squeak 5.0 release is held in SqueakV50.sources file. Subsequent updates to a specific image are saved in its corresponding .changes file.
Changesets may be viewed using Change Sorter tools in Squeak. These tools allow you to browse or edit code segments, move them from one changeset to another and so on.
Confusion with Text Files
A fileout often contains printable ASCII characters in them, as when filing out method source code. This leads some to treat them like text files. Even when filing out source code, one should not treat these fileouts like text files. This is because they always use CR to end lines which may differ from that for the host platform. Native editors on MAC OS work fine because they also use CR for line ending. But Windows uses CRLF and most Unix-derived systems use LF. If you try to create or edit these files on these platforms that use a different line ending code, the resulting file may be rejected as corrupt by Squeak.
Tips for Unix or GNU/Linux users:
- With vim editor, add "set fileformats+=mac" option in its rc file to preserve the line endings. You may then use view/vim to browse these files.
- In scripts, you may use the filter tr '\r' '\n' to convert line separator from CR to LF before piping the contents to a text utility like grep
What is the precise syntax for filein/fileout?
A fileout consists of a sequence of code segments called "chunks" separated by ! character. Any ! character occurring within the code segment must be doubled to form a chunk. An empty chunk consisting of one or more whitespace characters terminates the sequence. These sequence takes one of two forms
In the first form, each chunk is read in by the compiler and evaluated immediately. You may think of ! as a "doIt" operator. In the examples below, the first line is a string header typical of a changeset. Notice how embedded single quotes are escaped. The second line shows a class being defined.
- <chunk>!<chunk>! ... !<ws>!
'From Squeak5.2alpha of 28 June 2018 [latest update: #18120] on 30 June 2018 at 5:17:08 pm'!
Object subclass: #ImageSegmentLoader
instanceVariableNames: 'segment outPointers oopMap position differentEndian '
classVariableNames: 'BytesInHeader CompactClasses HeaderTypeClass HeaderTypeFree HeaderTypeMask HeaderTypeShort HeaderTypeSizeAndClass '
category: 'System-Object Storage'!
In the second form which starts with a ! character, the following chunk is evaluated to get a "chunk reader" object, which is then sent the #scanFrom: message. The #scanFrom: method then reads rest of the stream. Only when it returns, you are on the top-level again. This form allows custom chunk readers to be used to pull in any type of octets - ASCII or otherwise. The ClassCategoryReader files in methods (see first line in the example below), while chunk readers like ObjectScanner provide a compiler context for filing in SmartRefStream containing serialized object graphs (see from second line in the example below). Look at the implementors of #scanFrom: message.
!ImageSegmentLoader methodsFor: 'reading' stamp: 'kks 6/30/2018 16:11'!
| word |
word := segment at: addr // 4.
^word digitAt: (differentEndian ifTrue:  ifFalse: ).! !
!ObjectScanner new initialize!
!self smartRefStream! ... binary bytes ...
You can see how chunks work by looking at senders and implementors of:
PositionableStream, WriteStream and ReadWriteStream have a method category "fileIn/Out".
- and a few variants.
Look at methods whose selector contains: "filein" or "fileout" using the Method Finder.