Banyan
Last updated at 2:32 pm UTC on 13 July 2007
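The Banyan File Utility provides useful functions for working with files and directories, including backup, space-utilization reporting, directory comparison, and rudimentary file searching.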

Built on top of the existing FileDirectory/ServerDirectory included with Squeak 3.8 and 3.9, Banyan should be platform independent, but has been tested on the following:

Test cases are included.

License

By downloading Banyan, you agree to the following license.

Banyan.

Copyright 2006-2007 by Chris Muller.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Installation

Banyan itself is a one-click install from SqueakMap. However, if you will be using it to back up to or from an FTP server, you will need to load the bug fixes in the change-set attached to this bug:

http://bugs.squeak.org/view.php?id=6428

You may also download these same fixes here.

Banyan has been tested in Squeak 3.8 and 3.9.

The Backup Function

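To perform a backup, the following interaction between user and software takes place, with each step described in its own section below:

1. Instantiate the backup instance.
2. Pare unwanted files and directories (optional).
3. Prepare the backup.
4. View the proposed changes.
5. Hand-remove specific operations (optional).
6. Execute the backup.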


Instantiating the Backup Instance

It's easy to instantiate the backup instance. Just:
	MaBanyanBackup
		source: sourceFileDirectoryOrServerDirectory
		target: targetFileDirectoryOrServerDirectory


As indicated, the source or target may transparently be an instance of either FileDirectory or ServerDirectory, but at least one of them should be a FileDirectory; a server-to-server backup would be very inefficient (and has not been tested anyway). If they are both FileDirectory's and OSProcess is loaded, Banyan will generate a copy script to perform the copying so that the file timestamps are preserved.

When using a ServerDirectory, it is important to specify keepAlive: true. Here is an example backing up the local filesystem to an FTP server:
	MaBanyanBackup
		source:
			(FileDirectory on: 'c:\Development\squeak')
		target:
			(ServerDirectory new
				type: #ftp ;
				server: 'someFtpServer' ;
				directory: '/home/chris/development/squeak' ;
				keepAlive: true ;
				user: 'chris' ;
				yourself)


Paring Unwanted Files/Directories

Sometimes you may wish to avoid backing up certain files, such as temporary files, or certain entire directories. This is very easy to accomplish via a where: block. The block takes one argument, an instance of MaDeltaDirectoryEntry, and, as the examples below show, an entry is kept when the block answers true. This gives great flexibility in the filtering because you have:

- an OrderedCollection of the entire path (relative to the source or target root) to that node (its #pathAndName)
- the entries within each root directory (source and target) that were present at that pathAndName, so you can compare names, dates, or sizes.

Example
Exclude the 'common Sub' directory:
	myBackup where:
		[ : eachDeltaEntry |
		(eachDeltaEntry pathAndName beginsWith: #('common Sub')) not ]


Exclude any files that begin with an underscore character:
	myBackup where:
		[ : eachDeltaEntry |
		eachDeltaEntry name first ~= $_ ]


Exclude any files whose sizes are equal:
	myBackup where:
		[ : eachDeltaEntry |
		eachDeltaEntry hasDifferences and: [ eachDeltaEntry source fileSize ~= eachDeltaEntry target fileSize ] ]
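
Several criteria can also be combined in a single where: block. The sketch below reuses only the expressions from the examples above and assumes myBackup is an existing MaBanyanBackup. Whether sending where: more than once accumulates criteria is not stated here, so one block is the safer form.

```smalltalk
"Sketch: keep an entry only if it is outside 'common Sub'
and its name does not begin with an underscore."
myBackup where:
	[ : eachDeltaEntry |
	(eachDeltaEntry pathAndName beginsWith: #('common Sub')) not
		and: [ eachDeltaEntry name first ~= $_ ] ]
```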


Preparing the Backup

After specifying the where: criteria (if necessary), there are special preparation methods which specify the backup-logic. The choices are, in order of most to least conservative:

prepareToMergeSourceIntoTargetOverwritingNothing
"Build my collection of operations that, when later executed, will merge the source structure into the target structure without deleting or replacing any existing files. This is an add-only operation, nothing in the target will be overwritten or deleted."

prepareToMergeSourceIntoTargetOverwritingOldest
"Build my list of operations that, when later executed, will perform a merge of the source structure into the target structure. Where the directory structures intersect, files from my target will be replaced by the newer files in the source. Files and directories present in the target that do not exist in the source are not touched."

prepareToMakeTargetLikeSource
"Prepare my list of operations that, when later executed, modify my target directory to look just like my source. Common files are replaced by those in the source only if their timeStamps are different (but a warning will be generated for common files newer in the target). Files and directories existing in the target that do not exist in the source are removed from the target. Files and directories existing in the source that do not exist in the target are copied over."

With prepareToMakeTargetLikeSource, the end result is just as if the target were deleted in its entirety and then the source copied, except that this is much more efficient since only the delta operations are performed.
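
The difference between the three preparation methods can be illustrated with a small decision table. The block below is only a sketch of the rules as described above, not Banyan's code: the strategy symbols and the action symbols (#copy, #replace, #delete, #leaveAlone) are hypothetical names, timestamps are plain integers, and nil means the file is absent on that side.

```smalltalk
"Sketch only; runs in a plain workspace without Banyan."
| decide |
decide := [ : strategy : sourceStamp : targetStamp |
	sourceStamp isNil
		ifTrue:
			[ "Present only in the target."
			strategy = #makeTargetLikeSource
				ifTrue: [ #delete ]
				ifFalse: [ #leaveAlone ] ]
		ifFalse:
			[ targetStamp isNil
				ifTrue: [ "Present only in the source." #copy ]
				ifFalse:
					[ "Present on both sides."
					strategy = #overwritingNothing
						ifTrue: [ #leaveAlone ]
						ifFalse:
							[ strategy = #overwritingOldest
								ifTrue: [ sourceStamp > targetStamp ifTrue: [ #replace ] ifFalse: [ #leaveAlone ] ]
								ifFalse: [ "makeTargetLikeSource: replace on any difference (Banyan warns if the target is newer)."
									sourceStamp = targetStamp ifTrue: [ #leaveAlone ] ifFalse: [ #replace ] ] ] ] ] ].

decide value: #overwritingNothing value: 20 value: 10.     "common file, newer in source"
decide value: #overwritingOldest value: 20 value: 10.      "same file, different strategy"
decide value: #makeTargetLikeSource value: nil value: 10   "file only in the target"
```

Evaluating the three sample lines one at a time shows how the same pair of files is treated more or less conservatively depending on the strategy.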


View the Proposed Changes

At this point, nothing has been done to the filesystem, but you can view what will be done by using either #viewProposedTarget or #viewProposedChanges.

viewProposedTarget
"Open a window showing my proposed targetDirectory, with changes emphasized within the context of the entire targetDirectory tree."

or

viewProposedChanges
"Open a window displaying a report of the changes that will occur to my targetDirectory, excluding entries that will not be affected by this backup."

The screenshot below shows the results of #viewProposedTarget after selecting the #prepareToMakeTargetLikeSource option. Since the target will be made to be just like the source, see how it will delete the "target Only" stuff ("in Target Only", "target Only Sub" and "file1").

screenshot1.png

#viewProposedChanges is nicer for larger directories, because it will just show the entries being changed (and the full path to them), which can significantly reduce the size of the report.


Hand-remove Specific Operations

After viewing the backup, you may notice a particular entry that was not caught by the where: block and needs to be removed. Rather than starting over and special-casing it, you can use #removeOperation:.

Execute the Backup

When satisfied with what the backup will do, simply execute it:
	myBackup execute
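
Putting the preceding steps together, a whole session might look like the following sketch. It uses only the selectors described on this page; the two paths are hypothetical, and the statements are meant to be evaluated one at a time in a workspace so that the report can be reviewed before anything is written.

```smalltalk
"Hypothetical local-to-local backup; adjust both paths."
myBackup := MaBanyanBackup
	source: (FileDirectory on: 'c:\Development\squeak')
	target: (FileDirectory on: 'd:\Backup\squeak').

"Optional: skip entries whose names begin with an underscore."
myBackup where: [ : eachDeltaEntry | eachDeltaEntry name first ~= $_ ].

"Build the list of operations; nothing is written yet."
myBackup prepareToMergeSourceIntoTargetOverwritingOldest.

"Review the report, and execute only when satisfied."
myBackup viewProposedChanges.
myBackup execute
```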

After prompting if any warnings were present, the backup is immediately executed. Execution occurs in two steps: the non-copying operations, followed by the copying operations. Copying is kept separate so that, where possible, it can preserve the file timestamps. For that to happen, two conditions must be met:

1. OSProcess must be installed.
2. None of the source or target directories may be ServerDirectory's. They must all be FileDirectory's.

If both of these conditions are satisfied, Banyan will create a script to be executed by the operating system. The generated script is specific to the operating system on which Squeak is running: if SmalltalkImage current platformName is 'Win32', Banyan uses MaBanyanWindowsStrategy; otherwise it uses MaBanyanLinuxStrategy.

If one or both of the conditions are unsatisfied, Banyan will copy the files via Smalltalk (and FTP if a ServerDirectory is involved), which will cause the files to be stamped with the current time.
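
To check in advance which path a given backup is likely to take, the two conditions can be tested by hand. This is a sketch of the rules as described above, not Banyan's own code: the answer is just a descriptive symbol, #copyViaSmalltalk is a made-up label for the fallback, and the two directories are stand-ins.

```smalltalk
"Sketch only; runs in a plain workspace without Banyan."
| source target canPreserveTimestamps strategyName |
source := FileDirectory default.
target := FileDirectory default containingDirectory.   "stand-in target"

canPreserveTimestamps :=
	(Smalltalk includesKey: #OSProcess)
		and: [ ((source isKindOf: ServerDirectory)
			or: [ target isKindOf: ServerDirectory ]) not ].

strategyName := canPreserveTimestamps
	ifTrue:
		[ SmalltalkImage current platformName = 'Win32'
			ifTrue: [ #MaBanyanWindowsStrategy ]
			ifFalse: [ #MaBanyanLinuxStrategy ] ]
	ifFalse: [ #copyViaSmalltalk ].
strategyName
```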

A Special Note about ServerDirectory

While creating Banyan, several bugs in ServerDirectory were discovered and fixed. For Banyan to work with ServerDirectory's, you must file in the fixes at:

http://bugs.squeak.org/view.php?id=6428

A Special Note about TimeZone

When using a ServerDirectory, most FTP servers will report timestamps in GMT, not local time. Therefore, the above ServerDirectory fixes also include an extension to TimeZone which allows you to specify your #local time-zone and will convert timestamps received over FTP accordingly. This could be important for comparing file dates, so Banyan will warn you if you haven't set your time-zone to something other than GMT when using it with a ServerDirectory.

A Special Note about Windows

There is no way in Windows to generate a non-interactive copy script unless all target files have their read-only flag set to false. Yes, xcopy has /R (overwrites read-only files), but it can't handle when the target file does NOT exist; it prompts you whether the target is a directory or a file. Ridiculous.

So, for performing Windows-to-Windows backups, it may be necessary to unset the targetDirectory's Read-only flag.


Understanding Space Utilization

Banyan can also generate a useful report showing space utilization similar to the du report in Linux, but more refined. To use it, just send #maOpenSizeTreeForEntriesLargerThan: to any FileDirectory or ServerDirectory.
	(FileDirectory on: 'c:\program files') maOpenSizeTreeForEntriesLargerThan: 10000000 "only show lines in the report larger than 10MB"


Here is a sample of that output:
139.3M Adobe
	139.3M Acrobat 7.0
		58.26M Reader
			31.02M plug_ins
		56.8M Setup Files
			34.37M RdrBig
				34.37M ENU
					29.32M Data1.cab
			22.43M RdrMin
				22.43M ENU
					17.67M Data1.cab
		17.21M Update
496.86M Common Files
	224.17M Microsoft Shared
		16.34M MODI
			16.34M 11.0
		11.61M MS Project
		12.24M Office10
		25.83M OFFICE11
			12.26M MSO.DLL
...
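
For a rough idea of where the numbers in such a report come from, the sketch below totals the file sizes under one directory using only the standard FileDirectory protocol (entries and directoryNamed:). It is not Banyan's implementation and produces a single total rather than a tree. It uses an explicit work list instead of a recursive block, because blocks in Squeak 3.8 and 3.9 are not re-entrant.

```smalltalk
"Sketch only; runs in a plain workspace without Banyan."
| pending total dir |
pending := OrderedCollection with: FileDirectory default.
total := 0.
[ pending isEmpty ] whileFalse:
	[ dir := pending removeFirst.
	dir entries do:
		[ : each |
		each isDirectory
			ifTrue: [ pending add: (dir directoryNamed: each name) ]
			ifFalse: [ total := total + each fileSize ] ] ].

"Would this directory appear in a report limited to entries larger than 10MB?"
total > 10000000
```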


About the Model / How it Works

Banyan creates a tree in memory of the source and target directories and merges them into one in-memory tree, represented by a MaDeltaDirectory. Each node of this merged tree is a MaDeltaDirectoryEntry, which contains an OrderedCollection of standard DirectoryEntry's. The OrderedCollection has one element for each directory being compared (usually two: the source and the target).

Example
Suppose we have the following source directory:
			source
				common File Common Age
				common Sub
					in Source Only
					common File Newer In Source
					common File Older In Source
				source Only Sub
					file1
				dirInSource fileInTarget


and the following target directory:
			target
				common File Common Age
				common Sub
					in Target Only
					common File Newer In Source
					common File Older In Source
				target Only Sub
					file1
				dirInSource fileInTarget


Banyan builds the merged tree based on the names of the file and directory entries in each directory ('source' and 'target'). So the merged tree would look like this:
				common File Common Age (source,target)
				common Sub (source,target)
					in Source Only (source,nil)
					in Target Only (nil,target)
					common File Newer In Source (source,target)
					common File Older In Source (source,target)
				source Only Sub (source,nil)
					file1 (source,nil)
				target Only Sub (nil,target)
					file1 (nil,target)
				dirInSource fileInTarget  (source,target)


The nodes of the tree are defined by the names of the entries in each directory. Where these names overlap, an entry for both the source and target is present (indicated above by source,target) in the MaDeltaDirectoryEntry. Where a name is present in only one of the source and target, nil is present in the slot for the other.
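
The name-based merge can be sketched for a single level of the tree, here the contents of 'common Sub' from the example. This is an illustration with plain collections, not MaDeltaDirectory's code: the symbols #source and #target stand in for the real DirectoryEntry objects, and a two-element Array stands in for the OrderedCollection held by each node.

```smalltalk
"Sketch only; runs in a plain workspace without Banyan."
| sourceNames targetNames merged |
sourceNames := #('in Source Only' 'common File Newer In Source' 'common File Older In Source').
targetNames := #('in Target Only' 'common File Newer In Source' 'common File Older In Source').

merged := Dictionary new.
"Every source name gets a node whose target slot is still nil."
sourceNames do:
	[ : each | merged at: each put: (Array with: #source with: nil) ].
"Target names either fill the second slot of an existing node or create a new one."
targetNames do:
	[ : each |
	(merged at: each ifAbsentPut: [ Array with: nil with: nil ])
		at: 2 put: #target ].

"Print one line per merged node."
(merged keys asSortedCollection: [ : a : b | a <= b ]) do:
	[ : each |
	Transcript show: each; show: ' '; show: (merged at: each) printString; cr ]
```

The resulting dictionary has four keys: the two common files hold both slots, while 'in Source Only' and 'in Target Only' each have nil on the side where the name is missing.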

Banyan uses this merged tree to construct another in-memory tree of the FileOperations that will be executed. In this way, the entire effects of the backup can be observed before executing.

Just Comparing Directories

MaDeltaDirectory is the basis for the backup operations. It can compare any number of directories, not just two. For more information, see MaDeltaDirectory.

Rudimentary File Searching

MaBanyanFileSearch is a prototype for finding files matching any criteria. It has only one method in its API:

MaBanyanFileSearch >> #filesWhere:do:
"Traverse all the files and directories from my root and evaluate twoArgBlock for each path and entry satisfied by searchBlock. The arguments to searchBlock and twoArgBlock are the path of FileDirectory's leading to the second argument, the FileDirectoryEntry."

To use, create a MaBanyanFileSearch from the root FileDirectory where the search is to begin:
	mySearcher := MaBanyanFileSearch from: FileDirectory default

Then tell it to do something to files matching particular criteria:
	mySearcher
		filesWhere: [ : path : entries | ... ]
		do: [ : path : entries | ... ]

Examples
"Files that begin with an underscore character."
	mySearcher
		filesWhere:
			[ : path : entries |
			entries name first = $_ ]
		do: [ : path : entries | ... ]


"Everything except the temp directory"
	mySearcher
		filesWhere:
			[ : path : entries |
			path noneSatisfy: [ : eachFileDirectory | eachFileDirectory pathParts includes: 'temp' ] ]
		do: [ : path : entries | ... ]
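
The do: block is free to accumulate results. As a sketch, assuming the second block argument answers #name and #fileSize like a standard DirectoryEntry, the following collects the names of all entries larger than 10MB under the default directory:

```smalltalk
"Sketch only; requires Banyan to be loaded."
| mySearcher bigFiles |
mySearcher := MaBanyanFileSearch from: FileDirectory default.
bigFiles := OrderedCollection new.
mySearcher
	filesWhere: [ : path : entry | entry fileSize > 10000000 ]
	do: [ : path : entry | bigFiles add: entry name ].
bigFiles
```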



Implementation Notes


Tree Union

Banyan gets its name from its likeness to the real tree of the same name, a tree that merges itself with another tree. The MaTree class makes merging trees and then selecting branches very easy.

Core FileDirectory and ServerDirectory extensions

The implementation of all these functions relies almost solely on two core extension methods to FileDirectory:

maDirectoryTreeDo: twoArgBlock path: anOrderedCollection
"Value twoArgBlock with the path (an OrderedCollection of FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."

and

maNameFor: filename relativeTo: containingDirectory
"Assume that filename is contained somewhere in the tree of containingDirectory. Answer the semi-qualified name for filename, qualified from containingDirectory down to its location."