Banyan
Last updated at 2:32 pm UTC on 13 July 2007
The Banyan File Utility provides useful functions for:
- easily transferring / backing up work from one personal computer to another, or from/to any FTP server.
- understanding space utilization.
- comparing any number of directories (2 or more) simultaneously.
- file-management convenience.
Built on top of the existing FileDirectory/ServerDirectory included with Squeak 3.8 and 3.9, Banyan should be platform independent, but has been tested on the following:
- Linux to Linux
- Windows to Windows
- Windows to Linux (executing in Windows)
- Linux to Windows (executing in Windows)
- Linux to AIX (executing in Linux)
Test cases are included.
License
By downloading Banyan, you agree to the following license.
Banyan.
Copyright 2006-2007 by Chris Muller.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Installation
Banyan itself is a one-click install from SqueakMap. However, if you will be using it to back up to or from an FTP server, you will need to load the bug fixes in the change-set attached to this bug:
http://bugs.squeak.org/view.php?id=6428
You may also download these same fixes here.
Banyan has been tested in Squeak 3.8 and 3.9.
The Backup Function
To perform a backup, the following interaction between user and software takes place:
- A MaBanyanBackup is instantiated, defining the source and target directories.
- Optionally, the backup is pared with #where: criteria.
- The backup is #prepared according to one of three strategies: merge-overwriting-nothing, merge-overwriting-oldest, or make-target-like-source. Banyan generates an internal tree of FileOperations.
- Optionally, the user may view the proposed changes about to occur.
- Optionally, particular operations for the backup may be hand-removed.
- When ready, the user executes the backup. This execution work will be performed by classes in the image, or by generating a script and invoking it with OSProcess, in which case the file timestamps will be preserved.
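The steps above can be strung together in a workspace. The following is a minimal sketch that uses only selectors mentioned on this page and assumes Banyan is loaded in the image. The two directory paths and the underscore filter are hypothetical placeholders.

```smalltalk
| myBackup |
"1. Define the source and target (both paths are made-up examples)."
myBackup := MaBanyanBackup
	source: (FileDirectory on: '/home/chris/work')
	target: (FileDirectory on: '/mnt/backup/work').
"2. Optionally pare the backup; here, skip names that start with an underscore."
myBackup where: [ :eachDeltaEntry | eachDeltaEntry name first ~= $_ ].
"3. Prepare, using the most conservative of the three strategies."
myBackup prepareToMergeSourceIntoTargetOverwritingNothing.
"4. Review the proposed changes; nothing has touched the filesystem yet."
myBackup viewProposedChanges.
"5. Execute once satisfied with the report."
myBackup execute
```

Each step is described in detail in the sections that follow.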
Instantiating the Backup Instance
It's easy to instantiate the backup instance. Just:
MaBanyanBackup
	source: sourceFileDirectoryOrServerDirectory
	target: targetFileDirectoryOrServerDirectory
As indicated, the source or target may transparently be an instance of FileDirectory or ServerDirectory, but at least one of them should be a FileDirectory; backing up from one ServerDirectory to another would be very inefficient (and has not been tested anyway). If both are FileDirectory instances and OSProcess is loaded, Banyan will generate a copy script to perform the copying, so that the file timestamps are preserved.
When using a ServerDirectory, it is important to specify keepAlive: true. Here is an example backing up the local filesystem to an FTP server:
MaBanyanBackup
	source:
		(FileDirectory on: 'c:\Development\squeak')
	target:
		(ServerDirectory new
			type: #ftp ;
			server: 'someFtpServer' ;
			directory: '/home/chris/development/squeak' ;
			keepAlive: true ;
			user: 'chris' ;
			yourself)
Paring Unwanted Files/Directories
Sometimes you may wish to avoid backing up certain files, like temporary files or certain entire directories. This is very easy to accomplish via a where: block. The block takes one argument, an instance of MaDeltaDirectoryEntry. This gives great flexibility in the filtering because you have:
- an OrderedCollection of the entire path (relative to the source or target root) to that node (its #pathAndName)
- the entries within each root directory (source and target) that were present at that pathAndName, so you can compare names, dates, or sizes.
Example
Exclude the 'common Sub' directory:
myBackup where:
	[ : eachDeltaEntry |
	(eachDeltaEntry pathAndName beginsWith: #('common Sub')) not ]
Exclude any files that begin with an underscore character:
myBackup where:
	[ : eachDeltaEntry |
	eachDeltaEntry name first ~= $_ ]
Exclude any files whose sizes are equal:
myBackup where:
	[ : eachDeltaEntry |
	eachDeltaEntry hasDifferences and: [ eachDeltaEntry source fileSize ~= eachDeltaEntry target fileSize ] ]
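This page does not say whether sending where: more than once accumulates criteria or replaces the earlier block. A cautious approach, sketched here under that uncertainty, is to combine several criteria in a single block. The example reuses the two filters shown above and assumes myBackup is an existing MaBanyanBackup.

```smalltalk
"Keep an entry only if it is outside 'common Sub' AND its name does not start with an underscore."
myBackup where:
	[ : eachDeltaEntry |
	(eachDeltaEntry pathAndName beginsWith: #('common Sub')) not
		and: [ eachDeltaEntry name first ~= $_ ] ]
```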
Preparing the Backup
After specifying the where: criteria (if necessary), there are special preparation methods which specify the backup-logic. The choices are, in order of most to least conservative:
prepareToMergeSourceIntoTargetOverwritingNothing | Build my collection of operations that, when later executed, will merge the source structure into the target structure without deleting or replacing any existing files. This is an add-only operation; nothing in the target will be overwritten or deleted. |
prepareToMergeSourceIntoTargetOverwritingOldest | Build my list of operations that, when later executed, will perform a merge of the source structure into the target structure. Where the directory structures intersect, files from my target will be replaced by the newer files in the source. Files and directories present in the target that do not exist in the source are not touched. |
prepareToMakeTargetLikeSource | Prepare my list of operations that, when later executed, modify my target directory to look just like my source. Common files are replaced by those in the source only if their timeStamps are different (but a warning will be generated for common files newer in the target). Files and directories existing in the target that do not exist in the source are removed from the target. Files and directories existing in the source that do not exist in the target are copied over. The end result is just as if the target is deleted in its entirety and then the source copied, except this is much more efficient since only the delta operations are performed. |
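To make the differences between the three strategies concrete, here is a plain-Smalltalk sketch of the decision each one makes for a single entry, as described in the table above. It is not Banyan's code: the decide block, the strategy symbols, the integer timestamps, and the #copy / #delete / #leave results are all hypothetical names chosen for illustration.

```smalltalk
| decide |
"Stamps are plain integers (larger = newer); nil means the entry is absent on that side."
decide := [ :strategy :sourceStamp :targetStamp |
	sourceStamp isNil
		ifTrue: [ "entry exists in the target only"
			strategy = #makeTargetLikeSource
				ifTrue: [ #delete ]
				ifFalse: [ #leave ] ]
		ifFalse: [
			targetStamp isNil
				ifTrue: [ "entry exists in the source only" #copy ]
				ifFalse: [ "entry exists on both sides"
					strategy = #overwritingNothing
						ifTrue: [ #leave ]
						ifFalse: [
							strategy = #overwritingOldest
								ifTrue: [ sourceStamp > targetStamp ifTrue: [ #copy ] ifFalse: [ #leave ] ]
								ifFalse: [ sourceStamp = targetStamp ifTrue: [ #leave ] ifFalse: [ #copy ] ] ] ] ] ].

decide value: #overwritingNothing value: 5 value: 3.      "answers #leave"
decide value: #overwritingOldest value: 5 value: 3.       "answers #copy"
decide value: #makeTargetLikeSource value: nil value: 3.  "answers #delete"
decide value: #makeTargetLikeSource value: 3 value: 5     "answers #copy; Banyan would also warn, since the target is newer"
```

Only the last strategy ever answers #delete, which is why it is listed as the least conservative.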
View the Proposed Changes
At this point, nothing has been done to the filesystem, but you can view what will be done with either #viewProposedTarget or #viewProposedChanges.
viewProposedTarget | "Open a window showing my proposed targetDirectory, with changes emphasized within the context of the entire targetDirectory tree." |
or
viewProposedChanges | "Open a window displaying a report of the changes that will occur to my targetDirectory, excluding entries that will not be affected by this backup." |
The screenshot below shows the results of #viewProposedTarget after selecting the #prepareToMakeTargetLikeSource option. Since the target will be made to be just like the source, see how it will delete the "target Only" stuff ("in Target Only", "target Only Sub" and "file1").
#viewProposedChanges is nicer for larger directories, because it will just show the entries being changed (and the full path to them), which can significantly reduce the size of the report.
Hand-remove Specific Operations
After viewing the backup, you may notice a particular entry that was not caught by the where: block and needs to be removed. Rather than starting over and special-casing it, you can use #removeOperation:.
Execute the Backup
When satisfied with what the backup will do, simply execute it:
myBackup execute
After prompting if any warnings were present, the backup is immediately executed. Execution occurs in two steps: the non-copying operations, followed by the copying operations. The split exists so that the copying operations have a chance to preserve the file timestamps. For that to happen, two conditions must be met:
1. OSProcess must be installed.
2. None of the source or target directories may be ServerDirectory's. They must all be FileDirectory's.
If both of these conditions are satisfied, Banyan will create a script to be executed by the operating system. The script is specific to the operating system on which Banyan is running: if SmalltalkImage current platformName is 'Win32', Banyan uses the MaBanyanWindowsStrategy; otherwise it uses the MaBanyanLinuxStrategy.
If one or both of the conditions are unsatisfied, Banyan will copy the files via Smalltalk (and FTP if a ServerDirectory is involved), which will cause the files to be stamped with the current time.
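Before running a backup, it can be handy to check from a workspace which path Banyan is likely to take. The following is only a sketch of the checks described above, not Banyan's own code. The temporary variable names are hypothetical, and the strategy classes are named by symbol so that the snippet evaluates even when Banyan is not loaded.

```smalltalk
| osProcessLoaded strategyClassName |
"Condition 1: is the OSProcess class present in the image?"
osProcessLoaded := Smalltalk includesKey: #OSProcess.
"Platform test described above: 'Win32' selects the Windows strategy, anything else the Linux one."
strategyClassName := SmalltalkImage current platformName = 'Win32'
	ifTrue: [ #MaBanyanWindowsStrategy ]
	ifFalse: [ #MaBanyanLinuxStrategy ].
Array with: osProcessLoaded with: strategyClassName
```

Condition 2 (no ServerDirectory involved) depends on the particular backup and is not covered by this sketch.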
A Special Note about ServerDirectory
While creating Banyan, several bugs in ServerDirectory were discovered and fixed. For Banyan to work with ServerDirectory's, you must file in the fixes at:
http://bugs.squeak.org/view.php?id=6428
A Special Note about TimeZone
When using a ServerDirectory, most FTP servers will report timestamps in GMT, not local time. Therefore, the above ServerDirectory fixes also include an extension to TimeZone which allows you to specify your #local time-zone and converts timestamps received over FTP accordingly. This can be important for comparing file dates, so Banyan will warn you if your time-zone is still set to GMT when you use it with a ServerDirectory.
A Special Note about Windows
There is no way in Windows to generate a non-interactive copy script unless all target files have their read-only flag set to false. Yes, xcopy has /R (overwrites read-only files), but it can't handle when the target file does NOT exist; it prompts you whether the target is a directory or a file. Ridiculous.
So, for performing Windows-to-Windows backups, it may be necessary to unset the targetDirectory's Read-only flag.
Understanding Space Utilization
Banyan can also generate a useful report showing space utilization similar to the du report in Linux, but more refined. To use it, just send #maOpenSizeTreeForEntriesLargerThan: to any FileDirectory or ServerDirectory.
(FileDirectory on: 'c:\program files') maOpenSizeTreeForEntriesLargerThan: 10000000 "only show lines in the report larger than 10MB"
Here is a sample of that output:
139.3M Adobe
  139.3M Acrobat 7.0
    58.26M Reader
      31.02M plug_ins
    56.8M Setup Files
      34.37M RdrBig
        34.37M ENU
          29.32M Data1.cab
      22.43M RdrMin
        22.43M ENU
          17.67M Data1.cab
    17.21M Update
496.86M Common Files
  224.17M Microsoft Shared
    16.34M MODI
      16.34M 11.0
    11.61M MS Project
    12.24M Office10
    25.83M OFFICE11
      12.26M MSO.DLL
...
About the Model / How it Works
Banyan creates a tree in memory of the source and target directories and merges them into one in-memory tree, represented by a MaDeltaDirectory. Each node of this merged tree is a MaDeltaDirectoryEntry, which contains an OrderedCollection of standard DirectoryEntry's. The OrderedCollection has one element for each directory being compared (usually two: the source and the target).
Example
Suppose we have the following source directory:
source
common File Common Age
common Sub
in Source Only
common File Newer In Source
common File Older In Source
source Only Sub
file1
dirInSource fileInTarget
and the following target directory:
target
common File Common Age
common Sub
in Target Only
common File Newer In Source
common File Older In Source
target Only Sub
file1
dirInSource fileInTarget
Banyan builds the merged tree based on the names of the file and directory entries in each directory ('source' and 'target'). So the merged tree would look like this:
common File Common Age (source,target)
common Sub (source,target)
in Source Only (source,nil)
in Target Only (nil,target)
common File Newer In Source (source,target)
common File Older In Source (source,target)
source Only Sub (source,nil)
file1 (source,nil)
target Only Sub (nil,target)
file1 (nil,target)
dirInSource fileInTarget (source,target)
The nodes of the tree are defined by the names of the entries in each directory. Where these names overlap, an entry for both the source and target is present (indicated above by source,target) in the DeltaDirectoryEntry. Where a name is present in only one of the source and target, nil is present in the slot for the other.
Banyan uses this merged tree to construct another in-memory tree of the FileOperations that will be executed. In this way, the entire effects of the backup can be observed before executing.
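The name-based merge can be illustrated for a single directory level with nothing but standard collections. This is a sketch of the idea only, not Banyan's implementation: it uses a Dictionary and symbols in place of MaDeltaDirectoryEntry and real DirectoryEntry's, and the variable names are hypothetical.

```smalltalk
| sourceNames targetNames merged |
sourceNames := #('common File Common Age' 'common Sub' 'source Only Sub').
targetNames := #('common File Common Age' 'common Sub' 'target Only Sub').
merged := Dictionary new.
"One node per distinct name; each node has a slot for the source and a slot for the target."
(sourceNames , targetNames) asSet do: [ :each |
	merged at: each put: (Array
		with: ((sourceNames includes: each) ifTrue: [ #source ])
		with: ((targetNames includes: each) ifTrue: [ #target ])) ].

merged at: 'common Sub'.       "both slots filled: #source and #target"
merged at: 'source Only Sub'.  "#source in the first slot, nil in the second"
merged at: 'target Only Sub'   "nil in the first slot, #target in the second"
```

Applying the same step recursively to every directory whose name appears on either side yields a tree shaped like the merged tree shown above.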
Just Comparing Directories
MaDeltaDirectory is the basis for the backup operations. It can compare any number of directories, not just two. For more information, see MaDeltaDirectory.
Rudimentary File Searching
MaBanyanFileSearch is a prototype for finding files matching any criteria. It has only one method in its API:
MaBanyanFileSearch >> #filesWhere:do:
"Traverse all the files and directories from my root and evaluate twoArgBlock for each path and entry satisfied by searchBlock. The arguments to searchBlock and twoArgBlock are the path of FileDirectory's leading to the second argument, the FileDirectoryEntry."
To use, create a MaBanyanFileSearch from the root FileDirectory where the search is to begin:
mySearcher := MaBanyanFileSearch from: FileDirectory default
Then tell it to do something to files matching particular criteria:
mySearcher
	filesWhere: [ : path : entry | ... ]
	do: [ : path : entry | ... ]
Examples
"Files that begin with an underscore character."
mySearcher
	filesWhere:
		[ : path : entry |
		entry name first = $_ ]
	do: [ : path : entry | ... ]
"Everything except the temp directory"
mySearcher
	filesWhere:
		[ : path : entry |
		path noneSatisfy: [ : eachFileDirectory | eachFileDirectory pathParts includes: 'temp' ] ]
	do: [ : path : entry | ... ]
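For a complete, non-elided usage, the do: block can simply collect what the search finds. The sketch below assumes Banyan is loaded; the '.st' suffix criterion and the found collection are hypothetical choices made for illustration.

```smalltalk
| mySearcher found |
found := OrderedCollection new.
mySearcher := MaBanyanFileSearch from: FileDirectory default.
mySearcher
	filesWhere: [ :path :entry | entry name endsWith: '.st' ]
	do: [ :path :entry | found add: entry name ].
found
```

Inspecting found afterwards shows the names of every matching entry beneath the starting directory.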
Implementation Notes
Tree Union
Banyan gets its name from its likeness to the real tree of the same name; a tree that merges itself with another tree. The MaTree class makes merging trees and then selecting branches very easy.
Core FileDirectory and ServerDirectory extensions
The implementation of all these functions relies almost solely on two core extension methods to FileDirectory:
maDirectoryTreeDo: twoArgBlock path: anOrderedCollection
"Value twoArgBlock with the path (an OrderedCollection of FileDirectory's) to each DirectoryEntry and the DirectoryEntry itself."
and
maNameFor: filename relativeTo: containingDirectory
"Assume that filename is contained somewhere in the tree of containingDirectory. Answer the semi-qualified name for filename, qualified from containingDirectory down to its location."