MessageTally
Last updated at 2:10 pm UTC on 20 March 2007
(culled from https://lists.wisc.edu/read/messages?id=1411189 written by Andreas Raab)
MessageTally
The primary tool to measure performance, both for Squeak in general as
well as for Croquet, is MessageTally. MessageTally acts on a particular
expression (MessageTally spyOn:["your expression here"]) and provides
the following information:
- A hierarchy showing how much time was spent where in the computation.
- A list showing which amount of time was spent in which leaf nodes
- Memory statistics, incl. the growth rate, garbage collection info etc.
MessageTally uses a technique known as pc-sampling. What that means is that a high-priority process is started which (based on a timer) samples the call stack of the process and allocates a time value (typically whatever it's using for sampling).
It is important to notice that this is a statistical measure - given a large enough number of samples, the reported result will be statistically valid. On the other hand, small numbers of samples are typically statistically invalid - a single garbage collection can lead to a major change in an otherwise insignificant part of the computation.
Here is an example: MessageTally spyOn:[100 raisedTo: 1000].
resulted in the following output:
**Tree**
100.0% {3ms} SmallInteger(Number)>>raisedTo:
100.0% {3ms} SmallInteger(Number)>>raisedToInteger:
50.0% {2ms} LargePositiveInteger>>*
50.0% {2ms} primitives
These results claim that we're spending 50% of the overall time in Multiplying large integers. Which is completely bogus since we have only two samples (the default sampling rate is 1ms). If we run this for a longer period of time, like here:
MessageTally spyOn:[1000 timesRepeat:[100 raisedTo: 1000]]
**Tree**
100.0% {1007ms} SmallInteger(Number)>>raisedTo:
99.9% {1006ms} SmallInteger(Number)>>raisedToInteger:
96.8% {975ms} primitives
3.0% {30ms} LargePositiveInteger>>*
We see that indeed, we only spend roughly 3% in multiplying large integers. The other 97% are spent in primitives which, unfortunately, are not broken out separately in the measures (however, if such a measure is critical, then the primitives can to be factored into separate methods which then call the primitives themselves - this allows message tally to "see" the method frames and report the usage accordingly).
In a more complex situation, the percentage tree is typically useful to figure out roughly the areas in which time is spent which can then be measured individually.
Leaves
The Leaves reported by MessageTally is the amount of time spent in a method without the time spent in the methods called from that method. In our above example the leaves are reported as:
**Leaves**
96.8% {975ms} SmallInteger(Number)>>raisedToInteger:
which is the overall time spent in Number>>raisedToInteger: (1006ms) minus the time spent in LargePositiveInteger>>* (30ms). If a method shows up in the leaves it typically means that this method is computationally expensive or just gets called a large number of times.
Memory
The Memory statistics shown in MessageTally provide information about how various memory regions have changed:
- old: Describes the "old space" in memory. This is the portion that will not be included in incremental garbage collection but only during a full garbage collection. See also the "tenure" information below. Extensively growing old space typically means that there is a problem with the allocation patterns or garbage collector settings.
- young: Describes the "young space" in memory, e.g., the region handled by the incremental garbage collector. Changes in young space are usually not relevant.
- used: Total amount of used memory.
- free: Total amount of unused memory.
GCs
The GCs statistics provide information about the garbage collector:
- full: The number of "full" garbage collections and time spent in those. Automatic full garbage collections should be VERY rare, they are a sign that you're allocating huge amounts of memory repeatedly. Note that at times these garbage collections are manually triggered though (like in the checkpointing process) and a normal effect of the operation.
- incr: The number of "incremental" garbage collections and time spent in those. Generally speaking, IGCs should be quick (avg. < 2ms) and frequent (several times a second). However, the total time spent in IGCs should generally be less than 10%, otherwise this is a sign of a problem with the allocation patterns. If the time spent in IGCs exceeds 25% something is definitely wrong.
- tenures: Tenuring occurs when the number of surviving objects in young space exceeds a certain threshold. In this case, the young space boundary is increased (which adds to the size of "old space" mentioned above). Tenuring typically means that the working set of the application hasn't been reached. For example, in a space construction we would expect frequent tenuring until the space is fully constructed. Once the working set has been reached, tenuring should be rare to non-existent. Frequent tenuring in such cases means that the garbage collection parameters need to be adjusted.
- root table overflows: Root table overflows describe the (rare) case that the number of roots for the incremental garbage collector will overflow the internal table. This will force an immediate garbage collection plus tenuring. The measure is provided in order to be able to find such rare cases (which otherwise leave you wondering why the system is running full GCs all the time for no apparent reason)
Multiple processes
Historically, MessageTally measured and reported only the call stack of the current process. This had the disadvantage that if time was spent in a different process, it would be attributed to a bogus frame in current thread. For Croquet, I have changed this such that all processes are reported in order to be able to see "what else" is going on.
For example, if we measure an expression like here:
MessageTally spyOn:[(Delay forSeconds: 5) wait]
we will find that all of the time is reported here:
**Tree**
99.5% {4975ms} ProcessorScheduler class>>startUp
99.5% {4975ms} ProcessorScheduler class>>idleProcess
The idle process is the process that is being activated when no other activity occurs (the implementation of the idle process requests the VM to sleep for a millisecond so that the VM isn't running a busy). Generally, time reported in idleProcess is time spent "doing nothing" (e.g., waiting for some activity).
The other relevant system process that may show up is the finalization process. If the finalization process shows up, it means we're having a problem with too many weak references being finalized. This has been a big problem in the past, so keep an eye on it.
stephane ducasse wrote:
> may be you want to send the changes you did to get other processes
> analyzed. We could merge that in Squeak
In MessageTally>>spyEvery:on: where it says:
Timer :=
[[true] whileTrue:
[startTime := Time millisecondClockValue.
myDelay wait.
self tally: ObservedProcess
"tally can be > 1 if ran a long primitive"
by: (Time millisecondClockValue - startTime) // millisecs].
Simply replace:
self tally: ObservedProcess
with
self tally: Processor preemptedProcess suspendedContext
Cheers,
- Andreas