Squeak
Distribution of characters in Squeak code
Last updated at 11:34 am UTC on 24 June 2016
Tobias Pape (Squeak Mailing list Wed, Jun 22, 2016)

I was curious about the relative distribution of characters in Squeak code.
I sampled the source code and drew a histogram (attached).

 " Uses the new HistogramMorph"
 | characterFrequency |
 "Reuse the read-only source file handles while reading every method's source."
 CurrentReadOnlySourceFiles cacheDuring: [
 	characterFrequency := ((CompiledMethod allInstances
 		select: [:method |
 			"Keep only methods with fewer than 1500 literal elements; the < is assumed."
 			(method allLiterals detectSum: [:lit |
 				lit isCollection ifFalse: [0] ifTrue: [lit size]]) < 1500])
 		gather: [:method |
 			method getSource reject: [:c | c isSeparator]]) asBag].
 
 (HistogramMorph on: characterFrequency)
 	labelBlock: [:c | c codePoint > 32 ifTrue: [c asString] ifFalse: [c printString]];
 	openInWorld.
 
 "The 90 most frequent characters, most frequent first."
 ((characterFrequency sortedCounts collect: [:ea | ea value]) first: 90) join.
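
To see what the counting step computes, here is a minimal sketch that runs the same messages on a short string instead of the whole image. The sample string and the variable names are made up for illustration; occurrencesOf: is ordinary Bag protocol and is not used in the original snippet.

```smalltalk
"Illustration only: count characters in a made-up sample string."
| sample frequency |
sample := 'foo := bar baz: foo'.
"Drop whitespace, then let a Bag tally each remaining character."
frequency := (sample reject: [:c | c isSeparator]) asBag.
"$o appears four times in the sample."
frequency occurrencesOf: $o.
"sortedCounts answers count -> character associations, highest count first;
 the value of each association is the character."
((frequency sortedCounts collect: [:ea | ea value]) first: 3) join.
```

Evaluating the last line with "print it" in a workspace should show the three most frequent characters, starting with o. The characters that follow are tied in this sample, so their order may vary.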


Result


	etarsoinl:
In more detail, the 90 most frequent characters:
	etarsoinl:cdfumhpg.ybwSv"=1CT'x][0F)(k2ANPI|M^B4O7D6R3598#EL-,zWVjU;H+q/>G@KX${}YQZJ\~?!

	etaonishrlducmwyfgpbvkjxqz

Observations:


Comparison:

C, sampling the Linux kernel:


	et_risancodlupfm,);(0hvgb-E=x>ITRSACkNL.P1O/wD2My"{}UF&3GB4q86HV5:X#[]+zK7W9Y|%\!jQZ'


Ruby, sampling Rails:


	etsaonridl_cupmh.f:,"gb')(=y#vw/kq>ATx01R[]@S{}CE|2?-zjDMIPN+BO\F3L5!HU%&498GW6;YV7J`X




Attachment: etarsoinl.png