Squeak
KlattSynthesizer
Last updated at 10:19 am UTC on 24 September 2004
The Klatt synthesizer is a formant synthesizer with cascade and parallel configurations. The class KlattSynthesizer is used by the class KlattVoice.
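For orientation, each formant in a Klatt-style synthesizer is produced by a second-order digital resonator. The difference equation below is the resonator described in reference [1]. F is the formant frequency in Hz, BW its bandwidth in Hz, and T the sampling period (1 / sampling rate). This is a sketch based on the published description. It is not a transcription of the KlattSynthesizer source, so check the class's methods for the exact form Squeak uses.

```latex
y(nT) = A\,x(nT) + B\,y(nT - T) + C\,y(nT - 2T)

C = -e^{-2\pi \cdot BW \cdot T}, \qquad
B = 2\,e^{-\pi \cdot BW \cdot T}\cos(2\pi F T), \qquad
A = 1 - B - C
```

Choosing A = 1 - B - C gives the resonator unity gain at zero frequency. For a constant input, the steady state satisfies y(1 - B - C) = A x, so y = x.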


References (from the class comment):
[1] Klatt, D.H. "Software for a cascade/parallel formant synthesizer", in the Journal of the Acoustical Society of America, pages 971-995, volume 67, number 3, March 1980.

[2] Klatt, D.H. and Klatt, L.C. "Analysis, synthesis and perception of voice quality variations among female and male talkers", in the Journal of the Acoustical Society of America, pages 820-857, volume 87, number 2, February 1990.

[3] Fant, G., Liljencrants, J., and Lin, Q. "A four-parameter model of glottal flow", Speech Transmission Laboratory Quarterly Progress Report 4/85, KTH.

[4] Alwan, A., Bangayan, P., Kreiman, J., and Long, C. "Time and Frequency Synthesis Parameters of Severely Pathological Voice Qualities."

Additional references:
Rutledge, J., Cummings, K., Lambert, D., and Clements, M. (1995), "Synthesizing styled speech using the Klatt synthesizer", in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, Detroit, Michigan, USA, pp. 648-651.

Links

Usually, the cascade branch is used to synthesize vowel-like sounds and the parallel branch is used to synthesize consonantal sounds, but the parallel branch alone can be used to synthesize both vowels and consonants. There are 52 time-varying parameters to control the synthesizer.
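In simplified transfer-function terms, the two branches combine second-order formant resonators H_k(z) as follows. A_k, B_k, and C_k are the resonator coefficients computed from the k-th formant's frequency and bandwidth as in reference [1]. a_k is a symbol introduced here for illustration and stands for the amplitude control of the k-th parallel formant. This sketch ignores the nasal pole/zero pair, the voicing and noise sources, and the sign and gain details of the actual implementation.

```latex
H_k(z) = \frac{A_k}{1 - B_k z^{-1} - C_k z^{-2}}

H_{\text{cascade}}(z) = \prod_{k=1}^{K} H_k(z), \qquad
H_{\text{parallel}}(z) = \sum_{k=1}^{K} a_k\,H_k(z)
```

In the cascade form, the relative formant amplitudes follow automatically from the formant frequencies and bandwidths, which suits vowels. In the parallel form, each amplitude a_k must be set explicitly, which gives the finer control needed for consonants.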







For background information on the Klatt synthesizer, see here.

An email, kept as a source for further documentation and examples. It still needs to be checked whether what it says applies to the most recent version of Squeak.

Date: Sun, 04 Mar 2001 13:27:06 -0600
Subject: Voice Synthesis Question
From: "Harry E. Fassl"

I've been experimenting with synthesizing speech from text contained in a
file external to the image.

I've had to make two modifications to Speaker>>say: (see below).

The first change is the modification of the literal 1000 to 110.
If I leave the code at 1000, the delay between reading the line and the
beginning of 'speaking' is unacceptably long.
My guess is that this is/was processor speed dependent and maybe needs an
instance variable, with a default at init time and a setter. (I'm running a
B&W G3 @450MHz)

The second change is to have Speaker>>say: return the events' duration. This is
used to pause the Speaker between lines; otherwise the next line read starts
playing before the previous line finishes.
(See the workspace code below.)

Running Squeak 3.0 image Latest update #3545, VM30Alph7MT

My questions are:
Am I going to step on something else by making these changes?
(Methodology suggestions for researching this graciously accepted.)

Is there a better way to get the result I'm after?

Harry
H.E.Fassl
http://www.mcs.net/~hefassl

say: aString
| events stream string |
events _ CompositeEvent new.
stream _ ReadStream
on: (aString findTokens: '?' keep: '?').
[stream atEnd]
whileFalse: [string _ stream next.
stream atEnd
ifFalse: [string _ string , stream next].
events
addAll: (self eventsFromString: string)].
events playOn: self voice delayed: events duration * 110. "Modified from 1000"
self voice flush.
^ events duration "Returned for use with Delay to pause between lines."

WORKSPACE SNIPPET

| voiceFile voiceText thisSpeaker thisVoice crcr timeDelay |
Transcript clear.
SoundPlayer stopReverb.
crcr _ String with: Character cr with: Character cr.
voiceFile _ StandardFileStream oldFileNamed: 'Nativity.text'.
thisVoice _ KlattVoice new tract: 11.7; breathiness: 0.32; shimmer: 0.1; ro: 0.7; rk: 0.45; ra: 0.008; yourself.
thisSpeaker _ Speaker new voice: thisVoice.
thisSpeaker pitch: 210.0; speed: 0.613.
[voiceFile atEnd] whileFalse:
	[voiceText _ ((voiceFile upTo: Character cr) copyReplaceAll: crcr with: '') withBlanksTrimmed.
	Transcript show: voiceText; cr.
	"Pause between lines to avoid the next line starting before this one is finished."
	timeDelay _ Delay forSeconds: (thisSpeaker say: voiceText) + 1.
	timeDelay wait].
voiceFile close.

Measuring performance
From: Phil Weichert
Subject: Speaker bigMan - Performance
Date: Sat, 18 Aug 2001 15:20:45 -0500

Performance enhancement ideas requested.

I have been looking into the voice synthesis code. Going through the
examples in the image, such as Speaker bigMan say: 'You can cheat, but don''t get caught.',
I noticed that the speech is slow, like a 45 rpm record played at
33 1/3. I tried several other examples and they all do the same. I
would hope that a 500 MHz PC could handle this, but apparently not. I
evaluated the following:
TimeProfileBrowser onBlock: [Speaker bigMan say: 'You can cheat, but don''t get caught.']

In the leaves section, a tremendous amount of time is lost in "hash" and
Process>>resume. I am rusty on hashing. Are there any practical suggestions for
improving the performance of hash or of any of the other parts? I would
like to hear the bigMan Speaker speak at a normal tempo.


- 58 tallies, 1031 msec.

**Tree**
69.0% {711ms} Speaker>>say:
|60.3% {622ms} CompositeEvent(VoiceEvent)>>playOn:delayed:
| |60.3% {622ms} CompositeEvent>>playOn:at:
| | 56.9% {587ms} PhoneticEvent>>playOn:at:
| | |56.9% {587ms} KlattVoice>>playPhoneticEvent:at:
| | | 56.9% {587ms} KlattVoice>>playEvent:segments:boundary:at:
| | | 36.2% {373ms} KlattVoice>>playEvent:frames:at:
| | | |15.5% {160ms} KlattSynthesizer>>samplesFromFrames:
| | | | |8.6% {89ms} primitives
| | | | |6.9% {71ms} OrderedCollection>>do:
| | | |10.3% {106ms} KlattVoice(Voice)>>playBuffer:at:
| | | | |10.3% {106ms} QueueSound(AbstractSound)>>play
| | | | | 10.3% {106ms} SoundPlayer class>>playSound:
| | | | | 10.3% {106ms} SoundPlayer class>>resumePlaying:
| | | | | 10.3% {106ms} SoundPlayer class>>resumePlaying:quickStart:
| | | | | 10.3% {106ms} SoundPlayer class>>startUpWithSound:
| | | | | 10.3% {106ms} SoundPlayer class>>startPlayerProcessBu...e:rate:stereo:sound:
| | | | | 10.3% {106ms} Process>>resume
| | | |3.4% {35ms} KlattVoice>>dBFromLinear:
| | | |3.4% {35ms} KlattVoice>>linearFromdB:
| | | | 3.4% {35ms} SmallInteger(Number)>>raisedTo:
| | | 20.7% {213ms} KlattVoice>>currentFramesCount:
| | | 20.7% {213ms} KlattSegment>>left:right:speed:pattern:
| | | 15.5% {160ms} KlattSegment>>slopeWith:selector:speed:
| | | 13.8% {142ms} Dictionary>>at:
| | | 13.8% {142ms} Dictionary>>at:ifAbsent:
| | | 12.1% {125ms} Dictionary(Set)>>findElementOrNil:
| | | 12.1% {125ms} Dictionary>>scanFor:
| | | 8.6% {89ms} Symbol(SequenceableCollection)>>hash
| | | 6.9% {71ms} primitives
| | 3.4% {35ms} KlattVoice>>flush
| | 3.4% {35ms} KlattVoice>>playEvent:segments:boundary:at:
| | 3.4% {35ms} KlattVoice>>currentFramesCount:
| | 3.4% {35ms} KlattSegment>>left:right:speed:pattern:
|8.6% {89ms} Speaker>>eventsFromString:
| 5.2% {54ms} Clause>>accept:
| |3.4% {35ms} F0RenderingVisitor>>clause:
| 3.4% {35ms} Speaker>>clauseFromString:
| 3.4% {35ms} Speaker>>phraseFromString:
| 3.4% {35ms} Speaker>>wordFromString:
| 3.4% {35ms} PhoneticTranscriber>>transcriptionOf:
| 3.4% {35ms} PhoneticRule>>matches:at:
31.0% {320ms} Speaker class>>bigMan
22.4% {231ms} KlattVoice class(Voice class)>>new
|22.4% {231ms} KlattVoice>>initialize
| 22.4% {231ms} KlattSegmentSet class>>arpabet
| 22.4% {231ms} KlattSegmentSet>>initializeArpabet
8.6% {89ms} Speaker class>>new
8.6% {89ms} Speaker>>initialize
6.9% {71ms} PhoneticTranscriber class>>default
6.9% {71ms} PhoneticTranscriber class>>english
6.9% {71ms} PhoneticRule class>>english

**Leaves**
10.3% {106ms} String(SequenceableCollection)>>hash
10.3% {106ms} Process>>resume
8.6% {89ms} SmallInteger>>hashMultiply
8.6% {89ms} KlattSynthesizer>>samplesFromFrames:
6.9% {71ms} Dictionary>>scanFor:
6.9% {71ms} OrderedCollection>>do:
3.4% {35ms} KlattVoice>>dBFromLinear:
3.4% {35ms} Symbol>>=
3.4% {35ms} KlattSegmentParameter>>fixed:
3.4% {35ms} String class(Object)>>hash


Thanks,
Phil

Note: Emacspeak may use DECtalk speech servers.