Glottal Fry
I learned this term (new to me) from the Perry Cook PhD thesis I linked to in my previous post. It means speaking in a very low-pitched register so that the vocal cords produce a "popping" impulse sound (think of how Noam Chomsky talks, or how many young people tend to speak lately). This is pretty easy to do, synthesis-wise: the vocal cord waveform has to be rich in harmonics anyway, and slowing it down (lowering the pitch) gives this kind of effect. The "pops" make the "ringing" (Q) of the vocal tract resonances more noticeable, since each resonance has time to ring and die away before the next pop arrives.
What's harder to do is simulate the dynamics of the vocal cords. The paper discusses open/close times of the cords, open constant times, etc., and how the waveforms can be synthesized with minimal aliasing. I need to implement something here tied to volume dynamics, because what I'm doing now sounds like someone adjusting the volume control on a stereo, not a person talking louder and softer.
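To put rough numbers on why the ringing stands out at fry pitches, here is a small sketch. It relies on the standard result that a two-pole resonance with -3 dB bandwidth B (in Hz) has an impulse response whose envelope decays as exp(-pi * B * t). The function name and the pitch and bandwidth figures are my own illustrative assumptions, not values from the thesis.

```python
import math

def ring_remaining(pitch_hz, bandwidth_hz):
    """Fraction of a resonance's ring amplitude left when the next pop arrives.

    Assumes a two-pole resonance, whose impulse-response envelope is
    exp(-pi * B * t) for a -3 dB bandwidth of B Hz, evaluated one pitch
    period after the exciting pop.
    """
    period = 1.0 / pitch_hz
    return math.exp(-math.pi * bandwidth_hz * period)

# Illustrative (assumed) numbers: an 80 Hz wide formant at two pitches.
fry = ring_remaining(pitch_hz=40.0, bandwidth_hz=80.0)      # fry register
normal = ring_remaining(pitch_hz=200.0, bandwidth_hz=80.0)  # ordinary speech pitch

print(f"ring left at 40 Hz pitch:  {fry:.4f}")
print(f"ring left at 200 Hz pitch: {normal:.4f}")
```

With these assumed figures the ring has almost completely died away between pops at 40 Hz, while at 200 Hz more than a quarter of it is still there when the next pop lands, so successive rings overlap and blur into a steady tone.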
The paper also discusses the design of a physical model of the vocal tract, something I was burning to try but am now a bit colder on. In many ways physical modeling is what you want to do because the adjustment "knobs" on it usually closely correspond to physical parameters (string stiffness, resonant tube length, etc.) and can therefore be more intuitively tailored. And a lot of desired behavior (resonance, decay rate, spectral coloring, etc.) is a natural consequence of stimulating the model.
I'm finding three formants (band pass filters) in parallel can give a fairly realistic voice sound if properly stimulated. The human vocal tract is basically a 'Y' pipe, where the vocal cords are at the bottom, and the two upper paths are the nasal and mouth resonators. The mouth resonance is complicated by the jaw opening and closing, and by the tongue and the lips, while the nasal resonance is largely constant (if not stopped by the back of the tongue). A fourth, somewhat fixed band pass filter can fill in the throat and face sounds that naturally radiate when humming and such. The mouth and nasal pipes are open on the ends, so some of the sound reflects back into the pipe with reversed phase.
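For anyone wanting to experiment, a minimal sketch of a parallel formant bank follows. It assumes a Chamberlin-style state variable filter for each band pass (my actual filter code is not shown here), and the class names, the three frequency/Q/mix triples, and the 48 kHz sample rate are all hypothetical ballpark values for an "ah"-like sound.

```python
import math

class BandPass:
    """Chamberlin state variable filter, band pass output only (assumed topology)."""
    def __init__(self, freq_hz, q, sample_rate=48000.0):
        self.f = 2.0 * math.sin(math.pi * freq_hz / sample_rate)  # tuning coefficient
        self.damp = 1.0 / q                                       # damping = 1/Q
        self.low = 0.0
        self.band = 0.0

    def process(self, x):
        self.low += self.f * self.band
        high = x - self.low - self.damp * self.band
        self.band += self.f * high
        return self.band

class FormantBank:
    """Band pass filters in parallel, each with its own mix level."""
    def __init__(self, formants, sample_rate=48000.0):
        # formants: list of (freq_hz, q, mix) tuples
        self.filters = [(BandPass(f, q, sample_rate), mix) for f, q, mix in formants]

    def process(self, x):
        return sum(mix * bp.process(x) for bp, mix in self.filters)

def rms_response(formants, tone_hz, n=9600, sample_rate=48000.0):
    """Steady-state RMS output of a fresh bank driven by a unit sine tone."""
    bank = FormantBank(formants, sample_rate)
    out = [bank.process(math.sin(2.0 * math.pi * tone_hz * i / sample_rate))
           for i in range(n)]
    tail = out[n // 2:]  # skip the start-up transient
    return math.sqrt(sum(v * v for v in tail) / len(tail))

# Rough, assumed "ah"-like settings: (frequency in Hz, Q, mix level).
AH_LIKE = [(700.0, 10.0, 1.0), (1100.0, 10.0, 0.5), (2500.0, 10.0, 0.25)]

on_formant = rms_response(AH_LIKE, 700.0)    # tone sitting on the first formant
off_formant = rms_response(AH_LIKE, 4000.0)  # tone well above all three
```

A tone on a formant comes out far louder than one off to the side, which is all the "coloring" really is. A harmonically rich glottal source simply has many such tones being weighted at once.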
My signal chain right now starts with a mutated sine wave for the glottal sounds. Flipping and repeatedly squaring two of the normalized quadrants gives a pretty nice mix of even and odd harmonics without a lot of aliasing, and doing this to all four quadrants gives an odd-harmonics-only rounded square wave, which unfortunately does alias at higher squarings and frequencies, though nowhere near as badly as a true square. The harmonic control is quite convenient: I have it arranged to do two-quadrant squaring for positive parameter values, four-quadrant squaring for negative values, and a pure sine wave for parameter = 0. In parallel with this mutated sine wave I have white noise feeding a tracking or fixed state variable filter. Both the sine and noise are volume controlled via the left-hand antenna. These feed into a formant filter bank consisting of four band pass filters in parallel. The bank is split across two screens (two filters each), and I can disable either half (or both) of the filters with a parameter; when all are disabled there is an automatic pass-through. Having two banks of two is handy, because I'm using two filters for the mouth resonance, and two filters for the nasal and throat resonance. Turning off the mouth filters sounds like humming or nose breathing.
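Here is one possible reading of the mutated sine in code, strictly as a sketch. It assumes "flipping" means working on 1 - |s| for the normalized sine value s, that the squaring count is an integer, and that the "two quadrants" are the positive half cycle. The real thing may differ in all three respects, and the function names are made up for illustration.

```python
import math

def mutated_sine(phase, param):
    """One sample of the shaped sine (assumed interpretation, see above).

    phase: position in the cycle, 0.0 to 1.0
    param: 0 = pure sine, positive = squarings applied to the positive
           half only, negative = squarings applied to all four quadrants
    """
    s = math.sin(2.0 * math.pi * phase)
    if param == 0:
        return s
    if param > 0 and s < 0.0:
        return s                    # two-quadrant mode: negative half untouched
    mag = 1.0 - abs(s)              # flip
    for _ in range(abs(param)):
        mag *= mag                  # repeated squaring
    return math.copysign(1.0 - mag, s)  # flip back and restore the sign

def harmonic_amplitude(param, k, n=1024):
    """Amplitude of harmonic k, from a direct DFT over one cycle."""
    re = im = 0.0
    for i in range(n):
        y = mutated_sine(i / n, param)
        a = 2.0 * math.pi * k * i / n
        re += y * math.cos(a)
        im -= y * math.sin(a)
    return 2.0 * math.hypot(re, im) / n

for param in (0, 2, -2):
    levels = [round(harmonic_amplitude(param, k), 3) for k in range(1, 6)]
    print(param, levels)
```

Under these assumptions the four-quadrant shape keeps half-wave symmetry, so its even harmonics cancel, while treating only half the cycle breaks that symmetry and lets the even harmonics in.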
It's quite fun to play around even with this basic setup. A sharp intake of breath causes turbulence at the lips, so simply mixing in unfiltered noise is a close approximation. Breathing is just noise through the formant filter. The two-axis continuous control the Theremin provides is highly suited to this sort of thing. It's kind of weird adjusting the formants though. It's rather like summing sine waves manually to get a certain tonal color: unless you're close you don't know it, and your ear hears the separate things you are adjusting until it all suddenly pulls together. The human brain has quite strict categories for what sounds human; fortunately it isn't too difficult to fake it out with simple signals and filters.
And... yet another use for squaring! Instead of converting the formant filter frequency parameter control from linear to exponential via EXP2, I'm using one squaring. This nicely expands the low end of the adjustment and is essentially free (a single assembly instruction). So all three formant filter controls (frequency, damping, mixing) now employ squared parameters (as does the noise tracking filter when it's in manual tuning mode).
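A quick sketch of the two control curves side by side, to show what the squaring buys. The function names, the 0 to 1 knob range, and the frequency limits are assumptions for illustration only, not my actual parameter scaling.

```python
def squared_control(x, max_hz):
    """Map a 0..1 knob position to frequency with a single squaring."""
    return max_hz * x * x

def exp2_control(x, min_hz, octaves):
    """Map a 0..1 knob position to frequency exponentially (the EXP2 approach)."""
    return min_hz * 2.0 ** (octaves * x)

# Assumed ranges: squaring covers 0 to 4000 Hz, EXP2 covers 100 Hz upward over 6 octaves.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, squared_control(x, 4000.0), exp2_control(x, 100.0, 6.0))
```

With the squared curve, the first half of the knob travel covers only the bottom quarter of the frequency range, which is the low-end expansion mentioned above. It is not a true constant-ratio (musical) curve like EXP2, but for setting formants by ear it is close enough.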