Let's Design and Build a (mostly) Digital Theremin!

Posted: 7/3/2018 6:59:23 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"... well if you watched MTV in the 90's, you know what"  - tinkeringdude

Very nice!  Both of them!  Phonemes stuck together without morphing is a really interesting effect!  Morphing of the various parameters gets it a lot closer, but not all the way out of the uncanny valley.  Realistic vocal synthesis must drive researchers crazy, with more and more effort required to yield increasingly diminished returns.  Just too many parameters with complex control functions, though I suppose even 1/2 way there is fine for weather channels and such.

I watched MTV (and VH1) in the 80's so I missed that song (link).  And I really miss having wall-to-wall music video channels, they were a great thing to have on as background at parties.  I guess YouTube more or less fills that gap now.

Posted: 7/5/2018 2:29:58 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Pitch & Volume Axis Parameter Modulation

There's a lot going on with modulation parameters that probably isn't all that apparent, even in modular analog synths.  When, where, and how to scale and apply them has been one of the tougher nuts for me (hence much whinging - sorry!).  Since I had to revisit this issue for the oscillator harmonic level input, I thought I'd post a somewhat clearer explanation of it while it's still firmly in my head. 


The above is my current approach (implemented via subroutine) to modulating several parameters via the pitch and volume axis numbers.  The pitch number associated with C0 and the volume number associated with -48dB are actually the same: 0xC000,0000 which is 3/4 of a full scale unsigned 32 bit value.  So, except for a factor of 4 weighting favoring the pitch side, both the pitch and volume processing branches are identical.

Following the upper pitch branch: the (unsigned) linear pitch number has (unsigned) C0 subtracted from it.  This is a saturating unsigned subtraction (negatives limit/clip to zero) so the result is also unsigned.  The result is at most 1/4, and it is multiplied by 4 to be full-scale.  After this the most significant bit [31] is flipped (logically negated) to give a +/-1/2 (full) range signed value.  This is then (signed) multiplied with the (signed) modulation knob parameter PMOD, yielding a +/-1/4 range signed value.  The volume branch is the same, but the result is arithmetically (sign-preserving) right shifted by 2 to divide it by 4, yielding a +/-1/16 range signed value.  These are combined and used as a signed offset to various things, such as the (unsigned) filter frequency shown at right.  Note that combining the pitch and volume modulation numbers by simple addition is safe to do, as the sum is at most +/-5/16 and so always less than +/-1/2, but combining that sum with other things downstream requires saturating math.  Even if you did this stuff with floats you would need to saturate somewhere.

Please note what isn't at all obvious (IMO) from the above diagram & explanation: the initial subtraction, *4, and NSb operations provide a signed value that "hinges" (has zero crossing) about the midpoint of operation, which is C4 and -24dB, or 0xE000,0000.  Multiplying this by another signed "hinged" value, PMOD / VMOD from the knob, creates a sort of "teeter-totter" with slope strength and direction controlled by the knob.  Since the pivot point is mid-field, it's entirely possible to, say, make the knob value more positive and have it change the current static modulation negatively - which is fairly counter-intuitive, but it seems that that's life in the big city.
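For anyone who wants to poke at the numbers, here is a rough Python sketch of the branch arithmetic described above.  It is not the prototype's code: the function names are made up for illustration, and the scaling of the knob into a full 32 bit signed value (knob << 25, for a -64 to +63 knob) is an assumption, as is keeping the upper 32 bits of the signed product.

```python
MASK32 = 0xFFFFFFFF
C0 = 0xC0000000  # pitch number for C0 and volume number for -48dB (from the post)

def to_signed32(x):
    """Reinterpret an unsigned 32 bit pattern as two's complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def hinge(axis):
    """Saturating subtract of C0, multiply by 4, flip bit 31.

    Returns a signed value that is zero at mid-field (0xE0000000)."""
    d = max(0, axis - C0)            # saturating unsigned subtraction
    full = (d << 2) & MASK32         # d <= 0x3FFFFFFF, so no overflow
    return to_signed32(full ^ 0x80000000)

def mul_signed(a, b):
    """Signed 32x32 multiply keeping the upper 32 bits (assumed convention)."""
    return (a * b) >> 32             # Python's >> is arithmetic on negatives

def knob_to_s32(knob):
    """Assumed scaling of a -64..+63 knob to a signed 32 bit value."""
    return knob << 25

def axis_mod(pitch, volume, pmod, vmod):
    """Combined signed offset: pitch branch plus volume branch / 4."""
    p = mul_signed(hinge(pitch), knob_to_s32(pmod))
    v = mul_signed(hinge(volume), knob_to_s32(vmod)) >> 2
    return p + v                     # plain add is safe: |p + v| < 2**31

def sat_add_unsigned(base, offset):
    """Apply the signed offset to an unsigned parameter with saturation."""
    return max(0, min(MASK32, base + offset))
```

With the pitch hand below mid-field, axis_mod(0xD0000000, 0xE0000000, 32, 0) is negative, and raising the knob to 63 makes it more negative still, which is the counter-intuitive teeter-totter behavior described above.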

The gain difference of 4 favoring the pitch side is to roughly make up for the generally more limited range the pitch field controls.  For the filters, setting PMOD to +32 gives normal chromatic spacing, setting it to +63 gives ~twice this, and negative values give negative directional modulation.  You really need one specific setting value to give exact chromatic tracking here (so ringing filters can be used as rough oscillators, so filters track the oscillators to yield harmonic content similar to fixed waveforms, etc.).

So the above is nothing new in terms of how I was doing the filter modulation, but I think the explanation of the central nuance is much clearer.  The above is however new for the oscillator harmonic level modulation.  Previously, I was only modulating the harmonic level with the volume axis, and only in a positive (louder = more harmonics) way, which wasn't entirely ideal.  If you don't constrain the axis modulation and make it "hinge" about the mid-field you get most of the change happening at the far-field, where the volume is too low to really hear it change.  

As for pitch axis modulation of harmonics, adding a bit of negative modulation seems to make human vocals more realistic sounding to me.  Which makes sense because the low pitch end vocal "fry" needs lots of harmonics, whereas on the high pitch end the vocal process is entering falsetto territory, which tends to be more mellow.  Traditional Theremin voices also tend to have more harmonics on the low pitch end due to oscillator coupling.  For violin type sounds, negative volume modulation of the harmonics might help emulate the initial "skritching" of the bow.

From many synthesis standpoints, it is extremely useful having a harmonic level input on the oscillator.

=================

It strikes me as a bit odd that simple waveforms which merely accentuate upper harmonics (and not even in a fixed resonance way!) can sound anything like a human voice.  Though I suppose waveforms like that don't generally exist in nature.  Kind of like literally any flying insect basing its navigation on the sun and therefore being utterly confused by artificial lighting, everything our auditory system has been designed by nature to detect has been, up until very recently, fairly simple (to generate) waveforms + resonant structures.

================= 

[EDIT] Here's a bit of "Danny Boy" (MP3) with the latest harmonic modulation.  I must say that it feels a little strange controlling somewhat realistic sounding voices that aren't my own.  Here's hoping that articulating (matrix mod) the formant frequencies will add even more realism.

Posted: 7/6/2018 1:11:18 AM
oldtemecula

From: 60 Miles North of San Diego, CA

Joined: 10/1/2014


dew, I am impressed with how your playing skills are developing. To pursue a "digital" theremin sound can be good but never beautiful; it is like convincing Stephen Hawking (RIP) he can sing.

Come back to the light.

The theremin really is a simple principle; you will eventually accept this. Maybe you being blind comes from events in your childhood. 

It is the slight pulling on the waveform that will get you to the beautiful theremin sound; it is not the mixing of square waves with triangular waves passing through filters and what not, that is just more digital noise and I think you have discovered this. You have arrived at your own creative wall, I sense this. Most that have followed your work over the years also know this. Your digital research is good up to this point but you need to hybrid into an analog sound earlier than just at the speaker. I think you have thought about this.


Christopher


Posted: 7/6/2018 3:05:36 AM
gerd

From: Germany (Black Forest)

Joined: 11/25/2017

Oldtemecula, please stop this private shit storm!

Posted: 7/6/2018 3:38:30 AM
oldtemecula

From: 60 Miles North of San Diego, CA

Joined: 10/1/2014

Gerd said: "Oldtemecula, please stop this private shit storm!"

My comments are about going from a good digital sound to a beautiful sound. I would be a believer in digital if you could demonstrate a beautiful digital theremin sound. The most elementary analog or digital sound meet at the basic sine wave. If someone listens to crap long enough then everything for them sounds good.

I guess it is a matter of opinion. Thierry once demonstrated a sound in a rare theremin moment that I thought was very good. It seems to me most people today view the theremin as a child's toy or gimmick and so beautiful does not matter anymore, only "let's build it cheap". If the expression of the theremin does not do something uniquely beautiful then it will die. I will be waiting to hear what you Gerd can demonstrate in sound, not playability, and please explain to us how you got there. For the theremin to sound musical is a very narrow window.

Christopher

Posted: 7/7/2018 5:11:42 AM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Dueling Modulations

Adding pitch axis modulation to the oscillator harmonic level has IMO increased vocal realism by accentuating fry and mellowing out the higher registers, but it has also introduced the possibility of no harmonics when what you actually want is a minimal level.  The volume and pitch axis numbers, when combined via various signs and gains, can easily push the harmonic level down to zero, but in reality vocal harmonics never go so low as to be a sine wave.  So today I added a "minimum harmonics" knob which seems to have fixed that.  It's weird, but the lack of all harmonics is a distinctly audible state, even if it is smoothly segued into.
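A minimal sketch of the floor idea, assuming the harmonic level is an unsigned 32 bit value and that the "minimum harmonics" knob is applied as a simple lower clamp after the axis modulation (the post doesn't say exactly how it is wired in, and the names here are hypothetical):

```python
FULL_SCALE = 0xFFFFFFFF

def harmonic_level(nominal, axis_offset, minimum):
    """Nominal knob value plus signed axis modulation, never below minimum."""
    level = nominal + axis_offset
    level = max(0, min(FULL_SCALE, level))   # saturate to the unsigned range
    return max(minimum, level)               # assumed "minimum harmonics" clamp
```

Setting minimum equal to nominal means the modulation can only ever add harmonics, never take them away.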

I know I just said this, but having a variable harmonic level is an incredibly powerful synthesis generalization.  Stuff you might otherwise do via clipping / overload is easily done independent of amplitude (though of course you can have it track amplitude if you wish).  This new oscillator is a dream come true for me, gobs of variable harmonics and no audible aliasing.  Most (all?) musical instruments have variable harmonic content, and things like tuba sounds are pretty easy to synthesize when it's a modulation input. 

When recording multiple voices in Audition (multi-track mode), I've noticed the pitch correction pulling things too close to perfect, causing phasing issues.  So for harmonies and such (which I really want to get into) I'll probably have to dial it back or disable it.  My playing ability is getting to the point where I probably don't need it as much, though it has been a welcome set of "training wheels" for this noob, at the very least a psychological crutch, increasing my early confidence.

Posted: 7/7/2018 3:39:30 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Fry's Creak

I've gotta find some kind of process that does "creaky" voice.  I recorded myself doing it, with my mouth closed, microphone next to my nose.  You can capture all kinds of noisy phenomena this way, but this snippet for me stands out:

The fundamental before the cursor is ~85Hz.  Immediately after the cursor there is a strong 1/2 harmonic at ~42Hz, and you can see the low / high periodic amplitude modulation that's producing it.  Further to the right the periodicity goes to roughly 3 or 4 cycles, with a ramp up.

I think what's going on is the springy air pressure from my lungs is so low that it's only good to get oscillation going, but poops out in a rather dramatic fashion with a peak.  It would be nice to capture or emulate this on the prototype but I'm not sure how.  Pitch and volume would definitely be inputs, as it should only occur at low values of both together.  Maybe some sort of integrate and dump?  With a variable residual it might form a rather chaotic system without the explicit addition of noise (a sort of non-linearly dumping delta sigma modulator).

Posted: 7/12/2018 2:05:29 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Headbanger's Ball

Still pondering and spreadsheeting, trying to find some way to do creaky voice and similar.  It's a strange thing because it's semi-oscillatory, yet semi stochastic, and these things are mostly at odds with each other.  At the very base I've got an oscillator which is entirely oscillatory, and am trying to shoehorn it into a stochastic setting.  It's probably sufficient to put some sort of volume envelope on the oscillator to make it sound creaky, though I believe the envelope would have to somehow be "aware" of the oscillator cycles in order to properly manipulate them, and I have no idea how to best do this.

In the process I've again looked at first and second order delta sigma modulators.  First order is perhaps not stochastic enough, second order likely too stochastic.  Maybe threshold and filter white noise somehow?   I feel a bit dead in the water, which is usually a sign that I haven't experimented enough, am looking at things wrong, or haven't read the right paper yet (who knows).  It seems I'm not very good at coming up with basic things, but feel competent enough at refining them and integrating them into my setup once I find them.
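For reference, a textbook first order delta sigma modulator looks roughly like the sketch below.  This is the generic structure only, in floating point, and not whatever was tried in the spreadsheet; the function name is made up.

```python
def delta_sigma_1st(samples):
    """Generic first order delta sigma modulator.

    Inputs are expected in -1..+1; the output is a +/-1 stream whose
    running average tracks the input."""
    integ = 0.0
    y = 0.0
    out = []
    for x in samples:
        integ += x - y                      # integrate the error vs. last output
        y = 1.0 if integ >= 0.0 else -1.0   # 1 bit quantizer
        out.append(y)
    return out
```

With a constant input the output pattern is strongly periodic, which fits the "perhaps not stochastic enough" impression of the first order case.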

One big trap in this biz is seeing waveforms, rather than harmonics, as the goal.  The vocal researchers fall into it when they filter the resonances away and are left with a hooked hump, which they go to a lot of trouble to mathematically model, anti-alias, DC restore, etc. rather than looking for a more efficient and controllable process which produces the same or highly similar harmonic content.  The only harmonic content uniquely tied one-to-one to a specific waveform is the sine wave, which is something to keep in mind always.

Another of my "rules of thumb" with regard to harmonics generation: if the oscillator algorithm "switches modes" depending on the current phase of the waveform (i.e. where you are in the current cycle) then it's probably a bad algorithm.  Mode switching might seem fine when simulating low frequencies, but phase points are not very clear cut at higher frequencies, where only a handful of phase values end up defining the output waveform.  The hooked hump falls afoul of this rule as well.

In the end, I've found that the more general the approach to synthesis the more broadly useful the final result.  The trick is in nailing down what is general and implementing it.  It's rather like figuring out which opcodes to put in your processor.

Posted: 7/13/2018 6:38:21 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Pitch Correction (part 10?)

I'm noticing with pitch correction that sometimes the pitch sounds "too perfect" or "honky" - a bit of vibrato breaks this up, but I find myself avoiding constant tones when doing vocals, and perhaps over-using vibrato.  This got me to thinking that the linear slew function could instead be replaced by a first order low pass filter.  The correcting action would therefore be stronger (faster) when way off of the note, but softer (slower) when closer to the note.  So I did the replacement and am playing around with it.  As I've said before, unless it's cranked way up to pitch quantization, pitch correction is very subtle, and you have to become accustomed to it to get a feel for what's really going on and for how it interacts (for better / worse) with your playing style.
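The difference between the two update rules can be sketched as follows.  This assumes a per-sample update toward some target value; exactly which signal in the corrector gets slewed or filtered isn't spelled out here, and the function and parameter names are hypothetical.

```python
def slew_step(current, target, max_step):
    """Linear slew limiter: moves at most max_step per sample, regardless of distance."""
    delta = target - current
    delta = max(-max_step, min(max_step, delta))
    return current + delta

def lowpass_step(current, target, alpha):
    """First order low pass: moves a fixed fraction (0 < alpha <= 1) of the remaining distance."""
    return current + alpha * (target - current)
```

Far from the target the low pass takes big steps, and close to it the steps shrink, whereas the slew limiter moves at the same rate until it lands exactly on the target.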

Thought I'd do a quick technical demo of the pitch and volume axis harmonic modulation for vocals: (MP3).  I think you'll be able to easily hear that the waveform gets brighter at higher volume (vmod+) to accentuate non-linearities; and also at lower pitch (pmod-) to accentuate fry.  The first 3 are female, done at high / mid / low octaves (via my RH playing position) and the second 3 are male, with male and female sweeps at the end.  There is a lower limit to keep sine waves from creeping in, but I'm finding that, for vocals at least, I keep the minimum knob value set to the nominal knob value - so perhaps there is an opportunity here to remove a knob?  I need to play with more extreme settings to see if there is any advantage to keeping them separate.  IMO knob/setting removal (if it can be done without crippling or lobotomizing things) is a good goal, as too many settings can confuse things and clutter up the UI if they don't add sufficient value.

[EDIT] I don't think I'll combine the minimum and nominal knobs as there seem to be "effects" type uses for different values, particularly when doing "reverse" type modulations.

Posted: 7/14/2018 7:33:29 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Pitch Correction (part 10.5)

Now that I'm using a conventional 1st order low pass filter on the pitch corrector instead of a slew limiter, and also now that I have a better "feel" for how pitch correction interacts with my playing style, I thought this might be a good point at which to experiment with higher filter orders.  A 2nd order really starts to noticeably separate the vibrato from the nominal pitch!  And a 4th order does this even more - I can dial it all the way up to pitch quantization and still introduce some vibrato, which I absolutely couldn't do with the slew limiter or 1st order filter.  I wonder if / where the sweet spot for filter order is here?
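One simple way to get higher orders is to cascade identical first order sections, sketched below.  That the prototype builds its 2nd and 4th order filters this way is an assumption, and the class, the ripple() helper, and the parameter values are all made up for illustration.

```python
import math

class CascadedLowpass:
    """N identical first order low pass sections in series (assumed structure)."""

    def __init__(self, order, alpha, initial=0.0):
        self.alpha = alpha
        self.state = [initial] * order

    def step(self, x):
        # Each section filters the output of the one before it.
        for i in range(len(self.state)):
            self.state[i] += self.alpha * (x - self.state[i])
            x = self.state[i]
        return x

def ripple(order, alpha=0.05, period=50, n=2000):
    """Peak output over the last cycle for a unit sine input, after settling."""
    f = CascadedLowpass(order, alpha)
    peak = 0.0
    for i in range(n):
        y = f.step(math.sin(2.0 * math.pi * i / period))
        if i >= n - period:
            peak = max(peak, abs(y))
    return peak
```

Comparing ripple(1) with ripple(4) shows how much more sharply the higher order separates fast wiggles from slow movement, while a constant input still settles to the same value for any order.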

Here's a quick demo (MP3) where I have it set to a moderate amount of correction and smoothly move with vibrato over a range, then move smoothly over a range without vibrato so you can more clearly hear the chromatic stepping, then with the same settings play that patriotic ditty for the 14th (!) of July so you can maybe tell here and there what it sounds like during normal musical play.  

Large phrased or staccato pitch steps can be pretty tough to do on a conventional Theremin, but they're relatively easy on the prototype once you develop sufficient eye/hand coordination.  And the combo of pitch correction and real-time pitch display keeps my pitch from wandering when playing unaccompanied (without these aids / cues I tend to float sharp over time).

[EDIT] So here's the higher level view of the pitch side:


Not much, but the basics are sometimes strangely difficult (for me at least).  Where in the processing chain do you put the "tune to another instrument that isn't in tune" knob?  I found what I've been doing got broken at some point, and having the tuning offset point located at the tuner didn't change anything in terms of audio output / displayed pitch.  Just about every setting leading up to the linearized pitch number changes it, and with no change to the pitch correction note bin locations, so the solution is to offset the post pitch corrected output.  I've got a pre and post correction option going to the tuner, and prefer (at this point anyway) to view the pre corrected pitch on the tuner.  The final exponential pitch number is only used by one component (the oscillator) so in the end I may move the exponentiation there rather than have a global function providing it.  It's all very much like analog synths, where the interconnect signals are mostly linear, with exponentiation happening inside the oscillators and filters.  Things you don't think about a whole lot unless you're the one implementing them.
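A toy version of that ordering, assuming linear pitch is expressed in semitones above C0 (the prototype uses 32 bit linear pitch numbers, so this unit is purely for readability) and with hypothetical function names:

```python
import math

C0_HZ = 440.0 * 2.0 ** (-57.0 / 12.0)   # C0 is 57 semitones below A4 = 440 Hz

def correct(pitch, strength):
    """Pull the pitch toward the nearest chromatic note bin; strength 1.0 = quantize."""
    nearest = math.floor(pitch + 0.5)
    return pitch + strength * (nearest - pitch)

def pitch_chain(pitch, strength, tuning_offset):
    """Return (pre, post) linear pitch; the tuning offset is added after correction."""
    pre = pitch
    post = correct(pre, strength) + tuning_offset
    return pre, post

def to_hz(semitones_above_c0):
    """Exponentiation, done last (e.g. inside the oscillator)."""
    return C0_HZ * 2.0 ** (semitones_above_c0 / 12.0)
```

Because the offset is added after correct(), the note bins stay where they are and the whole corrected output simply shifts to match the out of tune instrument.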
