Let's Design and Build a (mostly) Digital Theremin!

Posted: 2/3/2018 9:21:32 PM

From: Northern NJ, USA

Joined: 2/17/2012

Pitch Correction

I've been practicing both with and without the pitch correction on.  Despite having two adjustments (strength and rate) in some instances it seems to help, in others it seems to hinder.  I'll use it for a while and like what it's doing, then come back to it later with the same piece and not like it (and vice versa).  Here is how it's structured:

The linear 32 bit unsigned pitch number goes straight through to an adder at the output.  In the example above it's ramping smoothly from note 0 to note 2.  

The pitch number also gets fed to a sub-note splitter, which returns the full width (32 bit) fractional part of multiplication by 12 * 2^5.  In the example this gives us two ramps, one for note 0 to 1, and the second for note 1 to 2.  The note fraction gets quantized (with settable strength) which distorts the note ramps towards 0 and 1 (on the y axis).  

Subtracting the note ramps from the quantized note ramps gives us a correction signal.  With no quantization the correction signal is a flat zero.  With full quantization the correction signal is a triangle wave that goes from 0 to -1/2 to 0 to +1/2 to 0 over the course of one note transition.

We low pass filter (with settable cutoff frequency) the correction signal, scale it to a one note step, and add it to the pitch number, the result of which is then exponentiated and fed to the tuner, oscillators, and filters.
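The signal path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual assembly: the pitch is taken as a float in octaves rather than a 32 bit unsigned number, the strength control is modeled as a simple linear blend between the raw ramp and a hard step at the half-note point, and the names (`note_fraction`, `quantize`, `OnePoleLowPass`, `corrected_pitch`) are all made up for the example.

```python
import math

NOTES_PER_OCTAVE = 12

def note_fraction(pitch_octaves):
    """Sub-note splitter: position within the current note, in [0, 1)."""
    notes = pitch_octaves * NOTES_PER_OCTAVE
    return notes - math.floor(notes)

def quantize(frac, strength):
    """Pull the note ramp towards 0 or 1 (assumed linear blend).
    strength 0.0 leaves the ramp alone, 1.0 snaps it fully."""
    snapped = 0.0 if frac < 0.5 else 1.0
    return frac + strength * (snapped - frac)

def correction(pitch_octaves, strength):
    """Quantized ramp minus raw ramp, in notes, within [-1/2, +1/2]."""
    frac = note_fraction(pitch_octaves)
    return quantize(frac, strength) - frac

class OnePoleLowPass:
    """First order low pass; coeff in (0, 1], smaller means lower cutoff."""
    def __init__(self, coeff):
        self.coeff = coeff
        self.state = 0.0

    def step(self, x):
        self.state += self.coeff * (x - self.state)
        return self.state

def corrected_pitch(pitch_octaves, strength, lpf):
    """Filter the correction, scale it to one note step, add it to the pitch."""
    smoothed = lpf.step(correction(pitch_octaves, strength))
    return pitch_octaves + smoothed / NOTES_PER_OCTAVE
```

With strength at zero the correction is a flat zero, and with full strength a pitch three quarters of the way up a note gets pushed the remaining quarter note up to the next note center (once the filter has settled).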

I spent several hours yesterday googling (duck duck go-ing, actually) to see how others design pitch correctors, and watching videos of them in operation.  There doesn't seem to be much out there in the way of what they do with the values once they get them.  It's a huge problem to extract pitch, and another to change it, so the few papers I could find concentrated on those aspects.  This application is much, much simpler in this regard, as the pitch is absolutely known and given by a solid high resolution number, and changing the pitch is just messing with that number downstream of figuring out what to do with it.

So I'm kind of stumped on how to improve what I've currently implemented, but it seems to need improvement.  I've tried 1st, 2nd, and 4th order simple low pass filtering (the equivalent of isolated RC sections all tuned the same) - 2nd order seems to be an improvement over 1st, but 4th doesn't seem much if any different than 2nd.
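For reference, the "isolated RC sections all tuned the same" arrangement can be sketched like this. It is a float model with hypothetical names (`one_pole_coeff`, `CascadedLowPass`), and the coefficient formula is one common way of mapping a cutoff frequency to a one-pole coefficient, not necessarily the one used in the actual code.

```python
import math

def one_pole_coeff(cutoff_hz, sample_rate_hz):
    """Map a cutoff frequency to a one-pole smoothing coefficient (assumed mapping)."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)

class CascadedLowPass:
    """N identical first order low pass sections in series, no feedback between them."""
    def __init__(self, order, cutoff_hz, sample_rate_hz):
        self.coeff = one_pole_coeff(cutoff_hz, sample_rate_hz)
        self.states = [0.0] * order

    def step(self, x):
        # Each section filters the output of the one before it.
        for i in range(len(self.states)):
            self.states[i] += self.coeff * (x - self.states[i])
            x = self.states[i]
        return x
```

Changing `order` between 1, 2, and 4 while feeding in the same correction signal is the kind of comparison described above.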

The behavior I'm not so happy with can be inferred from the diagram with a bit of thought.  If you use wide vibrato then the filter integrates the correction signal over more than one note, which tends to un-bias it from the note center.

You want the thing to track larger, faster movements, and then start correcting from there, but how do you detect this, and how do you get it to recenter when it isn't correcting so that the end note isn't off?  I think you want it to correct to the center of more than one note width, but without obvious hysteresis.  Ideally there is some kind of non-modal, linear process that can do this, but as usual the problem seems to be secrecy by those who do know.  If I find something better I'll share it - whether you want to know it or not! ;-)

Posted: 2/8/2018 4:16:51 PM

From: Northern NJ, USA

Joined: 2/17/2012

Pitch Correction II

It feels like I've made significant progress since the last post.  Here's the current pitch corrector as of this morning:

Very similar to the previous post, but plays much nicer with vibrato.  By that, I mean vibrato isn't significantly attenuated nor depth modulated depending on whether one is on the note or not, and off-note vibrato doesn't cause off-note averaging to take place, which was throwing the pitch off center in the previous design.

The first filter is a critically damped 4th order low pass "vibrato" filter, set to around 3Hz or so.  This tracks the pitch fast enough to tell us what note should be corrected for, but slow enough to attenuate vibrato.  It basically makes the note selection cleaner and less noisy.  The critical change here is the second low pass filter post note fraction extraction: it is severely slew limited (currently to 1/256 of full scale = 1 note) so that it functions more as a straight line fixed rate skew than as a traditional filter.  The result of this is quantized and used to correct the pitch.

The slew limiting can be seen as one-size baby steps in the correction process.  The filter skews one limit sized step at a time to the center of the note.  Over time (~1 second if far from the note center) the filter hits the center.  If vibrato makes it change back and forth between two notes, then the one that dominates in terms of duty cycle will get stepped to and centered in terms of correction, though it will take longer.  It's like someone telling you to walk next door, then telling you partway there to come back, and so on.  If they let you walk even slightly longer in one direction you'll eventually get to the destination on that end.

I've currently got a string of 4x simple first order filters with droopy Q in place of the first filter, and hope to turn it into a ladder filter (à la Moog) via some feedback to tighten up the droop.  I may remove the quantization strength adjustment, and make the bandwidth of the first filter dependent on the bandwidth of the second.  The slew limiting in the second filter takes place before the corner frequency attenuation, so the limited steps are actually quite a bit smaller than 1/256 of a note.  And this filter was moved to a position before the quantization, where there is a big sharp step between notes (though I'm not sure that matters, and with no quantization strength adjustment the quantization signal will also have a sharp step).
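A rough float sketch of the slew limited second filter, assuming it is a one-pole low pass whose error term is clamped to +/-1/256 before the coefficient is applied (so each actual step is smaller than the limit). The class name and the coefficient value are made up for illustration.

```python
class SlewLimitedLowPass:
    """One-pole low pass whose input error is clamped before scaling."""
    def __init__(self, coeff, slew_limit=1.0 / 256.0):
        self.coeff = coeff            # corner frequency attenuation, in (0, 1]
        self.slew_limit = slew_limit  # full scale = 1 note
        self.state = 0.0

    def step(self, x):
        err = x - self.state
        # Clamp first, then attenuate: far from the target the filter
        # moves in fixed size baby steps of coeff * slew_limit.
        err = max(-self.slew_limit, min(self.slew_limit, err))
        self.state += self.coeff * err
        return self.state

# Target flips between two values with a 60/40 duty cycle;
# the state slowly walks towards the one that dominates.
flt = SlewLimitedLowPass(coeff=0.1)
for n in range(20000):
    target = 0.5 if n % 10 < 6 else -0.5
    flt.step(target)
```

After the loop `flt.state` ends up close to +0.5, which is the "walk next door" behavior: the side that gets slightly more time wins, it just takes longer to get there.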

The function of it is fairly subtle, which makes it difficult to evaluate.  Kind of like adjusting minor processing knobs on a reverb, you have to develop a major ear for what's going on in order to hear any difference at all.

[EDIT] So I implemented the 4th order critically damped pre-filter, and it doesn't seem to do much at all!  More ear training is clearly in order...

[EDIT2] New filter issue was a typo in the code that I caught with a reality check by routing the whole pitch operating point through the filter.  With that setup, on low bandwidth, it felt like the vibrato-killing Theremini!  So the filter is working now and seems to sort of help?  

I made a miscellaneous screen in the UI for trying various things out, should have done that long ago.

[EDIT3] Noticed some minor glitching between notes, so I moved the slew limiting filter to after the quantization, as in the previous post.  And ladder filters are really weird, not all that useful for this kind of analytic filtering.

Playing with it, I'm getting pretty good results with the first filter set to ~11Hz and the second set to ~0.1Hz (if you fudge factor in the slew limiting attenuation as a frequency term).

Posted: 2/10/2018 7:02:22 PM

From: Northern NJ, USA

Joined: 2/17/2012

Volume Processing IV (4th time's the charm!)

Finally got some acceptable attack and decay assembly code working on the volume side:

Basically a first order low-pass filter with separate paths for attack and decay, and a velocity processor bolted onto the front.  The velocity processor subtracts the previous volume number from the current one.  Positive differences are multiplied by the attack value and added to the volume number, limited to avoid modulo roll-over, with the result input to the bi-modal LPF.  The filter accumulator is subtracted from the input; positive results are attenuated a bit to smooth things out; negative results are slew limited and attenuated by the decay value.  The selected result is accumulated (integrated).

The smoothing is here for positive velocity inputs.  Velocity differences from sample to sample at 48kHz tend to be quite small, so the attack multiplier is quite large, leading to audible raspy sounding stepping when the attack knob is set to higher values.  The slew limiting is here to make the decay linear over most of the range, which we need because this processor is on the linear side of things (with the result exponentiated and used for oscillator volume and the like).

The decay value set by the knob [0:63] is subtracted from 64 to give longer decays for higher values.  Both attack and decay values are squared to expand the lower range and to make the upper range roughly exponential.  Zeroing the attack knob removes all velocity sensing, zeroing the decay knob makes the decay time constant so short that it isn't noticeable.
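Here is a float sketch of the attack and decay processing described above, for anyone who wants to play with the idea. It is not the assembly code: volume is assumed to be a float in 0..1, the knob scaling is a guess at a sensible normalization, and the `smoothing` and `slew_limit` defaults are arbitrary placeholder values.

```python
def attack_gain(knob):
    """Attack knob [0:63], squared; zero removes all velocity sensing."""
    return float(knob * knob)

def decay_coeff(knob):
    """Decay knob [0:63] subtracted from 64 and squared, normalized to (0, 1]."""
    return ((64 - knob) / 64.0) ** 2

class VolumeEnvelope:
    def __init__(self, attack_knob, decay_knob, smoothing=0.25, slew_limit=0.01):
        self.attack = attack_gain(attack_knob)
        self.decay = decay_coeff(decay_knob)
        self.smoothing = smoothing    # attenuation on the rising path
        self.slew_limit = slew_limit  # max fall per sample before decay scaling
        self.prev_in = 0.0
        self.acc = 0.0                # the single filter accumulator

    def step(self, vol):
        # Velocity processor: current input minus previous input.
        velocity = vol - self.prev_in
        self.prev_in = vol
        boosted = vol
        if velocity > 0.0:
            # Limit to full scale to avoid roll-over.
            boosted = min(1.0, vol + self.attack * velocity)
        # Bi-modal low pass: separate paths for rising and falling.
        diff = boosted - self.acc
        if diff > 0.0:
            diff *= self.smoothing
        else:
            diff = max(diff, -self.slew_limit) * self.decay
        self.acc += diff
        return self.acc
```

With the attack knob at 0 and the decay knob at 0 the output just follows the input (given `smoothing=1.0` and a large `slew_limit`), which matches the description of zeroing both knobs.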

Referencing the velocity to the current input value and having a single accumulator eliminates any possibility of "pumping" (where repeated attack movements of the hand cause the audible threshold position in the volume field to move around - something you really don't want).   Attack can be accentuated and given a definite physical "switching point" by lowering the volume null into the playing zone.  Lowering the smoothing to zero with a high attack level gives a "gritty" sound that could perhaps be used musically or as an effect.  Setting the decay to maximum gates the volume side open permanently to the maximum volume played.


A Cry For Help

For audio recording and analysis I use Adobe Audition 3.0.  I can't recommend this software enough, it's just incredible, and it's free!

For video recording I'm using a Logitech C920.  It works great for video (gives you manual control over everything if you want it), but all audio - even if you pick an external source rather than the internal microphones - goes through a resampling puke funnel that aliases all over the place.  I've tried every setting in the Logitech webcam software, scoured the Logitech technical fora, pawed the web, etc. and it seems there is no solution.  Which is exceedingly annoying because it forces me to record the audio separately in Audition, use my lame Cyberlink PowerDirector video editor to align the audio with the video, trim it, and export the audio only, then use Machete 4.5 to insert the audio without re-rendering the video.  What could be 5 minutes or so of capturing a quick video turns into a painful half an hour or more.  The Machete free trial ran out so I purchased it today for $20USD.  The UI is kind of sleazy, but it does what it does very quickly, and the price could be worse.  Anyone out there got a better "quick but decent musical video capture" solution that works on WinXP?

Posted: 2/10/2018 8:47:27 PM

From: Germany

Joined: 8/30/2014

I don't know about "musical".
There are still some copies of Sony Movie Studio 11 floating around on eBay, sometimes for < 10 USD.
It's for editing audio + video, but I have seen tutorials where people capture audio for overdubs, and capture video, but haven't seen video+audio recording, so I can't say for sure. A lot of YouTube folks seemed happy with that product line overall (which is now called Vegas, and Sony sold it to some other company), switching from the ultra crude Windows Movie Maker back then. V11 supposedly works on WinXP.


Posted: 2/11/2018 2:57:56 AM

From: Northern NJ, USA

Joined: 2/17/2012

Thanks!  But I was looking more for video hardware that comes with a native capture package that doesn't seriously mess with the audio.  I believe that web cams are all going to some new standard, like scanners and other stuff went with TWAIN and such.

Just did a couple more quick tests with the C920, there is literally no setting that will force a 44.1 or 48 kHz audio sample rate. Setting it to "Best - (DVD Quality) 48kHz" gives you 32 kHz.

Here's someone crabbing about it and Logitech reacts incredibly lamely, like they're surprised but have no answer: link.

Here they give it a 9.55/10 but say the audio "tends to sound a little muffled" (ya think?): link

Posted: 2/11/2018 7:38:22 PM

From: Northern NJ, USA

Joined: 2/17/2012

I made the velocity detector an 8 sample delay differentiator to give it more gain, and the velocity is now available as a bus to the rest of the code.  Here's a brief view of the volume side attack and decay in action (you need a really fast attack to do drum sounds):

I'm getting a little better!  (Though I tense up something fierce when the camera's on me...)
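The 8 sample delay differentiator mentioned above amounts to subtracting the input from 8 samples ago instead of the previous sample, which gives 8x the output for a steady ramp. A minimal sketch (the class name is made up, and floats stand in for the real fixed point numbers):

```python
from collections import deque

class DelayDifferentiator:
    """Velocity estimate: x[n] - x[n - delay]."""
    def __init__(self, delay=8):
        # History starts out zeroed, so the first few outputs are just x[n].
        self.history = deque([0.0] * delay, maxlen=delay)

    def step(self, x):
        oldest = self.history[0]   # the sample from `delay` steps ago
        self.history.append(x)     # pushes the oldest sample out
        return x - oldest
```

Feeding it a ramp that rises 0.001 per sample settles to an output of 0.008, versus 0.001 for a plain sample-to-sample difference.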


Posted: 2/11/2018 8:32:23 PM

From: Theremin Motherland

Joined: 11/13/2005

dewster, interesting,

can you demonstrate, like the first video,  a staccato with different amplitudes (say, in the  "crescendo/diminuendo" style)?


Posted: 2/12/2018 2:23:54 AM

From: Northern NJ, USA

Joined: 2/17/2012

"can you demonstrate, like the first video,  a staccato with different amplitudes"  - ILYA

Good question, and unfortunately no.  I'm kind of cheating as I'm using the null point to give a sharp threshold for the velocity sensing to give a sharp attack.  These things pretty much just fly into the wall (max volume) and hit it, not sure how you might constrain it, but I'll think about it.  Maybe something squared, or the difference of the difference (acceleration), but it's already so jaggy and squirrely I'm not all that hopeful.  And I'm not sure any of this is all that useful in the end, I'm just following up on the things people sometimes implement in analog Theremins to see if I can do it digitally, and if there is any merit.

The hand can only move so fast; ultimately you're up against whatever you can do with that without losing complete control over the necessary rudiments (i.e. volume control).  But if you (or anyone else) have any suggestions I'm certainly all ears.

[EDIT] Thinking about it a bit more, I suppose you could go entirely non-linear and trigger a fixed rate attack if the hand exceeds a certain velocity, and key its peak volume to some >1 ratio of the current position.  With the current arrangement I'm not seeing a lot of value in attacks that aren't all the way into percussive territory (I find myself setting the attack knob pretty much full off or on, not in-between).

[EDIT2] For the current setup the envelope decay rate is a constant set by a knob, and as I say I've wangled things so the null point gives an exaggerated velocity which also yields a pretty much constant attack rate, and the trigger is at a constant point in the field.  Instead of using the velocity to directly form the attack, one could detect when it exceeded a certain rate and set a timer.  When the velocity hits zero the value of the timer could be used to set the strength of the attack of a more conventional envelope generator, the result of which could be added to the current positional volume.  So: exceed the (uni-directional) velocity anywhere in the field to trigger an envelope, and the attack amplitude will be set by the total distance of the hand movement.  It would probably look like the player is hitting an imaginary drum head in space.
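The timer idea in EDIT2 could look something like the sketch below. It is untested against real hand data; the class name, the threshold units, and the choice to report the timer count in samples are all assumptions made for the example.

```python
class StrikeDetector:
    """Times a fast uni-directional movement and reports how long it lasted."""
    def __init__(self, velocity_threshold):
        self.velocity_threshold = velocity_threshold
        self.timer = 0  # samples since the strike began; 0 means idle

    def step(self, velocity):
        """Return the strike duration in samples when a strike ends, else 0."""
        if self.timer == 0:
            # Idle: wait for the velocity to exceed the trigger rate.
            if velocity > self.velocity_threshold:
                self.timer = 1
            return 0
        if velocity > 0.0:
            # Still moving in the strike direction: keep counting.
            self.timer += 1
            return 0
        # Velocity has hit zero: hand the count to the envelope generator.
        duration = self.timer
        self.timer = 0
        return duration
```

The returned duration would then be scaled and used as the attack strength of a conventional envelope generator, with the envelope output added to the positional volume.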

Posted: 2/12/2018 7:14:36 PM

From: Northern NJ, USA

Joined: 2/17/2012

Good Filtering Info

1. J. O. Smith on the Chamberlin filter: link (Smith also does physical modeling work, it's all on his web pages.)

2. Jon Dattorro on filters and effects: link link link link (Dattorro is a refreshing blend of mathy, practical, and passionate; you can safely ignore the derivations and jump to his conclusions.  He gives the most in-depth and useful analysis of the Chamberlin I've seen anywhere.)

3. Zölzer DAFX book: link  (Zölzer designed a variation on the Chamberlin filter.)

4. Duane K. Wise on a Chamberlin variation: link (Wise has some other papers on audio filtering behind the stupid paywalls.)

The above because I'm taking another look at the topology of the filters I've implemented.  Some papers show the BP output as delayed, others not, and I'm wondering which I should use if I'm going to implement a fader control that varies from all input to all output (have to watch the phase when mixing them or else bad things will happen).

[EDIT] I don't think one clock phase difference makes much difference when mixing, so it likely doesn't matter where BP gets tapped off of.  In Excel sim I'm seeing some strangeness associated with mixing HP and LP due to the grosser phase differences of their outputs.  I believe the right way to do this is via all-pass fed to the mixer, where the phase difference does the actual filtering and works for you.
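For anyone following along, here is the textbook Chamberlin state variable filter in float form. This is the generic structure from the literature, not necessarily the topology implemented here, and the update order shown is just one common arrangement; tapping the band-pass state before or after its update shifts that output by one sample.

```python
import math

class ChamberlinSVF:
    """Classic Chamberlin state variable filter (generic textbook form)."""
    def __init__(self, cutoff_hz, sample_rate_hz, q=0.707):
        self.f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate_hz)
        self.damp = 1.0 / q
        self.low = 0.0
        self.band = 0.0

    def step(self, x):
        # The low-pass integrator uses the band-pass state from the previous sample.
        self.low += self.f * self.band
        high = x - self.low - self.damp * self.band
        # The band-pass integrator is updated last, so the returned BP is "undelayed".
        self.band += self.f * high
        return self.low, self.band, high
```

A constant input settles with LP equal to the input and BP and HP at zero, which is a quick sanity check before experimenting with mixing the outputs.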

Posted: 2/13/2018 10:45:47 PM

From: Northern NJ, USA

Joined: 2/17/2012

Volume Processing Times Infinity (nth time's the charm)

Can't believe how much time I've spent on this one dumb thing and with almost nothing to show for it.  I just realized that my "cheating" in the video above actually makes the most sense in a way, and is the most intuitive, playable, and comfortable.  You don't want to have to flick your hand quickly (velocity threshold) to have the envelope generator kick in, you instead want to merely move your hand past a fixed positional threshold in space to trigger it, and use the velocity at the crossing to set the attack amplitude.  The position will necessarily be outside the normal playing field (or inside it if you have it set for farther = louder), and you don't want velocity kicking in when you're playing normally, so I believe this is the best approach all around.  Yarg...
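A sketch of the positional threshold approach, assuming the volume position is a float that rises through the threshold when the hand crosses it, and that the velocity is just the sample-to-sample difference at the crossing. The names and the `velocity_scale` parameter are hypothetical.

```python
class ThresholdTrigger:
    """Fires once when the position rises through a fixed threshold."""
    def __init__(self, threshold, velocity_scale):
        self.threshold = threshold
        self.velocity_scale = velocity_scale
        self.prev = None

    def step(self, position):
        """Return the attack amplitude on the crossing sample, else 0.0."""
        amp = 0.0
        if self.prev is not None and self.prev < self.threshold <= position:
            # Velocity at the crossing sets the attack amplitude, capped at full scale.
            velocity = position - self.prev
            amp = min(1.0, velocity * self.velocity_scale)
        self.prev = position
        return amp
```

A fast crossing gives a larger amplitude than a slow one, and movement that stays on either side of the threshold never triggers anything, so normal playing is left alone.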
