Let's Design and Build a (mostly) Digital Theremin!

Posted: 2/18/2021 11:20:05 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

The New Adventures of Curve Pitch Correction

Roger and I have been discussing refinements to the D-Lev pitch correction, and these conversations have really helped me to better understand what it should and shouldn't be doing.  Experiments with some new ideas over the past couple of days have produced a design that I'm much happier with in terms of performance and basic functionality.  Some recent history:


At top is a high level view of the previous pitch correction.  The pitch number comes in and is separated into individual notes, which are treated equally by the note span knob (see next).  The span amplitude is controlled by the corr knob, which itself is modulated downward by the pitch axis velocity (pvel).  The result is slowed down by controlling the linear slew rate, and this is mixed with the original input to obtain the corrected pitch number as output. 

Below that is a graphical representation of the lately linearized note span.  Setting the knob to 1/4 results in the left-most graph: here the center 1/4 of the note is fully corrected, and this has the side effect (literally!) of increasing the pitch slope in the 3/4 wide transition regions between adjacent notes.  Similarly, setting the knob to 1/2 gives 1/2 note width of correction, 3/4 gives 3/4 correction, and the knob set to full gives us full correction, or steppy quantization, where the notes "pop" from one to the next.  Keep in mind that these graphs are showing you what it takes to correct the pitch, not the final corrected pitch.
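For anyone who wants to poke at the note span idea numerically, here is a minimal floating point sketch of how I read the graphs described above. It is not the D-Lev code. It assumes pitch is expressed in notes with integers at the note centers, and that span is normalized to 0..1; the function name and units are made up for illustration.

```python
import math

def span_correction(pitch_notes: float, span: float) -> float:
    """Correction (in notes) to add to the input pitch for a given span.

    Assumptions (not from the original design): pitch is in notes with
    integers at note centers, and span is a 0..1 fraction of note width.
    """
    # Offset from the nearest note center, in the range [-0.5, 0.5).
    x = pitch_notes - math.floor(pitch_notes + 0.5)
    half = span / 2.0
    if abs(x) <= half:
        # Inside the fully corrected center region: land on the note center.
        target = 0.0
    else:
        # Transition region: the corrected pitch ramps from the note center
        # to the note boundary over the remaining (1 - span) of the width,
        # so the slope there is steeper than the uncorrected pitch.
        sign = 1.0 if x > 0 else -1.0
        target = sign * (abs(x) - half) / (1.0 - span)
    return target - x
```

With span=0 the correction is zero everywhere, and with span=1 every input is pulled all the way to the nearest note center, which is the steppy quantization case.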

A main point that gets lost in these technical descriptions: once you've simply slowed the correction down - by slew limiting, or by 1st or higher order low pass filter, etc. - you could actually just stop there as you've accomplished 90% of what we think of as "full" pitch correction, with the remaining features nice to have but not entirely essential - and they can be quite fiddly to implement (look at me, I'm years into this).  You don't even need a variable span function because even a little delay will slant the steep edges of steppy quantization, rendering them inaudible.  You do need the action of the corr knob though, as this "humanizes" the perfect correction by letting some pitch error back in.  Perfect pitch correction is audible with male voice presets, as the rich harmonics interacting with the resonance peaks give the ear a lot of relative pitch cues - remove all pitch variation and it sounds like a car horn.  Indeed, many vocal simulators introduce a small random pitch variation to combat this.

For the longest time I thought higher order filters were the way to go here as they impart polynomial-like curves to the transitions.  But, just as the shortest distance between two points is a line, linear slewing is the quickest way to transition from off-pitch to on-pitch, and the ear isn't all that sensitive to the shape of the correction.  Slowing is actually easier to implement via low pass filter because it functions ratiometrically, so the rate of change is simply set by the cutoff frequency; whereas slew factor is directly proportional to accumulator width, therefore absolute slew rate can be affected by where you place attenuation in the signal path, and in that sense linear slew rate limiting is ironically non-linear.  Who would think that exponential decay would be linear?  I digress...
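A small sketch of the ratiometric point, under simplifying assumptions: floating point, one update per sample, and hypothetical function names. It counts how many updates each method needs to close a fixed fraction of the initial error. Doubling the signal size doubles the slew limiter's settling time but leaves the low pass filter's unchanged, which is why the placement of attenuation in the signal path matters for one and not the other.

```python
def slew_step(current: float, target: float, max_step: float) -> float:
    """Linear slew limiting: move toward target by at most max_step."""
    delta = target - current
    if delta > max_step:
        delta = max_step
    elif delta < -max_step:
        delta = -max_step
    return current + delta

def lowpass_step(current: float, target: float, coeff: float) -> float:
    """First order low pass: move a fixed fraction of the remaining error."""
    return current + coeff * (target - current)

def steps_to_settle(step_fn, target: float, param: float,
                    rel_tol: float = 1.0 / 16.0, limit: int = 100000) -> int:
    """Count updates until the error is within rel_tol of the initial error."""
    value = 0.0
    tol = rel_tol * abs(target)
    for n in range(1, limit + 1):
        value = step_fn(value, target, param)
        if abs(target - value) <= tol:
            return n
    return limit

# Same settings, signal scaled by 2: only the slew limiter's time changes.
slew_small = steps_to_settle(slew_step, 1.0, 1.0 / 64.0)
slew_large = steps_to_settle(slew_step, 2.0, 1.0 / 64.0)
lpf_small = steps_to_settle(lowpass_step, 1.0, 0.5)
lpf_large = steps_to_settle(lowpass_step, 2.0, 0.5)
```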

Here's the design as of today:


At bottom you can see that the main topological change is that the pitch velocity is now downwardly modulating the slew rate instead of corr, and this rate can be overridden by vmod.  I should have drawn the slew|vmod mux in the top diagram as well because it isn't new, though the operation is slightly different.  The purpose of the mux is to speed up the slew at very low volumes (generally sub-audible; the location in the volume field can be set via the vloc knob) to center up the next note so it comes in on-pitch.  The mux comes after any pvel modulation, so it overrides that.  I actually tried pretty hard to come up with an arrangement that didn't override pvel, but came up empty.  The pvel modulation of slew decreases the slew rate (increases the slew time) when the right hand moves, which helps to mask any steep edges on the correction signal via increased filtering.

The main problem with conventional velocity detection as applied to pitch correction is that it either has too little gain and doesn't kick in fast / often enough to lower the filter frequency - you hear stepping when span is high, slew is low, and you move your pitch hand too slowly to a neighboring note - or it has too much gain and effectively turns off pitch correction for any pitch hand movement at all.  Pvel still has a use though, because it can enable lower slew settings.  Seen above is a method I'm using to increase the general usefulness of pvel.  Graphically the pitch number is modulo multiplied by the number of notes/octave * octaves (12 * 32) in the linear pitch range, giving us 384 full scale ramps.  Looking at a single ramp, we XOR it with its full width sign bit, which gives us a half height triangle for each note.  This large scale ramping gains up the velocity detector's response to movement, and the cyclic nature and low harmonic content give it something significant to "chew on".  Just moving the pitch hand from note to note will create significant change, and sweeping it through the field will generate a low frequency, high amplitude signal.  Anyway, this is fed to the velocity detector shown below the graphs.  A bandpass filter removes DC & noise, and gains down all but low frequency changes.  Then the absolute value is taken, and this is multiplied by pvel and limited to give a large saturation zone.  This is fed to a first order bi-modal low pass filter, which has a cutoff frequency of 10Hz for inputs larger than the output, and 0.25Hz otherwise, creating a kind of doubly leaky peak hold which reacts quickly.  Finally a bit inversion changes the direction, and this is used to modulate slew.
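The ramp and triangle step above can be sketched in a few lines. This is an illustration only: it assumes a 32 bit unsigned pitch number and uses Python integers masked to 32 bits to mimic the wraparound, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF          # assumed 32 bit unsigned pitch number
SIGN_BIT = 0x80000000
NOTES_FULL_SCALE = 12 * 32   # notes/octave * octaves = 384 ramps

def note_ramp(pitch_u32: int) -> int:
    """Modulo multiply so that each note spans one full scale ramp."""
    return (pitch_u32 * NOTES_FULL_SCALE) & MASK32

def fold_triangle(ramp_u32: int) -> int:
    """XOR the ramp with its sign bit extended to full width.

    When the MSb is set, every bit is inverted, which folds the upper
    half of the ramp back down and leaves a half height triangle.
    """
    if ramp_u32 & SIGN_BIT:
        return ramp_u32 ^ MASK32
    return ramp_u32

def note_triangle(pitch_u32: int) -> int:
    """One half height triangle per note, ready for the velocity detector."""
    return fold_triangle(note_ramp(pitch_u32))
```

The triangle peaks at 0x7FFFFFFF in the middle of each ramp and returns to 0 at the ends, so moving by a single note sweeps it through a full up and down cycle.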

I like to include a "berserker" setting to any controls whenever possible.  Maximum span gives a very edgy and snappy quantization, minimum none; maximum corr gives 100% correction, minimum none; maximum pvel effectively turns off correction for all but the tiniest of pitch hand movements, minimum off; maximum slew gives over 5 seconds of slew time, minimum 21 ms; and vloc positions vmod over a -96dB to 0dB range.  Crazy settings tend to reveal the thing functioning and create audible artifacting.  So any real use of this module for unobtrusive pitch correction will necessarily employ the more moderate ranges of at least some of these controls.  I tend to use a lot of correction, and my settings are:

slew=16 (1/2)
pvel=16 (1/2)
span=31 (max)
corr=31 (max)
vloc=18 (vmod transition @ -42dB)

Max corr is getting a little fatiguing sounding but I can't bring myself to lower it.  Sometimes I up the slew a little for really slow songs, though now I may be able to rely on the improved pvel.  One could use even lower slew with higher pvel, but for really low slew some backing off of span is required to reduce low velocity discontinuities, and I think slew should be used as the go-to knob for the overall correcting effect, with the other knobs used as seasoning.

Posted: 2/21/2021 9:10:21 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Forget Paris Pitch Velocity

Ha!  Finally what seems like a real, solid breakthrough with pitch correction!  Using pitch velocity to slow the error slew rate is a dead end.  As I stated earlier, the main problem with it is at the note transitions: play too slowly and you get a full slew rate note step, so you either need the velocity gain turned way up - at which point it's pretty much strangling all correction at the slightest move, and then what's the point?  Or you need slower slew as a base, and then what's the point?  Well, the main point in the second scenario is that it allows you to use somewhat faster base slew, but pitch velocity is just a sort of vague helper there and I'm not sure it's worth all the effort.  It also kind of weirdly modulates, and I've improved that quite a bit I think, but it's still too erratic to rely on alone.

This morning I hit on the idea of using the position in the note as a direct slew rate modulator.  At first I tried slowing the slew rate at the note boundaries, but that was kind of difficult to set with the knobs.  Then I instead tried speeding up the slew rate towards the center of the note, which allows the "slew" knob to set the minimum or slowest slew rate, and the newly named "cntr" knob to control the "magnetic" feeling at the center of the note.  I followed this with a bimodal lowpass filter with 2Hz/10Hz rise/fall, which keeps note sweeps from falling into the "fast funnels" at the note centers, and also softens the switch to full quantization.  I don't like having more than the one time constant (the slew rate) but if they're fast enough they're not too noticeable.
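A bimodal low pass filter of the kind mentioned above is just a first order filter that picks its coefficient based on whether the input is above or below the current output. Here is a hedged floating point sketch; the class name, the coefficient formula, and the sample rate are my assumptions for illustration and not the actual implementation.

```python
import math

def one_pole_coeff(cutoff_hz: float, sample_rate_hz: float) -> float:
    """Per-sample coefficient for a first order low pass (common approximation)."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)

class BimodalLowpass:
    """First order low pass with separate rise and fall cutoff frequencies."""

    def __init__(self, rise_hz: float, fall_hz: float, sample_rate_hz: float):
        self.rise = one_pole_coeff(rise_hz, sample_rate_hz)
        self.fall = one_pole_coeff(fall_hz, sample_rate_hz)
        self.y = 0.0

    def step(self, x: float) -> float:
        # Use the rise coefficient when the input is above the output,
        # otherwise the fall coefficient.
        coeff = self.rise if x > self.y else self.fall
        self.y += coeff * (x - self.y)
        return self.y

# 2Hz rise / 10Hz fall as in the post; 48kHz is an assumed sample rate.
rate_filter = BimodalLowpass(rise_hz=2.0, fall_hz=10.0, sample_rate_hz=48000.0)
```

With these settings the output drops quickly when the input falls and climbs slowly when it rises, so a fast sweep through several notes never gets time to climb into the fast regions at the note centers.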

I need to play with it and perhaps tinker with the arrangement, knob scaling, and funnel shape, but even now in its rough state it feels like a genuinely positive addition to the pitch correction module.  Almost no "magic numbers" in the code, which usually indicates good design.

Posted: 2/23/2021 3:33:43 AM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Slew Rate Modulation

Been puzzling over how to scale the pitch correction slew and cntr knobs, and finally hit on the following in the linear domain:


As usual the pitch is modulo multiplied to get single notes full scale.  This is XORed with the MSb of the result shifted full scale, multiplied by 2, shifted down 1/2 to make it signed by flipping the MSb, shifted right once to divide by 2, and multiplied by the cntr knob value.  The slew knob is bit inverted (not shown) to give us the rate, which reverses the sense (I briefly toyed with not inverting it, but it feels better to have quantization when slew=0).  It is shifted down 1/2 to make it signed, divided by 2, then shifted back up.  This is then added to the result of the previous cntr operations just described.  Not shown: this linear result is bi-modally low pass filtered, then brought into the exponential domain by flipping the MSb to add 1/2 full scale, adding 1/2, then passing it to the unsigned integer EXP2 library function.

This process gives us a slew knob that covers 48dB, or 8 bits of range - you need at least 48dB or so of slew adjustment to go from hard quantization to snail slow transition.  The cntr knob tilts the lead-in and lead-out, using slew as the pivot point, with the result that both knobs are able to cover a total of 96dB, or 16 bits.  The whole reason we're doing this is to produce an average slew rate regardless of the setting of the cntr knob, which pretty much eliminates any sense of interaction between them.  That said, the low pass filter favors falling over rising by a 10:1 ratio, so there is a dynamic which tends to slow the average slew down.  I'm still experimenting with the settings of the low pass filter, but 1Hz & 10Hz seem to be working well.
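The knob scaling can be restated in floating point to show why the two knobs don't interact on average. This is a simplified sketch and not the integer bit twiddling described above: it assumes knobs normalized to 0..1, pitch in notes with integers at the note centers, 8 bits of exponent range per knob, and it leaves out the bi-modal low pass filter. All names are hypothetical.

```python
import math

def note_triangle(pitch_notes: float) -> float:
    """Signed triangle: +0.5 at the note center, -0.5 at the note boundary."""
    distance = abs(pitch_notes - math.floor(pitch_notes + 0.5))  # 0..0.5
    return 0.5 - 2.0 * distance

def slew_rate_log2(pitch_notes: float, slew: float, cntr: float,
                   slew_bits: float = 8.0, cntr_bits: float = 8.0) -> float:
    """Slew rate exponent in the linear (log2) domain.

    slew is inverted so that slew=0 gives the fastest rate (quantization).
    The triangle has zero mean across a note, so cntr tilts the rate
    around the slew setting without shifting the average exponent.
    """
    base = (1.0 - slew) * slew_bits
    tilt = cntr * cntr_bits * note_triangle(pitch_notes)
    return base + tilt

def slew_rate(pitch_notes: float, slew: float, cntr: float) -> float:
    """Relative slew rate after the return to the exponential domain."""
    return 2.0 ** slew_rate_log2(pitch_notes, slew, cntr)
```

With cntr=0 the rate is the same everywhere in the note. With cntr at maximum the exponent swings 4 bits above the slew setting at the note center and 4 bits below it at the boundary, which together with the slew knob's 8 bits gives the 16 bit total.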

Here's a demonstration recording I just made using slew=20, cntr=16:  [MP3]

The notes are played staccato for the first half, and the male voice being perfectly corrected sounds rather like a car horn / reed organ.  Without changing any settings the second half is played much more dynamically with a little slower tempo, and I think you'll agree that the pitch corrector is able to handle these two very disparate playing styles quite well with no adjustment.  Increasing cntr actually helps to smooth the note transitions, while accentuating the note centers.  I also set vloc=18, which snaps the pitch to the note centers when the volume is sub-audible - this assists the starting of notes on-pitch.

[EDIT] I must say, I feel kinda bad for analog Theremin designers because they can't just drop into linear space, where thinking about and implementing this sort of scaling is quite simple and direct, and then return to exponential space via a non-polynomial (only 6% error max for larger inputs) EXP2, which only consumes 7 cycles. 

At the time when I was writing my integer and floating math libraries, I wondered if I'd ever recoup the copious time and effort invested.  It was an incredibly entertaining, engrossing, and enlightening activity, and just for that alone I'm quite happy I undertook the months-long project.  But it really prepared me for the unforeseen mountains of knob scaling that had to be climbed later.  It taught me a lot about bit twiddling and other assembly level efficiencies too - conventional maths alone won't net you the smallest / fastest code.

Posted: 2/23/2021 6:36:58 PM
pitts8rh

From: Minnesota USA

Joined: 11/27/2015

I played with the correction this morning and I was getting some very good results.  It sometimes feels like instead of pulling you in to the nearest note (sometimes against your will and to the wrong note) it is just broadening the correct pitches in space, making them easier to land on.  I don't even have anything optimized and there are very few artifacts, and when assisted by unquantized pitch preview as a reference I don't think I hear any (won't know until I record and play back).

I always get disturbed when features don't have enough knobs, but so far this seems to be distilled down to a pretty effective and user-friendly format. Very nice!

Posted: 2/24/2021 1:59:24 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"It sometimes feels like instead of pulling you in to the nearest note (sometimes against your will and to the wrong note) it is just broadening the correct pitches in space, making them easier to land on."  - pitts8rh

1. Slowing down error correction at the note boundaries tilts the pitch steps flatter, but even a little slowing will do that sufficiently.  Much more importantly it increases the range of variability / expressiveness of half step portamento timing because the boundary correction isn't as dominated / driven by what once was a constant slew rate, pulling you this way and that like a lassie.  Likewise, accidentally positioning to just before / past the edge of the note you intended to land on doesn't instantly start dragging you quickly away in the wrong direction; you are instead given some time to react to and therefore correct the situation more gracefully.

2. Speeding up error correction at the note centers emphasizes them, but this creates potholes that wider range portamento (legato?) playing will fall into due to the increased relative dwell time there.  The bi-modal lowpass filter paves over these potholes as the portamento rate increases.

In this second respect it's rather like the velocity sensing approach used previously, but velocity sensing often aggravated note boundaries.  The introduction of two additional time constants isn't my preference but it seems warranted, and the velocity approach required them too, as well as other fiddly complexity (gain curve, saturation, windup / headroom, etc.) which made it ugly and a subjective tradeoff bear to work on.  I very much like the fact that this new approach is unified, and not two separate mechanisms which may at times work against each other.

I think it's even easier now to turn correction way up and not hear it too much when playing dynamically - which is probably a danger, at least from an ear fatigue standpoint.  Backing off on the "levl" knob for presets with prominent formants (strings, humans, etc.) seems to help a lot.  I don't think I'll ever adjust "span" down from its max (31) but never say never I guess, and other players may find it somehow useful for their playing styles.
