Let's Design and Build a (mostly) Digital Theremin!

Posted: 12/8/2018 8:32:48 PM
pitts8rh

From: Minnesota USA

Joined: 11/27/2015

As usual I lob the ball over the net hoping to give myself some time for other things, and it comes right back into my court.  I'm good for at least a few days here.  Thanks for all of this.

Posted: 12/9/2018 10:31:14 PM
pitts8rh

From: Minnesota USA

Joined: 11/27/2015

Eric,

I promised to run some tests on the pitch correction feature of TC Helicon's Voicelive 1, and I'm including some representative pictures below.  I didn't go very far into the testing because as I saw how it was working (or not), I started to realize that its performance was quite flawed and not really all that helpful to study.

Not including scale settings (which are really just tables of notes included or excluded from correction), there are three major adjustable parameters:  Window, Attack, and Amount. Sketches and descriptions of how these settings seemed to function (and were made without the use of any test equipment back in 2016) are located here and here.

Even with only three major parameters to play with, trying to document the behavior under a variety of combinations with different types of frequency sweeps quickly got out of hand, so I'll limit the results to a few pictures and some additional verbal descriptions.

Here again is the segment from the Voicelive manual that describes the three major settings:


EDIT 1 knob: WINDOW (cents).
When VoiceLive tries to determine which target note you are closest
to, it uses this parameter. For example, if the set of correction notes includes
“C, D, E, F, G, A, B” (C-major), and you are singing a very sharp D (80 cents
sharp), the window dictates whether you should be corrected to D, or not at all. If
the window was set to 80 or more cents, the D# would be corrected to the D
because it falls within the window. If the window was less than 80 cents, no correction
would take place. Your input pitch must fall within the window around one of
the supplied correction notes if it is to be corrected at all. This allows you to naturally
inflect your vocals and slide between notes while cleaning up the pitches as
you get fairly close to them. A setting of 100 cents or larger will cause correction to
be on continuously when using the scale C-Major, as 200 cents is the largest interval
between any two notes.
EDIT 2 knob: ATTACK.
Once the target correction pitch has been identified by VoiceLive, it begins
to shift the pitch of your vocal at a rate determined by this parameter. A setting of
99 gives the fastest setting which instantaneously pulls your vocal in-tune, an effect
that can be useful for some types of music. Settings between 16 and 40 give the
most natural results.
EDIT 3 knob: AMOUNT.
Scales the amount of automatic correction applied to the input voice. The
range is 0% to 100%. However, 0% does not mean that the correction is turned off.
The amount of applied correction depends on how far out of tune the input note is.
This allows for a very musical way of correcting pitch. It corrects the large pitch
errors while preserving the natural micro variations around the target pitch. For
example:
With the amount set to 100%, a 10 cent flat input will be corrected by 10 cents and
a 50 cent flat input will be corrected by 50 cents.
With the amount set to 80%, a 10 cent flat input will be corrected by approximately
5 cents and a 50 cent flat input will be corrected by approximately 40 cents.
With the amount set to 0%, a 10 cent flat input will not be corrected and a 50 cent
flat input will be corrected by approximately 10 cents.


The Android app Vocal Pitch Monitor was used to trace the pitch coming out of the Voicelive processor as it was being driven with an exponentially swept sine wave input from a signal generator.  Two different images were taken for most of the setting combinations - one over an octave range and another over just a few notes to give a better view of the corrected pitch anomalies.

NOTE:  The corrections on the traces below do not fall exactly on the scale notes.  Either the Android app was not calibrated or the Voicelive had a reference offset.

WINDOW parameter varied:

This image represents the hardest correction available. All input pitches are captured and corrected to the nearest scale note.

Here the WINDOW setting is narrowed so that only input pitches within a certain number of cents of scale notes are corrected.  This forms little plateaus where the pitch is corrected; if you are outside the window capture range, you can linger freely without being corrected.  My attempt to sketch out this response back in 2016 was based on how it seemed to behave when connected to a theremin.  What I failed to notice is that the pitches in the "free" areas between the capture windows must match the input pitch.  I had sketched a linear but steeper-slope region between the plateaus, which makes no sense now.

I only included this image to substantiate the claim that there is a small but noticeable pitch correction with the WINDOW setting on 10.  I'm not sure that the settings are exactly as described in the manual, but even this very subtle pitch correction helps if you can come very close to the notes by ear.

AMOUNT parameter varied:

This setting affects how "correct" the corrected pitch is.  It changes the slope of the window plateaus, and a setting less than 99 lets some percentage of the input error appear at the output.  This can provide a more natural sound, and with a theremin it can allow some audible sense of where you actually are within the window.

All of the previous images have the AMOUNT value set to 99. Once the input is within the capture window, it is corrected to the exact scale pitch.  The sloping plateaus shown below show how a percentage of pitch error appears at the output.
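
A minimal Python sketch of how WINDOW and AMOUNT appear to interact in these traces. It assumes a chromatic scale (100 cents between notes) and treats AMOUNT as a plain linear fraction of the error, which matches the sloped plateaus but not the error-dependent behavior the manual describes. The function and parameter names are made up for illustration.

```python
def corrected_cents(input_cents, window=30.0, amount=1.0, note_spacing=100.0):
    """Return the output pitch in cents for a given input pitch.

    window: half-width of the capture region around each note, in cents.
    amount: fraction of the error removed inside the window (1.0 = flat plateau).
    """
    nearest = round(input_cents / note_spacing) * note_spacing
    error = input_cents - nearest
    if abs(error) > window:
        # Outside every capture window: the input passes through untouched.
        return input_cents
    # Inside the window: remove only part of the error, leaving a sloped plateau.
    return input_cents - amount * error
```

With `amount=1.0` the plateau is flat; with `amount=0.5` half of the input error shows up at the output, which is the sloped-plateau case.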


ATTACK parameter varied:

This is a very important setting that allows free vibrato and slow transitions between notes without note grabbing.  Earlier I did not know if this was a variable delay of the onset of pitch correction or simply a slowed pitch correction;  the traces show it is the latter.  It's a little disappointing to see that this is less sophisticated than I thought it was, but then I never found the Voicelive to be a serious candidate for pitch correction of a theremin.  What a slow ATTACK parameter (Att=1 to 3) does do is allow a skilled player to transition freely to a note which will be stabilized before applying vibrato (which will not be stabilized).  Playing with very little vibrato will be somewhat easier and will blend more sweetly with accompaniment.

Playing with the Voicelive pitch correction on a theremin was an interesting experiment, but it made me understand the difficulty of implementing pitch correction without destroying the emotion of the music. I doubt that there is anything here that you haven't already implemented or thought of.

 

Posted: 12/10/2018 2:42:56 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Roger, thanks so much for performing this investigation!  That looks like a great Android app for this sort of testing!  

Your data really helps me to understand what's going on inside the Voicelive.  It appears that it actually does what the manual says i.e. it turns off the correction outside of the note correction window:

My main question regarding this is: during your playing and / or your data gathering, were these transitions into / out of the note correction windows audible?  The abrupt steepness of the slope at the transitions leads me to believe they would be audible, and if so then that would be an argument for a more linearly interpolated approach (which I'm currently doing) rather than their "turn it off between correction regions" approach.

Regarding the Attack control, you're saying this seems to be implemented as a variable cutoff frequency low pass filter of the error signal?  If so, that would explain the absence of a Velocity control, which was confusing me.  I was doing this in my previous pitch corrector, and one possible downside to that approach is this scenario: say a C is being played +20 cents off center but sounds as (has been corrected to) 0 cents.  The player quickly transitions to the playing position at the center of some other note, which initially sounds as -20 cents due to the filter lag.  Though if the player is really precise the note transition gesture will put the hand at the +20 position of the second note, rather than the center, which will sound as 0 cents, so the sluggish filter is doing the right thing in that scenario?
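
The lag scenario can be tried out with a small Python sketch. It assumes the Attack behaves like a single one-pole (RC-style) low pass on the correction value, which is only a guess at what the Voicelive does. The name `lowpass_correction` and the coefficient are hypothetical.

```python
def lowpass_correction(errors_cents, coeff=0.2, initial=0.0):
    """Low-pass filter the correction and return the sounded offsets in cents.

    errors_cents: played offset from the nearest note center, per sample.
    coeff: one-pole filter coefficient (assumed; larger = faster attack).
    initial: correction value already held by the filter at the start.
    """
    state = initial
    sounded = []
    for err in errors_cents:
        # The ideal correction is -err; the filter only moves part of the way.
        state += coeff * (-err - state)
        sounded.append(err + state)
    return sounded

# The filter has settled at -20 (a note held +20 cents sharp, sounding in tune).
# Case 1: the hand jumps to the exact center of the next note.
center_landing = lowpass_correction([0.0] * 5, coeff=0.2, initial=-20.0)
# Case 2: the hand jumps to +20 cents of the next note.
offset_landing = lowpass_correction([20.0] * 5, coeff=0.2, initial=-20.0)
```

In case 1 the note starts flat and drifts up toward 0 as the stale correction decays; in case 2 the stale correction happens to be exactly right and the note sounds at 0 cents from the first sample.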

I could easily add a LPF to the correction error so we both could experiment around with the various approaches.  I'm not dissatisfied with the current corrector, it does seem more transparent than the previous one, but the more transparent these things are the less they seem actually useful.  My main issue with it is that it has a multitude of knobs which fairly demand the display of internal state in order to set them.  And I feel a bit insecure with my selection of squaring the velocity value in order to do absolute value & gain & dynamic range expansion here - is squaring optimal?  Squaring seems "better" than straight absolute value, and "much better" than LOG2 of the absolute value.  I think you want the velocity to quickly and easily "peg out" when there is significant hand movement, and then slowly drop to zero when there is little or no hand movement, and some separation of these two modes via dynamic range expansion is probably desirable.

Posted: 12/10/2018 4:24:40 PM
pitts8rh

From: Minnesota USA

Joined: 11/27/2015

My main question regarding this is: during your playing and / or your data gathering, were these transitions into / out of the note correction windows audible? 

Yes they are audible, and the risetime is not terribly fast.  Even with the hardest correction settings available shown in the first plot, a 5Hz vibrato on the source gets through because of this risetime.  But a slight application of the Attack softening rounds these transitions off.

The abrupt steepness of the slope at the transitions leads me to believe they would be audible, and if so then that would be an argument for a more linearly interpolated approach (which I'm currently doing) rather than their "turn it off between correction regions" approach.

I am a firm believer that you must have at least the option to dial in "safe" regions between windows where absolutely nothing is happening.  Why this is the case is the subject for another discussion.  But do whatever you think is best for transitioning, just don't take away the option of setting zones where you are immune to any pitch correction.  I think this is very important.  A narrow Window of 10 - 30 and a setting of 1-10 for Attack provided the closest to ideal that I could get, and I was just a beginner.  I would want even less correction now.  I actually like the combination of controls and degrees of freedom that the Voicelive has - it just needs more range and some refinement.  And of course the pitch capture as an analog insert sucks, too.  With the arb I tried various input waveforms to see what it preferred, and all had problems at various times.

Regarding the Attack control, you're saying this seems to be implemented as a variable cutoff frequency low pass filter of the error signal?

That's what I'm thinking now too.

 I was doing this in my previous pitch corrector, and one possible downside to that approach is this scenario: say a C is being played +20 cents off center but sounds as (has been corrected to) 0 cents.  The player quickly transitions to the playing position at the center of some other note, which initially sounds as -20 cents due to the filter lag.  Though if the player is really precise the note transition gesture will put the hand at the +20 position of the second note, rather than the center, which will sound as 0 cents, so the sluggish filter is doing the right thing in that scenario?

That requires more levels of thought than I can muster this morning.  Honestly, I think part of my initial misconception of how the Voicelive handled transitions may have been clouded by how I was wishing it would work.  Without thinking too hard about the implications, I am wondering if it wouldn't be better to keep the WINDOW and AMOUNT controls basically the same, but redesign the ATTACK to have a simple variable delay (not a filter, a timed delay) that disables or dials-down (variable if this option is used) pitch correction for a period after a dP/dt threshold (also variable). The RC-type of risetime in the Voicelive attack creates some weird asymmetric effects with vibrato.  
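
A rough Python sketch of the timed-delay idea, just to make it concrete. It assumes pitch arrives as evenly spaced samples in cents and that correction is fully disabled (not dialed down) during the hold-off. The names and the per-sample threshold are hypothetical.

```python
def correction_enable(pitch_cents, rate_threshold, holdoff_samples):
    """Return one flag per sample: True where pitch correction may act.

    rate_threshold: largest per-sample pitch change (cents) treated as "still".
    holdoff_samples: samples to keep correction off after a fast movement.
    """
    enabled = []
    countdown = 0
    prev = pitch_cents[0]
    for p in pitch_cents:
        if abs(p - prev) > rate_threshold:
            # dP/dt exceeded the threshold: restart the timed delay.
            countdown = holdoff_samples
        enabled.append(countdown == 0)
        if countdown > 0:
            countdown -= 1
        prev = p
    return enabled

flags = correction_enable([0, 0, 50, 50, 50, 50], rate_threshold=10, holdoff_samples=2)
```

Here the jump from 0 to 50 cents switches correction off for two samples, after which it comes back on its own without any filter ringing.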

 I'm not dissatisfied with the current corrector, it does seem more transparent than the previous one, but the more transparent these things are the less they seem actually useful.

Bingo on that last phrase. 

But I still think there is the hope of some very subtle type of pitch correction assistance that would be welcomed by even the most skilled players.  I love the expression available in the theremin, but I rarely hear a performance that couldn't benefit from some "stabilization" (the exception here is Katica). I certainly don't want to offend anyone, but in my opinion, to an audience that is accustomed to the accuracy of conventional musical instruments, a theremin performance really needs to be introduced with either a pre-explanation or in some cases a pre-apology.  It can be uncannily beautiful, like no other instrument, but we are working without any of the physical position feedback cues that even cellos and trombones offer to some degree.

My wish would be for the subtlest pitch correction that would blend the theremin more musically with accompaniment without detracting one bit from the player's nuances.  I think this is possible.  I want to have to transition to notes accurately by ear and apply balanced vibrato entirely on my own, but if I have to hold a note, a little assistance to help compensate for my breathing and heartbeat wobbles would be nice.


Posted: 12/10/2018 6:41:26 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"Without thinking too hard about the implications, I am wondering if wouldn't be better to keep the WINDOW and AMOUNT controls basically the same, but redesign the ATTACK to have a simple variable delay (not a filter, a timed delay) that disables or dials-down (variable if this option is used) pitch correction for a period after a dP/dt threshold (also variable). The RC-type of risetime in the Voicelive attack creates some weird asymmetric effects with vibrato."  - pitts8rh

I hadn't thought of a dP/dt thresholding, I suppose because I've been trying to avoid any processing that turns things off too rapidly, but it's an interesting idea.  I'm using linear slewing to restore correction (with variable slew time) as that seems to work better (be less obvious) than an RC type restore here. When set to be fairly subtle, any vibrato pretty much turns off all correction.

"But I still think there is the hope of some very subtle type of pitch correction assistance that would be welcomed by even the most skilled players.  I love the expression available in the theremin, but I rarely hear a performance that couldn't benefit from some "stabilization" (the exception here is Katica). I certainly don't want to offend anyone, but in my opinion, to an audience that is accustomed to the accuracy of conventional musical instruments, a theremin performance really needs to be introduced with either a pre-explanation or in some cases a pre-apology.  It can be uncannily beautiful, like no other instrument, but we are working without any of the physical position feedback cues that even cellos and trombones offer to some degree."

Agree 100%.

"My wish would be for the subtlest pitch correction that would blend the theremin more musically with accompaniment without detracting one bit from the player's nuances.  I think this is possible."

My goal as well, and I think the current corrector is largely there (though I'm certainly not ruling out future changes).

"I want to have to transition to notes accurately by ear and apply balanced vibrato entirely on my own, but if I have to hold a note, a little assistance to help compensate for my breathing and heartbeat wobbles would be nice."

Reduced pitch field sensitivity is your friend when it comes to stability (though somewhat your enemy when it comes to having to adapt ingrained playing styles).  Dialing down the sensitivity turns the Theremin into an entirely different instrument IMO, rendering it much easier to play with larger gestures and without all those micro fingering techniques.

Posted: 12/13/2018 2:34:07 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Even More Pitch Correction

Yesterday I implemented a mash-up of the VoiceLive pitch correction algorithm and my own velocity-based algorithm:


The pitch number comes in and is modulo multiplied by 384 to give the note fraction, which is a ramp going from 0 to 2^32 - 1 over each note interval.  The MSb of this is signed shifted right by 31 to give a 0 or -1 vector, and this is XORed with the note fraction to flip the section above 1/2 down.  This is thresholded with the "span" input parameter, then XORed again to flip the section back up.  The note fraction is subtracted from this to produce a signed pitch correction signal which has a dead band outside of the correction span.  Not shown is a final extended multiplication by (2^27)/12, or 0xAAAAAB, to scale it back down to note widths in the pitch number.
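
For readers who want to poke at the bit manipulation, here is a Python sketch of the fold / threshold / unfold steps using 32-bit masks. Two things are assumptions: that "thresholded" means folded values below "span" are forced to zero, and that the pitch number carries 2^27 counts per octave (which is what the (2^27)/12 scaling suggests). The function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def note_fraction(pitch_number):
    """Multiply by 384 and keep the low 32 bits: a 0..2**32-1 ramp per note."""
    return (pitch_number * 384) & MASK32

def deadband_correction(note_frac, span):
    """Signed correction in note-fraction units; zero outside the span."""
    flip = MASK32 if note_frac & 0x80000000 else 0  # MSb shifted right 31: 0 or -1
    folded = note_frac ^ flip                       # fold the upper half down
    if folded < span:                               # assumed meaning of "thresholded"
        folded = 0                                  # inside the span: snap to the note
    target = folded ^ flip                          # flip the upper half back up
    corr = (target - note_frac) & MASK32
    return corr - (1 << 32) if corr & 0x80000000 else corr  # view as signed 32-bit

def to_pitch_units(corr):
    """Extended multiply by 0xAAAAAB (about 2**27 / 12), keeping the high word."""
    return (corr * 0xAAAAAB) >> 32
```

With `span=0` nothing is ever corrected, and with `span=0x80000000` every input is pulled to the nearest note, so the control runs linearly from no correction to full quantization.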

To detect velocity, the pitch number is differentiated, and the absolute value of this along with some dynamic range expansion is accomplished via squaring, after which this is gained-up by the "velo" input parameter.  Inverting all the bits yields a number that goes to zero when the velocity is high.

The inverted velocity is used as an envelope on the note correction signal, and a second gain reduction is introduced via the "corr" input parameter, which gives us control over the preciseness of the pitch correction within the note span.
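
A Python sketch of the velocity envelope, under some loudly stated assumptions: the squared, gained-up velocity saturates at the 32-bit maximum (so it "pegs out"), and both the inverted velocity and "corr" act as unsigned 32-bit fractions applied with a multiply that keeps the high word. None of these details are confirmed above; the names are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def inverted_velocity(pitch_now, pitch_prev, velo):
    """Return 0xFFFFFFFF when the hand is still, falling to 0 when it moves fast."""
    delta = pitch_now - pitch_prev      # differentiate the pitch number
    energy = delta * delta * velo       # squaring: absolute value plus expansion
    energy = min(energy, MASK32)        # saturation (assumed)
    return energy ^ MASK32              # invert all the bits

def apply_envelope(correction, inv_velocity, corr):
    """Scale a signed correction by the inverted velocity, then by 'corr'."""
    gated = (correction * inv_velocity) >> 32   # envelope from hand stillness
    return (gated * corr) >> 32                 # precision within the note span
```

A still hand passes nearly the whole correction through, while any fast movement drives the envelope to zero and leaves the played pitch alone.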

Finally, the modulated and gain reduced pitch correction signal is fed to a 4th order critically damped low pass filter, after which it is combined with the input pitch number via addition.  The cutoff frequency of the filter is controlled by the "rate" input parameter, which is dynamically scaled via the EXP2 function.
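
One common way to get a 4th order critically damped low pass is to cascade four identical one-pole sections, sketched below in floating-point Python. Whether the D-Lev filter is built this way is an assumption, as is the mapping from "rate" to the coefficient via 2**rate.

```python
class CriticallyDampedLPF4:
    """Four identical one-pole low-pass sections in series (no overshoot)."""

    def __init__(self):
        self.state = [0.0, 0.0, 0.0, 0.0]

    def step(self, x, rate):
        """Filter one sample; rate <= 0 sets the coefficient to 2**rate (assumed)."""
        coeff = min(1.0, 2.0 ** rate)
        for i in range(4):
            self.state[i] += coeff * (x - self.state[i])
            x = self.state[i]           # each section feeds the next
        return x

lpf = CriticallyDampedLPF4()
step_response = [lpf.step(1.0, rate=-3) for _ in range(400)]
```

The step response of such a cascade starts with zero slope and rises in an S shape without overshoot, rather than with the sharp initial corner and long tail of a single RC section.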

I've only played with it for an hour or so, but my preliminary impression is that the action is less subtle and more useful than my previous pitch correctors, particularly when playing between notes.  For my previous correctors, a good pitch correction rate for playing "on the note" was usually too fast for playing "between the notes" revealing the quantization ramp when playing slowly.   The "span" control is completely linear, going from no correction to 100% quantization, which is absolutely great.  And the "nibs" that develop when the played pitch enters the dead zone between notes are easily removed by the low pass filter.  Probably because the filter is 4th order, I'm not seeing the RC "upward bowing" shown in Roger's screen caps of the VoiceLive.  BTW, that VocalPitchMonitor app is available for free on the Amazon Fire, and it's quite nice for visualizing the correction action.

I've retained the inverted velocity numeric display on the D-Lev prototype as it helps to set the velocity parameter, but may get rid of it at some point.

Thanks again to Roger for urging me to check this approach out!  I don't think I would have even thought of it on my own, and if I had I likely would have rejected it outright and not given it a fair trial.

Posted: 12/16/2018 3:54:08 AM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Volume Knee

The D-Lev volume axis has a linearized volume response with hand distance.  The knee is a linear deviation from this; it is a piece-wise linear approach to implementing an overall non-linear response.  After going to all the trouble of making a really linear volume field, why would we want to mess it up?  The answer is more in the sounds we're synthesizing than anything else.  If you analyze the difference in dBs between the quietest and loudest you would normally sing, or play a tuba, or violin, or make a noise with just about anything, you would probably find less difference than you would expect.  It takes a certain amount of energy to get your vocal cords, or lips and a column of air, or a string to start vibrating, and any added energy above that can only make things but so much louder. So it makes sense to have the volume response ramp up from total silence to some lower but significant level of audibility over a relatively short distance of hand travel, more gain or slope if you will, and then have a wider region after that in order to provide finer control over the dynamics.  And this is what the knee gives the player.


Above are views of the traditional "normal" or farther=louder hand/volume response on the left, and the "reverse" or nearer=louder that I currently prefer to play with on the right.  -48dB is right around the threshold of audibility, so that is what I chose as the minimum for the vertical axis.  The horizontal axis is hand position from the antenna.  The larger hand response region is just the unaltered linear axis response, the shorter, gained up "kneed" section is a line segment deviation from this, with the transition point and steepness set via user knobs.
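
The two-segment response can be sketched in a few lines of Python. The numbers here (48 dB across the linear region, knee at a quarter of the travel, 4x slope below it, -96 dB floor) are placeholders chosen for the example, not the D-Lev's actual settings, and `pos` is assumed to be an already linearized 0..1 hand position with 1 as loudest.

```python
def kneed_volume_db(pos, knee_pos=0.25, knee_gain=4.0, range_db=48.0, floor_db=-96.0):
    """Piece-wise linear volume in dB for a linearized hand position 0..1.

    knee_pos: transition point between the two line segments.
    knee_gain: how many times steeper the segment below the knee is.
    """
    if pos >= knee_pos:
        # Unaltered linear axis response: the wide, fine-control region.
        return -range_db * (1.0 - pos)
    # Steeper segment below the knee, continuous with the linear one at knee_pos.
    knee_db = -range_db * (1.0 - knee_pos)
    db = knee_db - knee_gain * range_db * (knee_pos - pos)
    return max(db, floor_db)
```

For the "normal" sense pass in the hand distance directly, and for the "reverse" sense pass in `1.0 - distance`; the knee itself is the same either way.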

Without some sort of knee, the majority of voices would be more difficult to play.  And since the knee region has higher dV/dt, it makes a convenient fixed point in space to detect velocity and trigger the envelope generator.  Finally, the body is often rather close to the volume antenna, and the downward expansion of the knee response helps minimize the interaction here - it gives you a really solid "off" volume floor to work with.

Would some kind of curve fitting be more ideal than this piece-wise approach?  I don't believe so.  I think having good linearity within the two regions makes them individually more playable.  And the sharpness of the knee transition isn't evident (it doesn't "pop" or anything like that) unless the knee slope is jacked sky high, but you would only be doing that for effect, or for velocity / envelope generation purposes.

Playing with a knee means you don't have to move your hand unnaturally quickly over large distances in order to shape the sound.  A large linear volume field is great if there is the possibility of introducing a knee, and not so great if that isn't the case.  I haven't studied them much, but I wouldn't be surprised if many purely analog Theremins have something roughly similar to a knee in terms of non-linear volume response.  For them, if done well, it can likely be a good thing, for the reasons stated above.  

IMO you really don't want to play a perfectly linear volume field in the raw.  You don't want to have to move your hand say 8" to go from true inaudibility (-96dB) to the edge of audibility (-48dB), then another 8" to go to full volume.  Your hand / arm will get tired of fully damping the sound, and your body will be close enough that full damping probably isn't even possible, so you'll have this low wheedling sound between notes and during other (what should be) silent passages.  Your volume hand will be flailing around while you're trying to make precision micro-movements with your pitch hand - not exactly a marriage made in heaven.

Posted: 12/18/2018 4:00:43 AM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

My So-Called Playing Technique

It's not every day you get to see a reduced sensitivity digital Theremin played by a hack!  Though I must say that playing the D-Lev prototype has made me much more aware of the way others play, which I suppose is true of learning any instrument even shallowly. 

As you know, the sensitivity of both axes on the prototype is completely adjustable and independent of the noise making logic, so I've got the pitch field sensitivity set to around 1/3 the typical sensitivity of an analog Theremin, and this allows me to use arm and hand movements rather than hand and finger movements.  It also makes standing at it and doing unnecessary things like breathing pretty easy.  But with any fundamental deviation from the traditional, there exist drawbacks for the already experienced player.  

Regardless of the given Theremin and playing technique, there is a tension between vibrato movement and note selection.  It may look like I'm only wiggling my fingers, but I'm using my fingers/hand/forearm mass as a torsional pendulum of sorts, which has a natural frequency somewhat above what I would like for vibrato, hence my "20's vocal trilling" tendency.  Though I do sometimes hinge my 3 fingers and/or wrist a bit to play intervals.

On the volume side there is a tension between inter-note definition performed with a quick dip towards the "normal" antenna, and an attack type envelope which is performed with a quick dip towards the "reversed" antenna.  I would very much like the former, but can't let go of the latter.  That and I've never gotten over the "normal" volume sense feeling utterly alien to me.

I wonder what it is about the thumb and first finger touching that is so comfortable and natural-seeming re. Theremin playing?

Anyway, another look at my messy bench for a seasonal ditty that's pretty easy to play: (If the audio quality seems particularly bad, it's because I let the web cam software capture it and mangle it to a 32kHz sampling rate, which is its wont.)

Between the relaxed posture and larger movements that come with reduced sensitivity, the LED tuner feedback, and the pitch correction, it often feels like I'm just painting by numbers.

Happy Holidays everybody!

Posted: 12/18/2018 10:16:07 PM
tinkeringdude

From: Germany

Joined: 8/30/2014

Ha! Your lame playing sounds like you're becoming quite decent. I'm *still* at "stomped cat's tail" stage. Ok, then I don't practise, as I haven't got a permanent solution where to actually put my Etherwave clone (with enough free field around). Would like to try some day with a more well behaved device such as yours.

Now all you need is a little reverb. How about this? It could double as a volume "antenna"!
(ok ok, maybe a bit on the bulkier side )
(time link apparently doesn't work here: listen to &t=13m22s)


Posted: 12/19/2018 10:49:15 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"Ha! Your lame playing sounds like you're becoming quite decent. I'm *still* at "stomped cat's tail" stage. Ok, then I don't practise, as I haven't got a permanent solution where to actually put my Etherwave clone (with enough free field around)."  - tinkeringdude

Thanks!  A lot of practice is having it set up and convenient to play, and putting in a few minutes here and there.  That's one reason I like instruments that either don't have to boot (guitar, etc.) or boot very quickly (the D-Lev boots in under 2 seconds).

"Would like to try some day with a more well behaved device such as yours."

If you're ever in NJ, please look me up!  I hope to sell them at some point too.

"Now all you need is a little reverb. How about this? It could double as a volume "antenna"!"

Absolutely fascinating!  Thanks for that!
