To Heterodyne, Or Not To Heterodyne
Vadim (Buggins), over on his "Teensy 4.0 600MHz ARM Cortex M-7 MCU - ideal for digital MCU based theremin?" thread, is describing methods to capture the pitch axis information with the highest resolution / SNR possible on an inexpensive processor. This is a laudable goal and I'm not trying to discourage or disparage it in any way, but I do have some thoughts about it. I was going down a similar (though not identical) path a while back, so I did a fair amount of research, and even had some pretty clean FPGA code ready to go, before ultimately abandoning it for various reasons. I'm not saying I'm an expert on the subject, and it's entirely possible that I've missed some vital path that makes it more tractable and attractive, but I want to yak about it a bit.
Heterodyning is the non-linear mixing of two frequencies, producing new frequencies that are the sums and differences of the input frequencies, and in this case (as in the case of the analog Theremin) we are interested in the difference frequency. The pros of heterodyning are quite clear for a purely processor based Theremin, and indeed this is why you see heterodyning implemented on the Open.Theremin - it's really the only game in town when you don't have a way to precisely produce frequencies, nor the means to precisely measure them. Either a D flip-flop, or an XOR gate followed by a low pass filter, is used to perform the heterodyning, with the result sent to the processor for period measurement. Earlier versions of the Open.Theremin employed the XOR and filter, and the latest version employs the D flip-flop.
A general issue here is the generation of the fixed frequency to heterodyne the variable frequency with. If we don't have precise control over this frequency then we need to be able to offset the variable oscillator by tuning it, and this is the approach the Open.Theremin takes. The obvious problem with this is it requires a touchy analog adjustment on a (mostly) digital instrument which could be avoided entirely if the fixed oscillator were adequately controllable via digital generation. But it is a simple approach, and if you gotta do it, you gotta do it.
The super nice thing about using a D flip-flop as the detector is it removes the high frequency heterodyne content and gives the processor a nice square signal to measure. The problem with using a D flip-flop is the edges of the heterodyne result can only happen on the D clock rising edge, which quantizes the period being measured. The introduction of dither (phase modulation) could help to break up the quantization, but I'm not sure how one might introduce this in a processor setting. One could also perhaps use two flops here to double the resolution / halve the quantization error, but the measuring logic would have to be able to utilize this info in order for it to be actually useful.
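To make the quantization concrete, here is a minimal Python sketch of an idealized D flip-flop detector. The frequencies (a 500 kHz reference clocking the flop, a 499.3 kHz variable oscillator on D) and the name dflop_periods are made up for illustration, and the model ignores setup/hold times, jitter, and metastability.

```python
def dflop_periods(f_ref, f_var, n_clocks):
    """Periods of Q, in reference clocks, measured between rising edges of Q.

    f_ref clocks the flip-flop; f_var is a 50% duty square wave on D.
    Integer arithmetic keeps the phase exact.
    """
    periods = []
    last_q = None
    last_edge = None
    for n in range(n_clocks):
        # Phase of the D input at clock edge n, in units of 1/f_ref of a cycle.
        phase = (n * f_var) % f_ref
        q = 1 if 2 * phase < f_ref else 0      # D is high for the first half cycle
        if last_q == 0 and q == 1:             # rising edge of Q
            if last_edge is not None:
                periods.append(n - last_edge)
            last_edge = n
        last_q = q
    return periods


F_REF = 500_000   # hypothetical fixed (reference) oscillator, Hz
F_VAR = 499_300   # hypothetical variable (antenna) oscillator, Hz

periods = dflop_periods(F_REF, F_VAR, 20_000)
mean_period = sum(periods) / len(periods)
print("distinct measured periods:", sorted(set(periods)))
print("mean period:", mean_period, "true period:", F_REF / (F_REF - F_VAR))
```

The true difference period here is 500000 / 700, or about 714.29 reference clocks, but every individual measurement lands on a whole number of clocks on either side of that. Only the average over many periods approaches the true value.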
The nice thing about using an XOR gate as a detector is you can get analog like precision timing from it. But the main snag is properly filtering out the high frequency components. You need a high order, very low Q low-pass filter to do this, and there are many problematic issues associated with this structure. You want just the difference frequency, but unless you are careful some higher frequency content will get through. If the higher frequency content is too strong you will get ripples, which will give you trouble when you go to square up the result. You want it to work over a really wide range but it really attenuates at higher frequencies, so the difference frequency will be quite small in amplitude when the difference is large, which can again lead to thresholding problems when squaring it up. The ideal situation here would be to have a tracking low-pass filter admitting only the difference frequency and killing those above, but that would be really difficult to implement in the analog domain, and it might get into troublesome modes, as you are filtering the thing that is telling you where to set the filter cutoff frequency, forming a control feedback loop.
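The amplitude problem can be put in rough numbers with the textbook magnitude response of a cascade of identical, buffered one-pole RC sections, which is one way to get a high order, very low Q low-pass. This is only a sketch: the corner frequency, order, and oscillator frequency below are assumed values rather than anything from a real design, and it treats the unwanted content as sitting near twice the oscillator frequency.

```python
def cascade_gain(f_hz, fc_hz, order):
    """Magnitude response of `order` identical, buffered one-pole RC low-pass stages."""
    return (1.0 + (f_hz / fc_hz) ** 2) ** (-order / 2.0)


FC_HZ = 5_000.0        # assumed corner frequency of each stage
ORDER = 4              # assumed number of cascaded stages
F_OSC_HZ = 500_000.0   # assumed oscillator frequency

# Gain for the unwanted mixing products, taken here to be near 2 * f_osc.
unwanted = cascade_gain(2.0 * F_OSC_HZ, FC_HZ, ORDER)

for f_diff in (100.0, 1_000.0, 4_000.0):
    wanted = cascade_gain(f_diff, FC_HZ, ORDER)
    print(f"difference {f_diff:6.0f} Hz: gain {wanted:.3f}   near 2*f_osc: gain {unwanted:.1e}")
```

With these numbers the high frequency content is crushed, but a 4 kHz difference tone already comes through at well under half the amplitude of a 100 Hz one, which is the thresholding headache. Moving the corner up to spare the difference tone lets more of the high frequency content through.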
Another general issue is that you are averaging periods, and because the periods themselves arrive at a different rate than the rate at which you are sampling them in the larger scheme of things, you essentially have a variable multi-rate process going on. I spent a lot of time looking at this and it is by no means a simple scenario. The best solution, I think, is to vary the averaging rate, which is best done via high order low pass filtering where the cutoff frequency is varied with the inverse of the period length (i.e. with the frequency) - and indeed this is something I do on the D-Lev, though with the PLL frequency number.
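One way to picture the frequency-tracking averaging is a cascade of one-pole low-pass stages running at a fixed sample rate, with the cutoff recomputed on every sample as a fixed fraction of the measured frequency. This is a rough sketch and not the D-Lev's actual filter: the class name, the 48 kHz rate, the order of 4, and the 0.1 cutoff ratio are all assumptions.

```python
import math


class TrackingLowPass:
    """Cascade of one-pole low-pass stages whose cutoff follows the measured frequency."""

    def __init__(self, sample_rate, order=4, ratio=0.1):
        self.fs = sample_rate
        self.ratio = ratio            # cutoff = ratio * measured frequency (assumed)
        self.state = [0.0] * order

    def step(self, x, freq_hz):
        fc = self.ratio * freq_hz
        alpha = 1.0 - math.exp(-2.0 * math.pi * fc / self.fs)
        for i in range(len(self.state)):
            self.state[i] += alpha * (x - self.state[i])
            x = self.state[i]         # output of this stage feeds the next
        return x


def settle_samples(freq_hz, fs=48_000, target=0.99):
    """Samples needed for a unit step to reach `target` at a fixed measured frequency."""
    lp = TrackingLowPass(fs)
    n = 0
    while lp.step(1.0, freq_hz) < target:
        n += 1
    return n


slow = settle_samples(100.0)     # low difference frequency: long averaging window
fast = settle_samples(1000.0)    # high difference frequency: short averaging window
print(slow, fast, slow / fast)
```

The settling time scales with the period, so the filter averages over roughly the same number of periods wherever the hand is, rather than over a fixed stretch of time.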
Another general issue is the lag problem when measuring lower frequencies. So you are perhaps best to limit the lowest heterodyned frequency result, which implies offset heterodyning. If you carefully engineer offset heterodyning with period measurement you can improve the pitch field linearity, and indeed this is actually the main reason I was pursuing heterodyning in the first place. But this inextricably ties together a lot of stuff that's much easier to deal with separately.
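To put rough numbers on the lag: a period measurement cannot finish in less than one period of the difference frequency, and the number of timer counts it yields is the timer clock divided by that frequency. The sketch below assumes the timer runs at the Teensy 4.0's 600 MHz core clock, which may well not be the case for a real capture peripheral, and the function name is hypothetical.

```python
def period_measurement(f_diff_hz, timer_hz=600e6):
    """Return (minimum lag in seconds, timer counts per period) for one period measurement."""
    min_lag_s = 1.0 / f_diff_hz        # must wait at least one full period
    counts = timer_hz / f_diff_hz      # timer ticks accumulated over that period
    return min_lag_s, counts


for f_diff in (20.0, 1_000.0, 4_000.0):
    lag, counts = period_measurement(f_diff)
    print(f"{f_diff:6.0f} Hz difference: lag >= {lag * 1e3:6.2f} ms, {counts:12.0f} counts per period")
```

A 20 Hz difference costs at least 50 ms per measurement, while offsetting so the lowest difference is around 1 kHz caps it at 1 ms. The price is fewer counts per period, so coarser raw resolution per measurement.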
Another nice thing about heterodyning with period measurement is it gives you higher resolution in the far-field, and adequate resolution in the near-field. But I've found that, ultimately, the far-field is not that useful for playing purposes. Regardless of the method used, analog or digital, the far-field is fairly unstable and difficult to calibrate for linearity (via the null control) as the rest of the body has a lot of influence over it, so I find myself avoiding it when playing. My body probably moved a little since I did the acal, the electronics may have drifted a bit, etc. - so I can't trust the linearity out there. Even analog Thereminists will know what I'm talking about here.
So there's my brain dump from my hazy recollection of what I was looking into years ago. If I were trying to do an MCU based digital Theremin I might first try to rule in/out the D flip-flop approach, but unless I was really cheaping down I wouldn't go the analog tuning route, as that makes everything too touchy. If you are installing some form of crystal or ceramic oscillator for the reference frequency, you might as well use the money and board space to instead install a PLL solution here (tapping off the processor crystal), with fine control via SPI or I2C. And if you're doing that, maybe a cheap FPGA instead, with the processor inside, etc. - and you're on a slippery slope with something like the D-Lev at the bottom.
[EDIT] We are all playing games in a way depending on the platform we choose to implement a digital Theremin, and Vadim is doing this more explicitly with his project, i.e. limiting himself to the resources available in a particular MCU. Ultimately, it all comes down to the precision with which we can generate, and particularly the precision with which we can measure, timing. If thermal noise is the determining factor in all of this then we have adequately precise timing or better. If we're close, then dither and averaging can help bridge the gap. If we're miles away then we have to find some other means to generate timing, and then somehow measure slower products of that (i.e. have external processes improve the precision to the point where we can throw some of it away).
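As a toy illustration of dither and averaging bridging the gap, the sketch below rounds a fixed "true" period to whole timer counts, with and without a uniform dither of plus or minus half a count added before rounding. It sidesteps the real question of how dither would be physically introduced, and the period value, sample count, and function name are made up.

```python
import math
import random


def mean_reading(true_value, n, dither, seed=0):
    """Average of n readings, each rounded to the nearest whole count."""
    rng = random.Random(seed)                         # seeded so runs are repeatable
    total = 0
    for _ in range(n):
        d = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += math.floor(true_value + d + 0.5)     # round to nearest count
    return total / n


TRUE_PERIOD = 714.3   # hypothetical true period, in timer counts

plain = mean_reading(TRUE_PERIOD, 10_000, dither=False)
dithered = mean_reading(TRUE_PERIOD, 10_000, dither=True)
print("no dither:  ", plain)
print("with dither:", dithered)
```

Without dither every reading is the same whole number, so averaging buys nothing. With dither the readings split between two adjacent counts in proportion to the fractional part, and the average creeps toward the true value as more readings are taken.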