Internal Scaling
One issue that seems to be resolving itself is the scaling of numbers in the Theremin software. This is something that felt fairly arbitrary in my first all-hardware prototype. The difference is that the first prototype operated in the exponential domain, while this latest one operates in the linear domain.
In a conventional analog Theremin, a fixed high frequency and a variable high frequency are subtracted (via non-linear mixing and some form of low-pass filtering), and the resulting pitch spacing in the mid field sounds linear to our ears. Since our ears require exponential frequency spacing to sound linear, the entire process leading up to them is exponential. If one instead measures the variable high frequency to get a value (a voltage or a number), subtracts this value from a somewhat larger constant, and feeds the result to a linear-response (voltage- or numerically-controlled) oscillator, one achieves basically the same thing, because in both cases we are subtracting frequency: the first in the time domain, the second in the value domain. So this particular value approach (or process) is also exponential all the way to our ears. Heterodyning and other approaches that employ the frequency difference directly will all give you the same qualitative hand / pitch response.
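To make the equivalence concrete, here is a minimal sketch (names and constants are assumed, not from the actual code) of the value-domain version: measure the variable oscillator frequency, subtract it from a slightly larger fixed reference, and hand the result to a linear-response oscillator. It computes the same frequency difference that heterodyning extracts in the time domain.

    /* Value-domain "heterodyning": the frequency difference drives a
       linear-response oscillator directly. Names and reference value assumed. */
    #define F_REF_HZ  500000.0f   /* assumed fixed reference, a bit above the max variable frequency */

    static float pitch_freq_hz(float f_var_hz)
    {
        float diff = F_REF_HZ - f_var_hz;    /* the same quantity heterodyning extracts */
        return (diff > 0.0f) ? diff : 0.0f;  /* clamp; feeds a linear-response oscillator */
    }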
There are many reasons not to go the exponential processing route. For one thing, you get basic pitch field linearity that is identical to that of an analog Theremin, which isn't terrible (it's entirely playable) but cramps up near the antenna, and the notes have a fixed, rather cramped spacing (~1 octave with an open / closed hand) even in the linear mid-field zone. Fixing the mid-field note cramping is just a matter of taking some root of the numbers (e.g. a square root will double the inter-note distance). But fixing the near-field cramping in the exponential domain is a pain; it's a pain even to come up with an algorithm to do it.
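For reference, a rough sketch of that mid-field fix (hypothetical names, float math standing in for the actual fixed-point processing): taking a root of the frequency difference before the linear-response oscillator halves the octaves-per-distance in the mid field, so a square root doubles the physical spacing between notes.

    #include <math.h>

    /* root = 2.0f gives the square root, doubling the inter-note distance. */
    static float stretched_diff(float f_ref_hz, float f_var_hz, float root)
    {
        float diff = f_ref_hz - f_var_hz;   /* raw heterodyne-style difference */
        return powf(diff, 1.0f / root);     /* still fed to a linear-response oscillator */
    }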
If you instead take the frequency difference and raise it to a fractional negative power (somewhere around -0.5 to -0.25, probably depending on the antenna geometry), the resulting numbers are remarkably linear with hand distance, and so must be fed to an exponential-response oscillator in order for the note spacing to sound linear to our ears. In fact, it sounds linear all the way up to the antenna, which is fantastic. And there is only one "knob" necessary to adjust this linearity, which is doubly fantastic. And the adjustment is simple and non-critical, which is triply fantastic. And offsetting and scaling the pitch, as well as measuring and displaying it on a tuner, are all that much easier in the linear numeric domain (simple add, subtract, multiply, with no powers or roots needed). It's win-win-win-win (win turtles all the way down).
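A minimal sketch of this linear-domain approach (the exponent, offset, and gain values are assumed placeholders, and float math stands in for the fixed-point processing): raise the frequency difference to a small negative power to get a roughly hand-distance-linear value, then offset and scale it with plain adds and multiplies before it crosses over to the exponential-response oscillator.

    #include <math.h>

    #define LIN_POWER    (-0.3f)  /* assumed; somewhere in the -0.5 to -0.25 range */
    #define PITCH_OFFSET 0.0f     /* simple add in the linear domain */
    #define PITCH_GAIN   1.0f     /* simple multiply in the linear domain */

    /* Frequency difference in, roughly distance-linear pitch value out;
       the result then goes through EXP2 to the NCO. */
    static float linear_pitch(float freq_diff)
    {
        float lin = powf(freq_diff, LIN_POWER);  /* ~linear with hand distance */
        return PITCH_OFFSET + PITCH_GAIN * lin;  /* offset and scale are plain add / multiply */
    }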
What I didn't expect was for the linear / exponential "border crossing" at the input to the NCO (numerically controlled oscillator) to naturally set the basic gain of the processes leading up to it, which is also pretty nice, as it resolves things that were previously being set arbitrarily. The best analogy here is an analog synthesizer, where the oscillators and filters convert linear voltages to exponential pitches, and the input standard is usually 1V per octave. The numeric approach has a natural gain based on the input format of the unsigned integer EXP2 function which feeds the NCO, which is a 32-bit 5.27 fixed-point value. The upper 5 bits set the octave, the lower 27 bits set the octave fraction, and a full-scale change on the linear side (0 to 2^32 - 1) gives a full-scale change on the exponential side. (The 5.27 format is used internally by the EXP2 subroutine, where the lower 27 bits are fed to the polynomial and the upper 5 shift the polynomial result; for a full shift over the 32-bit field the shift distance needs log2(32) = 5 bits.)
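A sketch of that border crossing under the assumptions above (an assumed implementation, with the C library exp2() standing in for the actual polynomial): the lower 27 bits form the octave fraction, the upper 5 bits shift the result into place, and a full-scale change at the 32-bit linear input produces a full-scale change at the exponential output.

    #include <stdint.h>
    #include <math.h>

    static uint32_t exp2_5p27(uint32_t x)
    {
        uint32_t octave = x >> 27;                           /* upper 5 bits: integer octave       */
        uint32_t frac   = x & 0x07FFFFFFu;                   /* lower 27 bits: octave fraction     */
        double   p      = exp2((double)frac / 134217728.0);  /* polynomial stand-in: 2^frac, [1,2) */
        uint32_t base   = (uint32_t)(p * 2147483648.0);      /* as a 1.31 fixed-point value        */
        return base >> (31u - octave);                       /* shift into place across 32 bits    */
    }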
So the unsigned 5.27 format is used for control throughout, for the pitch processing, for the tuner input, for the volume breakpoint processing, etc.
For processing 16-bit numbers (audio samples, volume settings, etc.), a 4.12 format seems in order (though perhaps implemented as 4.24 with the lower 16 output bits ignored). I'm currently working on subroutines that require fewer cycles at this reduced precision / resolution (fewer polynomial terms); one in particular is the sin(x*pi) needed to correct the modified Chamberlin state-variable filter tuning. Yesterday I finished a SIN2 subroutine that gives a maximum error of +/- 3 counts over 16 bits, takes ~1/2 the time of the 32-bit version, and should be sufficient (spectrally pure enough) for audio generation and processing, particularly if it is attenuated at all. Sine is a slightly better choice here than cosine, as the polynomial gives somewhat less error for the same number of terms (though for 32-bit high precision it's a wash, as you hit a wall of diminishing returns with the same required number of terms either way).
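For context on where sin(x*pi) enters (assumed names, with the float library sine standing in for SIN2): the standard Chamberlin state-variable filter tuning relation is f = 2*sin(pi*fc/fs) rather than the small-angle 2*pi*fc/fs, which keeps the tuning accurate as the cutoff rises toward the sample rate.

    #include <math.h>

    #define PI_F 3.14159265f

    /* Exact-tuning frequency coefficient for the state-variable filter; a
       4.12-style fixed-point SIN2 result would replace the float sine here. */
    static float svf_f_coef(float fc_hz, float fs_hz)
    {
        return 2.0f * sinf(PI_F * fc_hz / fs_hz);
    }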