Let's Design and Build a (mostly) Digital Theremin!

Posted: 1/26/2018 8:54:23 PM

From: Northern NJ, USA

Joined: 2/17/2012


No presets implemented yet, but I'm thinking about it.  

Memory limitations - I'm working with a 128KB EEPROM, half of which I am reserving for software loads.  The main memory is only 16KB so 4 loads, or a single load 4x larger, could conceivably fit in that half.  This leaves 64KB for presets and the like.  What makes up a preset?  A preset is basically a group of parameters that get written from EEPROM to various RAM locations when loading a preset, and vice versa when storing a preset.  

Parameter sizing - What is a parameter?  A bunch of things actually.  At the lowest level it's the current "position" of the associated encoder.  There may be manipulation of it in order to use it in the code (filter frequency, audio mix, etc.).  Finally, it may require yet more manipulation in order to display it in a convenient human readable form (decimal Hz for example) which includes a text label prefix.  (It's really too bad we somehow picked base 10 for our numbering system.  Base 4 makes much more sense, and would have relieved us all from the drudgery of memorizing times tables, long division, etc.)

For efficiency of storage, what's the smallest practical number the lowest level parameter can be?  I'm thinking unsigned byte here.  Even though the encoders employ code that senses velocity, spinning a knob from 0 to 255 is perhaps about as much as I'd want to do for any single setting adjustment.  Thinking about specific applications, and given an upper frequency limit of ~8kHz and a lower of maybe 8Hz gives a span of 10 octaves, dividing this by 255 means we can specify ~1/25th of an octave, or roughly 1/2 note, which seems fine enough resolution for most things.  I'm thinking unsigned because it's easy to check over/underflow, and most parameter values are used unsigned by their code.

Parameter forms - What converts the lowest level parameters to a useful form by the code?  Currently I'm doing this in the code itself, though that's kind of clunky and slow for code in the DSP critical path.  I have a separate thread which deals with the encoders and updates the associated parameters, but it could also be handling this conversion.  Finally, what converts the parameters to display values and strings?  I'm also doing this on a separate thread.

I'm currently grouping the lowest level parameter, its upper and lower limits (all 16 bit values), and its string label in a single blob.  These blobs are grouped together for a single screen, but otherwise are located all over the code.  This requires a striding function to return a pointer to the base of the indexed blob, but once the pointers get one or two deep I can barely keep straight what's going on.

Normalization - I'm thinking of defining specific types of parameters (yes/no flag, unsigned limited to 1 thru 15 forms, unsigned limited to (2^n)-1, etc.) and then having a byte "type" dictating how they are to be interpreted to both real values in the code (stored as a 32 bit) and for display (stored as a 32 bit pointer to a string).  The type byte, real value, and string pointer are fixed in the code, with only the low level parameter byte needing to be stored in preset memory.
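A sketch of what such a normalized parameter record could look like in C (all names, and the example decodings of the type byte, are hypothetical, not the actual implementation):

```c
#include <stdint.h>

/* Hypothetical parameter record: only enc_pos needs to go to preset EEPROM;
   the type byte, limits, label pointer, and derived value are fixed in code
   or computed at runtime. */
typedef enum { P_FLAG, P_1_TO_15, P_POW2M1 } ptype_t;  /* "type" byte values */

typedef struct {
    uint8_t     enc_pos;  /* low level encoder position, 0..255 (preset-stored) */
    uint8_t     type;     /* how to interpret enc_pos (a ptype_t) */
    uint16_t    lo, hi;   /* encoder limits */
    int32_t     real;     /* derived 32 bit value used by the DSP code */
    const char *label;    /* pointer to the display label string */
} param_t;

/* Example interpretation of the type byte into a "real" value. */
static int32_t param_decode(const param_t *p) {
    switch ((ptype_t)p->type) {
        case P_FLAG:    return p->enc_pos != 0;                 /* yes/no flag */
        case P_1_TO_15: return p->enc_pos < 1  ? 1              /* clamp 1..15 */
                             : p->enc_pos > 15 ? 15 : p->enc_pos;
        case P_POW2M1:  return (int32_t)((1u << (p->enc_pos & 31)) - 1);
    }
    return 0;
}
```

With this shape, storing a preset is just a walk over the blobs copying each `enc_pos` byte out to EEPROM, and loading is the reverse plus a decode pass.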

There are only 8 screens so far, with max 7 parameters, so 56 bytes tops per preset.  We could have a thousand presets, which is way more than I anticipate implementing.

System parameters - There are parameters that shouldn't be updated with the preset system, and these are things like volume and pitch linearization, sensitivity, and offset; mains frequency (for the hum CIC filter); etc.  I'm still thinking about how to handle these, perhaps only storing them when performing an auto-calibration of the axes or similar, and only recalling them at power-up.

Auto parameters - Should the instrument track all user activity and store this in EEPROM in real time, so that any playing around is remembered through power cycles?  I'm not sure.  It seems like a handy thing, but users often rely on power cycles to clear situations that are causing them trouble.

Default preset - I'm thinking preset 0 should always load at power-up.

Factory vs. user presets - Often there are factory preset regions and user preset regions.  Sometimes you can write over the factory presets.  And there is almost always a way to reset things back to 100% factory settings.

External editors and preset storage - Lots of instruments have PC or tablet software you can run that accesses the parameter system, which can provide easier editing and preset storage / updating / trading with others, etc.  I don't really like external editors.  Sure they're convenient, but they invite laziness with the development of the native instrument UI , and they (and their electrical interfaces) tend to go out of style pretty quickly (try to run and connect even a 10 year old editor on anything to anything).

Posted: 1/27/2018 12:20:52 AM

From: Northern NJ, USA

Joined: 2/17/2012

Binary to BCD

I was using an algorithm I found on the web (link), where the BCD nibbles form a contiguous binary shift register:

  {thous[3:0], huns[3:0], tens[3:0], ones[3:0]}

which can also contain the binary input concatenated to the least significant end:

  {thous[3:0], huns[3:0], tens[3:0], ones[3:0], bin[15:0]}  

0. Load the input value into bin[15:0] and zero out the BCD nibbles.

1. Add 3 to each BCD nibble that is greater than 4.

2. Shift the whole shebang left one bit.

3. Goto step 1 and do as many times total as there are input bits.

Conveniently, the adds in step 1 will not cause carry out into the upper nibbles, so there is no need to isolate things in any way while performing the adds.

It works great, but it takes 162 cycles max to execute!  (Though you can speed it up considerably for smaller input values by short-circuiting leading zeros.)
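The shift-and-add-3 steps above, sketched in C (a plain rolled-loop rendering for reference, not the assembly):

```c
#include <stdint.h>

/* Shift-and-add-3 ("double dabble") 16 bit binary to 4 digit packed BCD.
   The input and the four BCD nibbles share one 32 bit shift register:
   {thous, huns, tens, ones, bin[15:0]}.  Input must be <= 9999. */
static uint16_t bin16_to_bcd4_dd(uint16_t bin) {
    uint32_t sr = bin;
    for (int i = 0; i < 16; i++) {            /* once per input bit */
        for (int n = 0; n < 4; n++) {         /* add 3 to any BCD nibble > 4 */
            if (((sr >> (16 + 4 * n)) & 0xF) > 4)
                sr += 3u << (16 + 4 * n);
        }
        sr <<= 1;                             /* shift the whole shebang left */
    }
    return (uint16_t)(sr >> 16);              /* the four BCD nibbles */
}
```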

When working on the tuner a while back I realized that, if you have a 32 bit unsigned value that you are treating as a fraction (decimal to the left of the MSb) then if you multiply it by 12 the extended result (the bits above the MSb) gives you a modulo 12 value, and the normal result gives the modulo fraction remainder.  I just tried applying this to a 16 bit integer, with repeated multiplications by 10 to the fraction for BCD conversion, and it works if you initially multiply the input by (2^32)/(10^n) where n is the number of BCD digits desired.  And it only takes 16 cycles to kick out 4 BCD digits!  A 10x savings of real-time, incredible!
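The underlying trick on its own, sketched in C (function names are just illustrative): treat a uint32 as a fraction in [0,1), and a 64 bit multiply splits the product into an integer (modulo) part and a fractional remainder.

```c
#include <stdint.h>

/* Multiply a 32 bit "fraction" (binary point above the MSb) by n: the upper
   32 bits of the 64 bit product are the integer part, the lower 32 bits are
   the fractional remainder. */
static uint32_t frac_mul_int(uint32_t frac, uint32_t n) {
    return (uint32_t)(((uint64_t)frac * n) >> 32);
}
static uint32_t frac_mul_rem(uint32_t frac, uint32_t n) {
    return (uint32_t)((uint64_t)frac * n);
}
```

With n = 12 a fraction of 0.5 (0x80000000) gives integer part 6 with remainder 0, i.e. note 6 of 12; repeated multiplies by 10 peel off decimal digits the same way.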

It's this kind of crap that kills you when doing real-time stuff, conversion to crazy human base 10 and displaying it, so it's nice when any kind of shortcut comes along.

Posted: 1/27/2018 1:03:44 PM

From: Theremin Motherland

Joined: 11/13/2005

"No presets implemented yet, but I'm thinking about it."


You should keep the compatibility problem in mind every time you improve your code.  The preset data format gets revised many times before you reach the release version...

There is such a helpful thing as transferring settings from one instrument to another...

And there is the end user's problem of parameter compatibility after a firmware upgrade (of course they want to keep their previous settings)...

and tons of other things...

All this routine work (maybe for years!) kills the enthusiasm (mine was killed several years ago).  Hope you aren't a pessimist.




Posted: 1/27/2018 4:34:12 PM

From: Northern NJ, USA

Joined: 2/17/2012

ILYA, I agree, UIs are quite a lot of grunt work, and with not a lot to obviously show for it.  I've got three iterations into the CLI (command line interface) that no user will ever really see, and am on the third spin of the menu system, which the user will definitely see, so it's a bit more gratifying.  But I've spent much more time on the CLI and UI than I have on the audio DSP side of things.  It's mountains after mountains.

As you bring up, synchronization of the presets across loads can obviously be a huge can of worms, worse than keeping the separate boot and software loads playing nice.  I'm heartened to some degree by the simplicity of the Theremin.  Mid Moog creations seem to be more complex (with Bob farming out the digital side?) but the EWPro has only one preset for the user.  And after implementing a pretty decent human voice (and all the fun that has been to do and fool around with), I think most players really just want something that suggests a voice or violin family instrument in certain registers.  Plus a sine wave to do all that Tannerin 50's sci-fi stuff.  There's a certain level of recognizability, utility, and charm to be found in the "standard" Theremin voices.

Posted: 1/27/2018 11:08:14 PM

From: Northern NJ, USA

Joined: 2/17/2012

Day of the Short Algorithms

Nice thing about a project this big (where I used to work something like this would have had several engineers on it for an extended period) is you can always find things to do if you are studiously avoiding some other aspect, or in this case waiting for your brain to make up its mind regarding UI preset details.

Squaring and multiple squaring is working great for a lot of parameters, but I think it's a bit too "steppy" for the sine wave glottal mutation function.  I'd like to have a more continuous input here, so harmonic content could be smoothly modulated with the pitch and/or volume operating points (to do vocal fry and Theremin coupling sounds).  Variable powers and roots are the bread and butter of LOG2 and EXP2, where you take the log, multiply, then exponentiate the result.  So I thought I'd look into "cheaper" rough-and-ready versions of these functions that use less real-time than the 32 bit float or integer forms.  

I already had a low quality version of EXP2_UINT, but it wasn't quite good to 16 bits (arbitrarily chosen point) with no error.  So I added a term to the polynomial (4 total) and now the error is +/-0.0005% for 18 cycles.  I then implemented a low quality version of LOG2 using a 5 term polynomial which yields similar error of +/-0.0008% and takes 23 cycles max.  Looking at the polynomials themselves, they aren't all that far away from a straight line going from 0 to 1 in both axes, so I also implemented VLQ (very low quality) versions of both of these functions where the polynomials are totally MIA, giving +6% error and 7 cycles for EXP2, and -0.3% error and 9 cycles max for LOG2.

So if you want an EXP2 float, UINT32, UINT16, or VLQ, you have to spend 35, 27, 18, or 7 cycles, respectively.

And if you want a LOG2 float, UINT32, UINT16, or VLQ, you have to spend 48, 41, 23, or 9 cycles, respectively.

It might make sense to implement two term polynomial versions as there is a huge jump in %error and cycles from VLQ to UINT16.  I suppose I'll wait and see if anything goes obviously awry when the VLQs are pressed into service.

[EDIT] Ha, couldn't help myself and had to do second order polynomial versions of EXP2 and LOG2, which consume 12 and 14 cycles, respectively.  So that price / performance hole is filled.

Also made low quality versions of square root, which is a huge honkering function.  The float version is 42 cycles, and the UINT32 version is 43 cycles.  I made a 16 bit accurate version that takes 25 cycles, and a non-polynomial version that consumes 14 cycles.  It's strange but the input interval [0.5:1) which gives [0.707:1) is linear to a pretty flat -1.5%, and all inputs are normalized to that interval.  So for rough-and-ready stuff you can use a normalized line segment and the error will never be worse than that.  There is no error at the ends where the accuracy is often much more important.
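The normalized-line-segment square root might be sketched like this in C (frexp/ldexp stand in for the bit-twiddled normalization; x > 0 assumed):

```c
#include <math.h>

/* Very low quality sqrt: normalize x to m in [0.5,1), approximate sqrt(m)
   with the line through the interval endpoints (exact at both ends, about
   -1.5% worst case near the middle), then restore half the exponent. */
static double sqrt_vlq(double x) {
    int e;
    double m = frexp(x, &e);               /* x = m * 2^e, m in [0.5,1) */
    double s = 0.4142136 + 0.5857864 * m;  /* line from sqrt(0.5) up to 1 */
    if (e & 1) { s *= 0.7071068; e++; }    /* odd exponent: fold in 1/sqrt(2) */
    return ldexp(s, e / 2);
}
```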

It might seem that I'm navel gazing, but having this kind of functionality ready to go can really keep the coding momentum going when you find you need it, and there it is just lying there in the library, waiting for you to use it.  It's gotten to the point where it's actually fun!  My bit-twiddling fu was born in FPGA logic design, but has grown much more muscular with assembly work.  The very lowest levels are where all the real speed increases and memory/fabric reductions are to be found.  It's really too bad more developers don't get the opportunity to start down here, standing on the shoulders of giants is sometimes kind of a drag.

Posted: 1/30/2018 11:42:02 PM

From: Northern NJ, USA

Joined: 2/17/2012

A very nice Master's Thesis on building a basic synth with an older DSP: Design of a Scalable Polyphony-MIDI Synthesizer for a Low Cost DSP.

The inexpensive low-aliasing sawtooth generation technique is something I need to try.

Posted: 1/31/2018 4:13:12 PM

From: Northern NJ, USA

Joined: 2/17/2012


Just did a quick web search, and I'm almost certainly missing something, but I don't see my binary to BCD algorithm anywhere, which is kind of weird.  Lots of hits for the subject, lots of complicated algorithms, literally everyone needs to do this at some point, often in hardware, why can't I find this extremely simple and blindingly fast method elsewhere?  For this reason I'm posting the algorithm here in HAL:

// UINT16_TO_BCD4  - (0:uint16, 7:rtn | 0:bcd) 
// return BCD of 16 bit unsigned value, [0:9999] range
// no input bounds error checking
// 16 cycles max/min 
@uint16_to_bcd4 { s1 := 0x68db9  // ** UINT16_TO_BCD4 SUB START **  (2^32)/(10^4)
                  P1 *= P0       // scale bin
                  s0 := s1 *u 10 // bcd := bin *u 10
                  P1 *= 10       // bin * 10
                  s1 *u= 10      // bin *u 10
                  P0 <<= 4       // bcd << 4
                  P0 |= P1       // comb bcd
                  P1 *= 10       // bin * 10
                  s1 *u= 10      // bin *u 10
                  P0 <<= 4       // bcd << 4
                  P0 |= P1       // comb bcd
                  P1 *= 10       // bin * 10
                  P1 *u= 10      // bin *u 10
                  P0 <<= 4       // bcd << 4
                  P0 |= P1       // comb bcd
                  pc := P7 }     // RETURN =>  ** SUB END **

In plain English:

1. Scale the input by (2^32)/(10^4), where the '32' is the binary modulo and '4' is the number of BCD digits you want.  Call this the residue.
2. Multiply the residue by 10.  The upper 32 bits are the new BCD digit, the lower 32 bits are the new residue.

The HAL code does step 2 four times (unrolling the loop) to get 4 BCD digits, and combines the BCD digits via left shifts and ORing.  That's it!  I suppose it's fairly counter-intuitive to do multiplication and get something like modulo division out of it.  A full 32 x 32 = 64 bit multiplier (or in the case of Hive: 33 x 33 = 64 to accommodate signed/unsigned) is an extremely powerful thing, and it usually pays to investigate specific ways to use it.
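For comparison outside HAL, the same algorithm in C might look like this (a direct translation, with the loop left rolled):

```c
#include <stdint.h>

/* Multiplicative 16 bit binary to 4 digit packed BCD, mirroring the HAL
   routine above: scale by 0x68DB9 (~(2^32)/(10^4), rounded up so truncation
   never loses a digit), then each multiply by 10 pushes the next decimal
   digit into the upper 32 bits and leaves the residue in the lower 32. */
static uint16_t uint16_to_bcd4(uint16_t bin) {
    uint32_t residue = (uint32_t)bin * 0x68DB9u;  /* scaled input, [0:9999] */
    uint32_t bcd = 0;
    for (int d = 0; d < 4; d++) {
        uint64_t p = (uint64_t)residue * 10u;
        bcd = (bcd << 4) | (uint32_t)(p >> 32);   /* new digit from upper half */
        residue = (uint32_t)p;                    /* remainder continues below */
    }
    return (uint16_t)bcd;
}
```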


Read a nice article on the programming culture of complexity this morning (link).  My brain is probably deficient, but the syntax they picked for pointers in C completely stymies me all the time.  The * and & can confusingly be stuck to either the type or the variable, and for some reason the roles seem reversed in my head.  Why @ wasn't used is beyond me, as it seems like the most natural symbol for addresses.

And the C++ implementation of objects seems entirely overblown.  I like objects, and can totally see a real crying need for making things that are an amalgam of private data store and functionality, even at the assembly level.  But I literally never use inheritance, and public / protected / private section designators don't include an external read-only mechanism, so I end up writing a ton of trivial functions just to read otherwise private data (and spending an inordinate amount of time naming these functions).  I'm sure Bjarne Stroustrup is way smarter than me, but IMO his C++ fetishises objects and gives short shrift to practicality.  

*.h file prototyping is more completely inane busy work pushed off on the programmer, as is the glaring lack of a packaging system in 2018.

Posted: 2/1/2018 1:23:40 PM

From: Germany

Joined: 8/30/2014

Yeah, there are, or so I claim, some inconsistencies in that C syntax. For one thing, you can declare a bunch of variables of the same base type, say int, in a list, and then decide individually which one gets to be a pointer, like: int a, *b, c;
Which is why some people always put the * to the variable name. I can't blame them. But I don't like it. There is no magical extra property for a variable saying "it's a pointer", it clearly belongs to the type: pointer to int.
I don't use such declarations and always put the * to the type, if not using a typedef for some things which makes it even more clear. Not sure what they were thinking. The '&' probably is strange, but I'm so used to it that it doesn't feel strange. Although I'm not *that* old (mid 30's with lower formal education than I should have at this point), I did start out on a C64 and still think "$FF" looks better than "0xFF", but I get why the rest of the world disagrees :D

Inheritance is probably sometimes useful and "natural", but people got a bit crazy with that and it comes with all sorts of problems and has kinda fallen out of favor; at least as a tendency, the wisdom of the year seems to be "prefer composition over inheritance", i.e. "has a" over "is a" relationships. That often does make sense to me. There have been texts written about inheritance saying, you should rid yourself of the boundaries of the natural seeming examples like "dog IsA mammal, FiatPanda IsA car*" and go far beyond and get all abstract with those things. It looks to me that sticking to using inheritance mostly in places where it does seem to naturally map on things may not have been so wrong after all, after a lot of people agree that designs have gotten really messy and insanely complex :-D
(* not everyone agrees)
If you want to use interfaces in C++, you kinda have to use "inheritance". I like interfaces. Sometimes.
Perhaps using virtual functions isn't the best thing in some inner loop in an embedded project. One just has to keep that in mind I guess.
I like templates. Also for embedded. I made some array types with bounds checking in debug build, which are drop-in replacements for "naked" arrays and do everything else like native arrays (non dynamic). That *did* help me see some bugs early, and I don't have to reimplement basics such as (partial) copying between arrays, with offsets, etc, correctly, 100 times over. (I don't use the clunky, malloc happy STL containers in MCU projects)
Some find operator overloading crazy. You can certainly abuse it dangerously. But after implementing a template vector & matrix (basic linear algebra) library, and being able to (almost) just write calculations like on paper, I wouldn't want to go back.

If you write lots of functions to read private data, like all those Java classes flooded with getters (and setters), it kinda feels like defeating the purpose to me, although that's what they teach you in intro Java courses. Perhaps the separation of concerns is not optimally worked out, if you feel the need to pick quasi-private fields from some entities often. It's a hunch, I won't pose as a software engineer or "architect", I'm not one, but I developed some software and have some ideas about some things ;)
I guess sometimes it's a quicker route to make something work more time-optimal (cycles), which easily trumps other things in MCU projects.

*.h file prototyping is more completely inane busy work pushed off on the programmer, as is the glaring lack of a packaging system in 2018.

Ha! Hear, hear. I have been saying things like that for a long time. That's why I can't be bothered to use C++ for any "desktop software" project (well, I can be "forced", under protest). I prefer C# (not the oh so similar Java which is pain and facepalm inducing to work with, they really done f'ed up early on in design) and if I some day feel like it, will look more at F# or other functional languages. May seem like a fad, but I do like the "sorta functional lite" aspects that C# has been given.
They keep bolting "modern" stuff to C++, making an already insanely complex language even more so, and to stay relevant, at some point, I'll have to make myself familiar with all of it, although for my private embedded stuff I only have been using some slim subset of it, when I felt bending C to emulate some OO-esque features was becoming silly. (and I really hate all the #define magick madness needed to get anything done in C. And why did they think broken constants are cool?).
So C++ gets ever heavier, but it's still ancient to the core. Text replacement and text pasting (e.g. #include) is quite a crude mechanism, which always seemed to me to exist not because it has virtue, but because back in the day tech was just too primitive / computers too weak, so this stuff (which to me looks like it should be compiler internals, not part of a language) got pushed onto the developer, like the damn computer is outsourcing grunt work to the user. Insolence! Patching stuff like circular #include dependencies and finding errors due to misspelt include guards (well I use #pragma once, which "all" compilers support), hahaha, good one...
In some future C++ standard, packages were supposed to be a thing, but I think it was pushed further away or kicked, I don't remember. I wonder how they would keep backwards compatibility, and also make C++ less of a mess in at least that regard.

Btw, interesting link, that low-end DSP article. I never worked with an actual DSP. Since I got to play with the ARM Cortex M4 with HW float (STM32F7 even implements double precision, IIRC) and some DSP-ish instructions, the idea of trying one has fallen further down the toilet-paper-long TODO list of mine.
What would you say are the benefits of a "real DSP" today, and where? (Although the interest is mostly hobby, price at moderate quantities is not totally uninteresting. Who knows, maybe I'll sell something some day, however unlikely ;))

Posted: 2/1/2018 3:41:36 PM

From: Northern NJ, USA

Joined: 2/17/2012

"I did start out on a C64 and still think "$FF" looks better than "0xFF", but I get why the rest of the world disagrees :D"  - tinkeringdude

The '0' in front makes a lot of sense, because it allows it to "pass" as a number.  My command line interface parser also uses it to identify numbers.  Maybe the default should have been hex, with some special designator for decimal?

"...and I really hate all the #define magick madness needed to get anything done in C."

"Text replacement and text pasting (e.g. #include) is quite a crude mechanism..."

Mechanisms that work well in assembly, but really should not be part of a higher level language.  *.h files seem to be there only to resolve future references, so why can't the compiler be arsed to look ahead?  It does for other things...  

I'll probably make you vomit, and it's really stupid, but I try to write my C++ code so there are no future references, so I don't need to make *.h files.  Then again I don't do big projects.  I'm also extremely project file averse, so I use a ton of #include in my C++ code.  I have no choice but to make a project in Quartus for my SystemVerilog, but it's really just a list of files (the order of which is critical, which is idiotic) and the hierarchy is in the code.  I like compiling a single file and having it pull everything else in that it needs.  IMO languages should just work without needing some kind of meta level of new rules and syntax to manage them.

"What would you say are the benefits of a "real DSP" today, and where?"

DSPs are usually implemented as expensive co-processors, but I haven't looked at them in ages because FPGAs can do a lot more processing and do it much faster (and the board processor can be a soft or hard core in the FPGA).  And as you note, there are many DSP functions in modern GPP (general purpose processor) ALUs.  Back when I was somewhat interested in DSPs they were inexplicably slow compared to GPPs, running at 200MHz and the like.  

Also at the time I could never figure out why they didn't make blindingly fast (~4GHz) bare bones 8, 16, and 32 bit processors.  Like overly complex high level languages, target processors are way, way, way too complicated for what they do.  And I'm not just talking about the bloated Intel instruction set.  Things like security rings, caching, pipeline stalls, TLBs, supervisory modes & registers, and speculative out-of-order execution are coo-coo bananas.  Have a bunch of threads and put one on security detail or something (and maybe think about defunding the NSA).

Posted: 2/2/2018 1:32:47 AM

From: Northern NJ, USA

Joined: 2/17/2012

Phase Modulated Glottal Source

Looking at all the glottal waves I've been generating, it seemed to me that it would be more straightforward to simply modulate the NCO accumulator number before feeding it to the sine function.  If the modulated number still traverses the full 32 bit value field then one has reasonable guarantees that the resulting shape will be somewhat smooth and the amplitude full scale.  Spreadsheeting it, a square and higher value power pull to zero would give the glottal shapes I've been experimenting with.  Coded it up a couple of days ago and, other than the constant amplitude, it didn't seem all that much better than the others.  Today I made the power more continuous via LOG2, 5.27 multiplication, and EXP2.  And I noticed that for powers between 0.5 and 1.0 (not the ones I designed it for in the first place: 1.0 and higher) the vocal simulation sounded quite a bit better, particularly on the fry end of things.  Instead of the appearance of a hollowed out bump, the wave looks exactly like a linear ramp with rounded transitions:

This generator does it, but I'm also not convinced anymore that a perfect sine wave needs to exist somewhere in the harmonic adjustment, as one can easily get this with a low-ish harmonic signal fed into a tracking bandpass or lowpass filter, though it's a bit more fiddly to do so.

I'm also finally getting a bit more of the hang of playing the thing.  My "Sound of Music" is quite a bit better than my first video (that's not saying a lot, I basically butchered it).  The progress seems kind of sudden after weeks of no obvious improvement (not that I practice much).

[EDIT] PM with a power of 0.25 as shown above sounds better on the fry end but also aliases more than the other approaches I've tried, and it's clear why that is if you look at the vertical rising edge and at the modulating waveform.  Hmm.

[EDIT2] I think I'm giving up on the PM approach.  Besides the aliasing issue, it has this nasty high-pitched buzz that I can't seem to ignore anymore.  I need to develop a variable decimal powers algorithm that doesn't break when asked to work with higher powers, and apply it to the HI-LO algorithm.  I can see why there are so many papers on the mathematics and subjective quality of various glottal excitation functions.
