Let's Design and Build a (mostly) Digital Theremin!

Posted: 5/5/2018 5:05:39 PM

From: Northern NJ, USA

Joined: 2/17/2012

Rocket Surgery

Yesterday, Hacker News (link) pointed to Akin's Laws of Spacecraft Design (link), and I was particularly struck by #4 and #20:

4. Your best design efforts will inevitably wind up being useless in the final design. Learn to live with the disappointment.

I've experienced this quite a lot in this project; there's not much you can do about it, as many ultimately blind / not-applicable alleys must be followed to their logical conclusions.  Doing extra research slows the project down, but it isn't always disappointing or painful (reading research papers can be extraordinarily fun), and you never really know where your next implementation idea will come from.

20. A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately.

I've given this quite a bit of thought lately and really have no idea how to handle it.  I want to show (via video, sound clips, etc.) what the prototype can do, but I know I'm not the best player (there's a reason they hire Elton John et al to play keyboard demos at NAMM) and I'm often demonstrating half-baked engineering results in a somewhat rushed manner.  I fear I'm turning a lot of potential future customers off, but what do you do?  I suppose I'm more interested in finding and presenting solutions to the various engineering stumbling blocks associated with digital Theremins & vocal synthesis than I am in making a big splash with the next iWhatever.  Though I do understand why companies keep products under wraps until the "black eye" phase has passed.

[EDIT] I'll just leave #1 here:

1. Engineering is done with numbers. Analysis without numbers is only an opinion.

Posted: 5/8/2018 4:34:11 AM

From: Northern NJ, USA

Joined: 2/17/2012

Aliasing Is Really Weird

I remember reading about aliasing in Hal Chamberlin's book "Musical Applications of Microprocessors" and thinking he must have it wrong, or I must be reading it wrong, or something - the world can't be that crazy.  I figured "I'll just generate the wave, and if that gives me trouble I'll filter it, and if that gives me trouble the worst I'll have to do is generate it at 2x oversampling, filter it, and downsample."  The crazy harsh truth is you can't digitally generate anything but a sine wave and not have it alias all over the place without going to great lengths, because in a very real sense you are "sampling" a continuous wave in the sampled realm by the mere fact of generating it, even if at each sample you're doing everything perfectly.  There's no way to filter the aliasing away after you generate it, which is really counter-intuitive.  But there are mechanical ways (fractional filters, injecting Gibbs phenomenon type ringing, etc.) to ameliorate it, which is also really counter-intuitive.  Why do mechanical methods work but filtering methods don't?  
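To make the "sampling by generating" point concrete, here's a minimal Python sketch (the frequencies and lengths are my own picks, not from the post): a naive 5 kHz sawtooth at 48 kHz puts its 5th harmonic at 25 kHz, which folds back about Nyquist to 23 kHz - a frequency that isn't a harmonic of the fundamental at all.

```python
import math

def dft_mag(x, freq, fs):
    # amplitude of a single DFT bin at freq Hz (2/n normalization)
    n = len(x)
    re = sum(v * math.cos(2*math.pi*freq*i/fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2*math.pi*freq*i/fs) for i, v in enumerate(x))
    return 2.0*math.hypot(re, im)/n

fs = 48000
f0 = 5000.0                                              # hypothetical fundamental
saw = [2.0*((f0*i/fs) % 1.0) - 1.0 for i in range(fs)]   # 1 second of naive saw

h1    = dft_mag(saw, 5000.0, fs)    # fundamental
alias = dft_mag(saw, 23000.0, fs)   # 5th harmonic (25 kHz) folded to 23 kHz
```

The folded 5th harmonic shows up at 23 kHz at a significant fraction of the fundamental's level, and because aliases land at inharmonic in-band frequencies like this, no filter applied after generation can separate them from the intended spectrum.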

I don't know exactly, but wonder if it has something to do with the non-ideal behavior of the digital differentiator.  The digital integrator works as analog ones do, but the gain of the digital differentiator drops off near Nyquist, and then reverses afterward.  I've read people saying this, but I never really understood it until this morning.  When you have almost 1/2 a wave in the differentiator, the gain is dependent on the shape of the wave, which is a sine, which is non-linear.  And this, I believe, is why sine shows up as an error term in digital filter tuning.  DSP people must know these things, but they rarely just blurt them out.

On top of this, musically interesting waveforms tend to have a lot of harmonics, and they generally fall off slowly at 6dB/octave / 20dB/decade, so when you go for that >1kHz fundamental you get a pile of harmonics with significant energy hitting Nyquist (1/2 the sampling rate) and folding back down right into your lap.  And all of the solutions / ameliorations to this are fundamentally problematic in one or more ways.  They work, some better than others, requiring things like big tables in memory, odd calculations, strange thresholding/detection, switching in and out over various ranges, variable gain over a large range, leaky integrators, etc.  There's no "THAT'S IT!" solution.  Part of the strategy often seems to be picking the top note on the piano as the upper limit and calling it a day.

[EDIT] "There's no way to filter the aliasing away after you generate it" - this isn't true actually.  You can use a tracking comb filter, either FIR or IIR, to lower inter-harmonic aliasing, and a high pass filter can lower the more audible aliasing below the fundamental.  But comb filters take memory, and you need a fractional delay element in there, as well as a way of switching the delay without glitching.
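As a toy illustration of the tracking comb idea (the values are mine, and the delay is an integer here - a real tracking comb needs the fractional delay element mentioned above): an FIR comb H(z) = (1 + z^-D)/2 with D = fs/f0 peaks at the harmonics and nulls exactly halfway between them, which is where inter-harmonic aliasing lives.

```python
import math

fs, f0 = 48000, 1000          # hypothetical; fs/f0 is an integer so D has no fraction
D = fs // f0

def comb(x):
    # y[n] = (x[n] + x[n-D]) / 2  ->  H(z) = (1 + z^-D)/2
    return [(x[i] + (x[i-D] if i >= D else 0.0)) * 0.5 for i in range(len(x))]

def comb_gain(freq):
    # steady-state magnitude response |1 + e^(-jwD)| / 2
    w = 2.0*math.pi*freq/fs
    return abs(complex(1.0 + math.cos(w*D), -math.sin(w*D))) / 2.0
```

Here comb_gain(1000) and comb_gain(2000) come out to 1 (harmonics pass untouched) while comb_gain(1500) is a null; when f0 moves, D has to track it, hence the fractional delay and glitch-free delay switching headaches.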

Posted: 5/8/2018 9:46:04 PM

From: Northern NJ, USA

Joined: 2/17/2012

CIC Interpolation

I pretty much understand CIC decimation, which lowers the sampling rate.  The opposite of this is interpolation, which is used to "fill in the blanks" when increasing the sampling rate.  A CIC interpolator is a series of N differentiators, a zero-stuffing up-sampling "switch" which closes for one sample every R samples, followed by N integrators.

What was getting me was the switch and the first integrator following it. If you feed an integrator zeros it just sits there holding its value (a "zero order hold" in the lingo) which means you can pull the first integrator back through the switch and run it at the lower source rate and get rid of the switch. And now the last differentiator and following integrator should cancel!  So an Nth order CIC interpolator should only require N-1 differentiators and integrators, and no zero stuffing switch.  Am I losing my mind?  There's no mention of this in Hogenauer's paper, which seems like a huge oversight if true.

Indeed, it is true: https://www.dsprelated.com/showthread/comp.dsp/368488-1.php
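The identity is easy to check numerically (my own sketch, exact integer arithmetic, zero initial conditions): an Nth order CIC interpolator built the textbook way - N combs, zero stuffing by R, N integrators - produces sample-for-sample the same output as N-1 combs, a zero order hold upsampler, and N-1 integrators.

```python
def cic_interp_full(x, N, R):
    # textbook CIC interpolator: N combs, zero stuff by R, N integrators
    v = list(x)
    for _ in range(N):                       # comb (differentiator) stages, low rate
        prev, out = 0, []
        for s in v:
            out.append(s - prev); prev = s
        v = out
    hi = []
    for s in v:                              # zero-stuffing switch
        hi.append(s); hi.extend([0]*(R - 1))
    for _ in range(N):                       # integrator stages, high rate
        acc, out = 0, []
        for s in hi:
            acc += s; out.append(acc)
        hi = out
    return hi

def cic_interp_reduced(x, N, R):
    # last comb + switch + first integrator replaced by a zero order hold
    v = list(x)
    for _ in range(N - 1):
        prev, out = 0, []
        for s in v:
            out.append(s - prev); prev = s
        v = out
    hi = []
    for s in v:
        hi.extend([s]*R)                     # zero order hold upsample
    for _ in range(N - 1):
        acc, out = 0, []
        for s in hi:
            acc += s; out.append(acc)
        hi = out
    return hi
```

Running both on the same input gives identical outputs, confirming the saved comb/integrator pair and the eliminated switch.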

Many other hardware saving tricks like this in the fantastic paper "Reducing CIC Filter Complexity" by Ricardo A. Losada and Richard Lyons, IEEE SIGNAL PROCESSING MAGAZINE [124] JULY 2006.  Just more fundamental DSP stuff no one bothers to mention...

Posted: 5/13/2018 5:29:49 AM

From: Russia

Joined: 9/8/2016

A video on the topic:


Posted: 5/13/2018 1:49:28 PM

From: Northern NJ, USA

Joined: 2/17/2012

Shaken Harmonic Syndrome (Dither)

Thought I was onto something but it has issues that aren't easily surmountable.  The way I'm getting a spectrally pure square wave going to the antenna tanks is by applying one sample cycle's worth (amplitude) of white phase noise, or dither, to the phase accumulator (but we don't actually accumulate it because that would give us a Gaussian noise amplitude distribution rather than rectangular).  It was something of a mystery to me how this actually works but it's clear now that I'm investigating audio alias suppression.  To wit: harmonics at Nyquist (1/2 the sampling rate) "live" in exactly two samples (or cycles), and harmonics higher than this "live" in less than two samples.  Adding 1 cycle's worth of phase noise to the waveform causes everything "living" in two samples or less to average together.  However, as you might imagine, this raises the noise floor.  And higher fundamental tones have more harmonic energy at and above Nyquist, and so they have a higher noise floor when dithered.  Viewed another way, the added dither noise has to be scaled (multiplied) with the frequency in order to kill aliasing, and this increases the noise floor for higher fundamental tones.
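A minimal sketch of the dithered NCO readout as I understand it (parameters are hypothetical): one sample period's worth of rectangular white noise is added to the phase at readout only - it is not accumulated, which would turn the rectangular amplitude distribution into a random-walk Gaussian-ish one.

```python
import random

def dithered_saw(f0, fs, n, seed=1):
    rng = random.Random(seed)
    inc = f0/fs                                   # phase increment, cycles/sample
    phase = 0.0
    out = []
    for _ in range(n):
        d = (rng.random() - 0.5)*inc              # +/- half a sample period of phase noise
        out.append(2.0*((phase + d) % 1.0) - 1.0) # read out the dithered phase
        phase = (phase + inc) % 1.0               # accumulate the clean increment only
    return out
```

Because the dither amplitude scales with the increment, higher fundamentals get a proportionally higher noise floor - the trade-off described above.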

On the audio side of things, using white dither I can generate a ~250Hz sawtooth that sounds pretty good, but above this the dither noise starts becoming obvious, until at 8kHz the tone is drowning in noise.  One could spectrally shape, or filter, the dither to be outside of the area of the most sensitive human audibility but still below Nyquist in order to make it less obvious.  One could also use a tracking high pass filter to suppress the noise below the fundamental.  One could certainly generate things at an oversampled rate, and the dither would scale down with the oversample ratio (OSR).  8kHz / 250Hz = 32 OSR which is a lot of calculations to do!  One could certainly combine milder OSR and noise shaping, the OSR region would give plenty of dither room above audio but below Nyquist, but then we're doing more calcs at a lower rate - no free lunch, it's all a trade-off, with some worse than others.

While researching dither I ran across the rather mis-named "subtractive dither."  Here one injects dither at one point, quantizes, and then subtracts the same dither downstream.  Something one can easily do within a system, but not so easily between systems, as the dither signal would need to be replicated and synchronized.  To subtract the injected phase noise post NCO accumulator (quantization) one would need a linearized variable sub sample delay, and this element can be problematic.  It seems feedback (IIR forms) can't be used because the delay is changing quite dynamically with each sample, and higher orders based on spline interpolation are necessary to sufficiently flatten the delay variation with frequency (group delay).
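Subtractive dither is easy to sketch within a single system (the step size and input value are my own):

```python
import random

def quantize(v, step):
    # midtread quantizer
    return round(v/step)*step

rng = random.Random(2)
step = 1.0/16.0
x = 0.3                               # arbitrary constant input
errs = []
for _ in range(1000):
    d = (rng.random() - 0.5)*step     # 1 LSB rectangular (RPDF) dither
    y = quantize(x + d, step) - d     # subtract the SAME dither downstream
    errs.append(y - x)
```

The residual error is just the quantizer's own error on x + d, uniformly spread over one step and zero mean, with none of the injected dither left in the output - but only because the exact dither sequence is available for subtraction, which is the replication/synchronization problem noted above.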

Which brings us more or less to the poly-BLEP method of alias reduction.  Here the naive saw edge is fractionally delayed based on the phase accumulator error, and Gibbs phenomena ringing is also injected, both of these via a short polynomial based spline FIR filter.  It sounds more complicated than it boils down to, but there's a fair amount of engineering involved behind the scenes.
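The usual two-sample polyBLEP correction can be sketched as follows (my own generic version, not the prototype's code): the naive saw edge has a polynomial band-limited step residual subtracted over the two samples around the discontinuity, positioned by the fractional phase.

```python
def polyblep(t, dt):
    # 2-sample polynomial band-limited step residual;
    # t = phase in [0,1), dt = phase increment per sample
    if t < dt:                        # just after the edge
        t /= dt
        return t + t - t*t - 1.0
    if t > 1.0 - dt:                  # just before the edge
        t = (t - 1.0)/dt
        return t*t + t + t + 1.0
    return 0.0

def polyblep_saw(f0, fs, n):
    inc = f0/fs
    phase = 0.0
    out = []
    for _ in range(n):
        out.append(2.0*phase - 1.0 - polyblep(phase, inc))
        phase += inc
        if phase >= 1.0:
            phase -= 1.0
    return out
```

The two polynomial segments approximate the fractional edge delay plus the Gibbs-style ringing in one cheap FIR-ish operation, which is the "fair amount of engineering behind the scenes" boiled down.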

Posted: 5/17/2018 12:17:59 AM

From: Northern NJ, USA

Joined: 2/17/2012

It Was Necessary To Destroy The Precision In Order To Save It

I read an article yesterday regarding the writing of early space software, where the processors were asthmatic and had no floating point hardware.  Seems they spent 30% of the time just managing precision, which is much better than my ratio!

For the last couple of days I've been trying to implement a toy NCO (numerically controlled oscillator) that employs a fractional delay to align the sawtooth edge, which happens at accumulator rollover, and the value in the accumulator at rollover gives the fractional delay if you divide it by the phase increment (normalize it).  So we need the reciprocal of the phase increment, which calls for the dreaded integer division, where precision basically goes to die.

Premature optimization, but I pared the Newton's method integer quotient and remainder subroutine down to just give the reciprocal, which is 22 cycles max.  The precision issue raises its head when you feed it larger integers, which give very small fractional results.  Give it 32 bits and you get 0 bits, give it 0 bits and you get 32 bits, so the happy medium seems to be 16 bits, but it really depends on the range of the input data.

Given a 32 bit accumulator, to generate 32Hz at a 48kHz sampling rate we need a phase increment of (32 / 48k) * 2^32 = 2863311, which is 21.5 bits of info, taking the reciprocal of this gives 10.5 bits of info, and we have to take the worst (10.5 bits) here for the precision (garbage in/out).  To generate 8kHz the phase increment is (8k / 48k) * 2^32 = 733007751, which is 29.5 bits of info, which means the reciprocal only has 2.5 bits of info!  Shifting the phase increment to the right 10 bits obviously throws 10 bits of input info away, but increases the minimum precision of the reciprocal.  Over the 32Hz to 8kHz range this shift gives a precision of 15.5 bits over the middle range and 12 bits at the extremes, which should be sufficient for this application.
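The garbage in/garbage out precision bookkeeping above can be sketched directly (the fixed point format here is my assumption, not the actual subroutine):

```python
from math import log2

def recip_info_bits(inc, shift):
    # shift the 32-bit phase increment right, then take a 32-bit fixed
    # point reciprocal; precision is limited by the weaker of the
    # shifted input and the reciprocal result
    s = inc >> shift
    r = (1 << 32) // s               # ~1/inc in fixed point
    return min(log2(s), log2(r))

inc_32hz = 2863311                   # (32 / 48k) * 2^32, from the text
inc_8khz = 733007751                 # (8k / 48k) * 2^32, from the text
```

Without the shift, the top of the range is down around 2.5 bits; with a 10-bit pre-shift both extremes land near 12 bits and the middle of the range comes out around 15.5 bits, matching the numbers above.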

[EDIT] So I used the above to reduce aliasing and it does work.  I can get a clean sounding sawtooth up to ~1.4kHz.  Need to try it with 8x oversampling. One nice thing about that is the reciprocal is a constant over the oversampled period.  Not sure where this is going as I really like the phase modulated sine wave approach, and I don't think this method of alias reduction adapts well to that.  I'd like a generic process that is continuous, just feed it anything and have it kill aliasing without looking for edges, but I'm not aware of any process that can do that.

[EDIT2] Here's the sawtooth NCO:

The frequency (phase increment) comes in and gets scaled to C9 max.  The upper path shifts it right 10 places to trade 1/x precision, then 1/x is called (unsigned).  The middle / lower path accumulates the phase increment, producing old and new values, which are compared (signed) to detect the sawtooth edge.  If so, 1/2 (2^31) is added to the new to make it unsigned, whereupon it is shifted right 10 places to match 1/x, then the two are multiplied together (regular, not extended multiplication, which is sign agnostic).  The resulting unsigned value is used to crossfade between the old and new NCO values, and the result is the output sawtooth waveform (signed).  When there isn't an edge the old and new get averaged together, which gives us a filter zero at Nyquist.
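Here's how I read that datapath as a Python sketch (the 1/x output format and the crossfade direction are my assumptions, and the frequency scaling stage is omitted):

```python
M32 = (1 << 32) - 1

def to_signed(u):
    # reinterpret a 32-bit unsigned value as signed
    return u - (1 << 32) if u >> 31 else u

def saw_nco(inc, n, shift=10):
    # inc: 32-bit phase increment (assumed >= 2^shift); n: sample count
    recip = (1 << 32) // (inc >> shift)      # unsigned 1/x of the shifted increment
    acc = 0
    out = []
    for _ in range(n):
        old = acc
        acc = (acc + inc) & M32
        s_old, s_new = to_signed(old), to_signed(acc)
        if s_new < s_old:                    # signed compare flags the sawtooth edge
            over = ((acc + (1 << 31)) & M32) >> shift   # unsigned overshoot, matched to 1/x
            frac = min(over*recip, M32)      # 32-bit fractional edge position
            out.append((s_old*((1 << 32) - frac) + s_new*frac) >> 32)  # crossfade
        else:
            out.append((s_old + s_new) >> 1) # average: filter zero at Nyquist
    return out
```

Feeding it the 32Hz increment from the earlier post produces a full-scale signed sawtooth with a blended sample at each edge instead of a hard jump.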

The NCO accumulation value can be seen as signed or unsigned, but you have to be consistent or it won't work (ask me how I know this).  As with PLLs, I get easily confused when it comes to "error" vs. "correction" signals.

Lately I'm coding up NCOs and commenting all but one out, and recording the audio of the variations in one audio file, comparing the sound, waveforms, and spectra in Audition, an arrangement which is working out well.  Otherwise it's hard to keep it all straight.

Posted: 5/19/2018 8:28:33 PM

From: Northern NJ, USA

Joined: 2/17/2012

Casio Patent

I suppose most people looking into this kind of stuff have seen the old Casio waveform synthesis patent: (link).

It's really simple and fairly ingenious.  They generate the usual NCO ramp and use it as cosine phase (via a ROM lookup).  But they modulo multiply the phase ramp so as to get more than one wave per period.  To kill discontinuities at the start and end (if the waves per period are not an integer multiple of the base period) they make the cosine unsigned and starting at zero, then they multiply it by the logically negated base ramp (to make the ramp fall rather than rise), or by an unsigned triangle formed from the base phase ramp.  With this they can get what I think of as rather "molar" looking waves - humps with bites taken out of them, often associated with Theremin and formant stimulus.
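My reading of the patent's scheme as a quick Python sketch (the names and the 1.9 multiplier are mine):

```python
import math

def casio_wave(phase, mult, am='triangle'):
    # phase: base NCO ramp in [0,1); mult: waves per base period
    m = (phase*mult) % 1.0                    # modulo-multiplied phase
    c = 0.5 - 0.5*math.cos(2.0*math.pi*m)     # unsigned cosine, starting at zero
    if am == 'ramp':
        a = 1.0 - phase                       # logically negated (falling) ramp
    else:
        a = 1.0 - abs(2.0*phase - 1.0)        # triangle from the base ramp
    return c*a                                # AM kills the end discontinuity

wave = [casio_wave(i/256.0, 1.9) for i in range(256)]
```

With triangular AM both endpoints of the period are forced to zero (and with ramp AM the unsigned zero-start cosine handles the beginning), so a non-integer multiplier like 1.9 produces no discontinuity, just the "molar" humps.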

[EDIT] I simulated some of the Casio waveform synthesis in an Excel spreadsheet, both triangular and raised cosine AM:

Top: The waveform (heavy black line) and NCO phase (thin red line) are shown for a fundamental of 500Hz with 1.9 cycles per cycle, triangular AM.  Note the nice "molar" shape.  Unlike in the patent, I'm using a signed sine wave as the base wave, rather than unsigned cosine, which mostly gets around the DC offset issue.

Bottom: Resulting FFT (2048 points with triangular window).  Note the suppression of even harmonics starting at the 4th.

I played around with phase offset but it didn't seem to make much difference to the FFT.  Raised cosine AM makes the harmonic amplitudes more uniform - perhaps more boring?  Now to try this on the prototype and see what it sounds like...

Posted: 5/21/2018 2:53:46 PM

From: Northern NJ, USA

Joined: 2/17/2012

Casio Patent - continued

OK, coded up the phase & amplitude modulation (PM & AM) techniques in that patent and tried them out on the prototype.  All three sound like a tracking filter, with an increasing number of cycles per cycle (the cycle multiplier) sounding like stronger (higher Q) tracking.

To my ears, raised cosine AM is the least musically interesting.  It's really smooth sounding, with higher harmonics diving to zero.  You can get the first two harmonics and nothing else if you want that; or the first three; or the second, third, and fourth and nothing else.  Settings in-between give the rest of the harmonics, but their amplitudes are kind of low.

Reversed ramp AM could be fairly useful if you didn't otherwise have a tracking filter, as it moves the emphasis (harmonic peak) from the fundamental to the higher harmonics.

Triangular AM gives the most variety of sounds.  Setting the phase multiplier to less than one gives all harmonics falling off at a fairly quick but even rate; it could probably be brightened with a filter and used to stimulate vocal formants.  Setting it really low gives odd harmonics.  Setting it to 1.5 gives rather human sounding vocals without formant filtering.


Rethinking C9 Max

The top octave on the prototype ends on C9, which is 8.372 kHz.  This is one octave higher than a piano goes and it's a tough octave from a couple of synthesis angles:  It's obviously really hard to eliminate aliasing that close to Nyquist, and it limits the lower end of the Q range for the second order filter I'm using.  So I'm thinking of lopping it off and going with C8 max.
