Let's Design and Build a (mostly) Digital Theremin!

Posted: 6/16/2012 10:09:14 PM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

"I'm redesigning the dumbest instrument ever made..."

LOL! - At times I have had that thought!  ;-)

"Saving the code was to a cassette tape for those young enough to remember. If you needed more RAM you would piggy back memory chips, solder one on top of another. I really thought my 2 MHz chip was fast! " - RST

Yeah Chris - those were the days! There are still a few of us left who think back on those days with some nostalgia.

As for : "To avoid latency in theremin response, I can imagine a revisit to machine language?" :

IMO, the latency problems one can get with a digital theremin implementation are not primarily due to processor operations or bloated code; they are down to physics. There are things which can only be sped up by having a faster clock: resolution when counting the interval between pulses, for example, is entirely down to the speed of the clock incrementing / decrementing the interval timer. A 48MHz clock will give about 1/4 the resolution of a 200MHz clock, and a 48MHz clock is only about 1/2 what one needs to start to get to acceptable resolution. Back in the old days, 8MHz was fast; these days we do have fast enough hardware.
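To put rough numbers on the clock argument, here is a minimal sketch. The 300 kHz oscillator frequency is purely an assumed figure for illustration, and the function names are made up for this example; only the 8, 48 and 200 MHz clock figures come from the post.

```python
def tick_seconds(clock_hz):
    """Smallest time step an interval counter can resolve: one clock period."""
    return 1.0 / clock_hz


def counts_per_period(clock_hz, signal_hz):
    """Whole clock ticks counted during one period of the measured signal."""
    return clock_hz // signal_hz


# Assumption: a 300 kHz oscillator is being timed (not a figure from the thread).
ASSUMED_SIGNAL_HZ = 300_000

for clock_hz in (8_000_000, 48_000_000, 200_000_000):
    tick_ns = tick_seconds(clock_hz) * 1e9
    counts = counts_per_period(clock_hz, ASSUMED_SIGNAL_HZ)
    print(f"{clock_hz // 1_000_000:4d} MHz clock: tick = {tick_ns:7.2f} ns, "
          f"{counts} counts per period")
```

The count per period scales directly with the clock, so the 48MHz / 200MHz ratio of 0.24 is where the "about 1/4 the resolution" figure comes from. No amount of code tuning changes that ratio.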

Where Assembler is, IMO, vital, is for handling interrupts - One wants to get in and out of the interrupt extremely quickly, and take great care about prioritizing the interrupts.

"I never learned how to easily pass a variable between Assembly to C or Basic other than a push/pop to the stack.."

I found that the above becomes simple (particularly with C) when one understands how the compiler works.. The best compilers will output a listing which shows the generated assembly code - when one has this, you can easily see what the compiler is doing - and more importantly - see the stupid things it is doing! - One can then edit / modify the compiler generated .asm to speed it up, and get the best of all worlds.

"but apparently you don't get points for being a smarty-pants .." GordonC

LOL ! - Been there, and it doesn't just apply to college assignments - I was given a mind-numbingly boring job (my first programming job, in fact) - The company had about 10 products which all used a 6303 OTP MCU, and this part was discontinued - They opted for the H8 family of MCUs to replace it - my job was to translate all the asm code to move the firmware to the new MCUs..

I had just acquired a home pc (Amstrad 1512) and was teaching myself C - I suggested writing a program to automate the conversion, but was forbidden - my superior really didn't like me and couldn't get rid of me - he thought 6 months of translating asm would cause me to resign!

I wrote the conversion routine (took me 2 weeks at home), and fed it with the code - When I presented translated code for all 10 projects on the deadline date for the first project, I expected promotion.. Did I get it? Nah!

Fred.

 

Posted: 6/17/2012 9:34:27 AM
GordonC

From: Croxley Green, Hertfordshire, UK

Joined: 10/5/2005

4 Stacks

OK, now I see. Interesting thought. What would be the Forth way of doing it? Hmm.

Well, it's not unusual to hijack the return stack as a local variable within a colon definition. Step 1, push ToS to Top of Return Stack at start of definition. ( >R ) Step 2, access ToRS ( R@ ) when required. Step 3, dispose of ToRS by pushing ToRS to ToS and dropping ( R> DROP ) at end of definition (or just replacing the last occurrence of R@ with R>). R! could be useful too, equivalent to R> DROP >R, to replace the current value of the local variable with a new value.

So, yes, why not have a few local variable stacks, A B C D, with operators >A A@ A> A! >B B@ B> B! etc. There's scope there for compiler optimisations, but that would be the basic idea. 
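For anyone who doesn't read Forth, here is a rough Python model of the idea. It is only a sketch: the class and method names are invented for illustration, and a real Forth would do all this with the words >R R@ R> (and the hypothetical >A A@ A> A! family) rather than method calls.

```python
class Stacks:
    """Toy model: one data stack plus named auxiliary stacks (R, A, B, C, D)."""

    def __init__(self, names=("R", "A", "B", "C", "D")):
        self.data = []
        self.aux = {name: [] for name in names}

    def to(self, name):
        """>X : move top of data stack to top of stack X."""
        self.aux[name].append(self.data.pop())

    def fetch(self, name):
        """X@ : copy top of stack X onto the data stack."""
        self.data.append(self.aux[name][-1])

    def take(self, name):
        """X> : move top of stack X onto the data stack."""
        self.data.append(self.aux[name].pop())

    def store(self, name):
        """X! : replace top of stack X with ToS (same effect as X> DROP >X)."""
        self.aux[name][-1] = self.data.pop()

    def add(self):
        b, a = self.data.pop(), self.data.pop()
        self.data.append(a + b)

    def mul(self):
        b, a = self.data.pop(), self.data.pop()
        self.data.append(a * b)


def square_plus_self(s):
    """( x -- x*x+x ) using R as a local variable:  >R R@ R@ * R> +"""
    s.to("R")      # step 1: stash x on the return stack
    s.fetch("R")   # step 2: read it back as often as needed
    s.fetch("R")
    s.mul()
    s.take("R")    # step 3: last use takes it off, leaving R clean
    s.add()


s = Stacks()
s.data.append(5)
square_plus_self(s)
print(s.data)  # [30]
```

Note that the last access uses the "take" form instead of a fetch followed by a drop, which is the R@ to R> substitution described above.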

It's certainly one way of using spare space on a processor chip, as onboard RAM. Devising fast memory management circuits to make optimal use of the onboard RAM for stacks could be fun. :-)

As a side note, Sun SPARCs and PPC based Macs have a blindingly fast Forth embedded deep inside them, called Open Firmware. Really fast despite being token threaded, because it's so small that it fits completely in the onboard processor RAM and rarely needs to access the much slower main memory. Allegedly therefore faster than raw assembler, because there's no practical way that you could hand-craft equivalent assembly code that compact.

Also parenthetically, another way to make use of the billions of transistors in a modern chip would be to fill it with hundreds, if not thousands, of tiny stack processors. One of my main influences in the Forth community was a chap who worked in safety critical systems - areas where your computer program has to work correctly all the time, or people will die! (Fly by wire aircraft control being one example. If the computer crashes, so does the plane.) One of his mantras was "One Process, One Processor." No interrupt based multitasking for him. Far too risky.

How cool would that be for processing an audio stream, just like having a network of effects pedals, each with its own processor, all running concurrently at top speed! 

 

Posted: 6/17/2012 3:14:53 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"...another way to make use of the billions of transistors in a modern chip would be to fill it with hundreds, if not thousands, of tiny stack processors." -GordonC

I suppose you've seen this?  And this?  An article on the latter, where they explain how a couple of extra registers make it much more efficient to program.  The term "savage minimalism" (second page) is a good one!  For the life of me I don't have a clue as to how they are doing this asynchronously. 

Posted: 6/18/2012 10:00:43 AM
GordonC

From: Croxley Green, Hertfordshire, UK

Joined: 10/5/2005

I'm a bit behind when it comes to recent developments. Haven't really been following Forth since the last millennium. :-)

Posted: 6/19/2012 12:31:33 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"I'm a bit behind when it comes to recent developments. Haven't really been following Forth since the last millennium. :-)" -GordonC

Lying in bed this morning I remembered an article I read in the late 70's about a Forth variant named IPS.  It ran on an RCA COSMAC 1802 aboard an amateur satellite.  It made a huge impression on me at the time.

http://www.amsat.org/amsat/projects/ips/

http://www.amsat-bda.org/IPS%20-%20The%20Book.pdf

Posted: 6/20/2012 9:31:16 AM
GordonC

From: Croxley Green, Hertfordshire, UK

Joined: 10/5/2005

Ha! RCA, "space control" and ham radio. So thereminy!

And, just like the theremin, good Forths take some finding - they're protected by a thick wall of idiosyncratic home-brews to scare off the half-hearted. 

Oh, my. I'm going to have to stop thinking about Forth. An earlier, more obsessive version of me is stirring and it's going to end up with my buying a pro quality Forth for my Mac. (Coincidentally, the two good ones - by Forth Inc. in the US and MPE Ltd. in the UK - are both the same price as an etherwave standard - about $400.) 

OK, one last link. Here's an online book by someone I truly respect that brings a lot of what I've been saying right up to date. 

http://www.mpeforth.com/arena/ProgramForth.pdf

(Scoot down to page 191 - Chapter 21 Adopting and Managing Forth for a non-tech discussion of when Forth is the right choice and why.)

Posted: 6/20/2012 1:23:57 PM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

"An earlier, more obsessive version of me is stirring .." - GordonC

LOL! - Not sure about "more obsessive" - but otherwise I fully identify with your feeling!

The 1802 - Now that brings back some memories! One of the first "real" computer boards I owned was the COSMAC.

Its ability to clock right down to DC made it great for weird audio gadgets where one fed AF (or PLL multiplied AF) in as the clock...  LOL! ;-)

Fred.

Posted: 6/27/2012 10:18:22 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Just coming up for air.  Been down in the processor mines for so long I think I'm getting Potter's rot.

I'm really digging the 4 stack, 2 operand, 16 bit op-code, 16/32 bit data approach.  It's quite freeing compared to my past 2 stack, 0 operand, 5 bit op-code, 16 bit data attempts.  With no pipelining (1 MIPS / 1 MHz) I'm getting ~50 MHz / ~50% utilization in the target old & tiny & slow Xilinx Spartan 3 (XC3S200-4FT256).  But FF utilization is 1%, which seems kind of wasteful.  Traditional pipelining has all kinds of hazards and craziness associated with it, making verification really difficult, perhaps more difficult than designing the damn thing in the first place.

Multi-threaded is a possibility, with as many threads as there are pipeline stages - sort of a numerical Gatling gun.  If the stack pointers are constrained then a thread getting nailed by a stack fault wouldn't necessarily take down the others (though a rogue thread could still cause damage by overwriting data or program space).  All latencies are well hidden, and though this makes individual threads run slower the amalgam will most likely run faster and therefore theoretically process more data per MHz.

Moms, don't let your children grow up to be processor designers.

Posted: 7/10/2012 11:20:09 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Today I did my first build of the 4 threaded, 4 stage pipelined processor (the Gatling gun mentioned above) and I'm getting about double the clock frequency (~80 - 90 MHz) and the same utilization (~50% of the fabric logic) with much more balanced FF utilization (~50% of the FF's associated with the logic).  So I'm getting almost double the throughput in the same footprint, which is pretty neat.  Each of the four threads has its own interrupt and set of four stacks (32 bits wide, 64 deep) but shares the same program and data memory with the other threads - at reset / clear they start out at different addresses.  So if a thread stack faults (empty pop / full push) but hasn't mucked with memory it shouldn't take out the other threads, and can be independently cleared.  Pretty much like having four separate 20 MIPS processors that share a common memory.  This is actually kind of neat, because I'll be able to assign the threads to different tasks, and they can communicate via memory locations if need be.
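A toy software model of the round-robin idea, for anyone trying to picture it. This is a sketch under heavy assumptions: the instruction names, the one-stack-per-thread simplification and the example program are all invented here and are not the actual opcode set or HDL. It only shows the interleaving (one thread issues per clock) and that a stack fault stops just the offending thread.

```python
class Thread:
    """One hardware thread: its own program counter and (here, a single) stack."""

    def __init__(self, start_pc, depth=64):
        self.pc = start_pc
        self.stack = []
        self.depth = depth
        self.faulted = False


def step(thread, program):
    """Execute one instruction of the shared program for one thread."""
    op, arg = program[thread.pc]
    if op == "jmp":
        thread.pc = arg
        return
    if op == "push":
        if len(thread.stack) >= thread.depth:   # full push -> fault
            thread.faulted = True
            return
        thread.stack.append(arg)
    elif op == "pop":
        if not thread.stack:                    # empty pop -> fault
            thread.faulted = True
            return
        thread.stack.pop()
    elif op == "add":
        if len(thread.stack) < 2:
            thread.faulted = True
            return
        b, a = thread.stack.pop(), thread.stack.pop()
        thread.stack.append(a + b)
    thread.pc += 1


def run(program, start_pcs, cycles):
    """Round-robin: clock cycle n belongs to thread n mod (number of threads)."""
    threads = [Thread(pc) for pc in start_pcs]
    for cycle in range(cycles):
        t = threads[cycle % len(threads)]
        if not t.faulted:
            step(t, program)
    return threads


# Shared program memory; threads start at different addresses.
PROGRAM = [
    ("push", 1), ("push", 2), ("add", None), ("jmp", 3),   # 0..3: compute 3, then idle
    ("pop", None), ("jmp", 5),                             # 4..5: pops an empty stack
]

threads = run(PROGRAM, start_pcs=[0, 0, 4, 0], cycles=16)
for i, t in enumerate(threads):
    print(i, t.stack, "FAULT" if t.faulted else "ok")

# Each thread gets one issue slot in four, so per-thread rate = clock / threads.
print(80 / 4, "MIPS per thread at an 80 MHz clock")
```

With four threads sharing an ~80 MHz clock, each thread sees one slot in four, which is where the "four separate 20 MIPS processors" figure comes from.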

So I'm verifying now which should take a while for such a complex structure.  But I'm a bit stymied by how a two operand instruction set for a four stack machine would be most intuitively implemented.  The ALU does most of its heavy lifting on one main input (single operand operations, tests against zero, single shifts, etc.) and uses the other input only for two operand operations (add, sub, mult, and, or, xor, barrel shift distance, etc.).  I'm currently swapping ALU inputs based on opcode function, so that only the stack pointed to by the primary operand gets the result (push data)  but I'm not convinced this is the most direct and easily grasped architecture.  The last thing I want is something that's awkward to program at the lowest level, or to have a eureka moment several months into writing programs for it.

Posted: 7/11/2012 5:24:20 AM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

Hey Dewster -

This sounds like a mammoth undertaking! - Whatever - getting nearly double the throughput (about the only thing you said which I fully understand - LOL) certainly sounds worth doing!

" But I'm a bit stymied by how a two operand instruction set for a four stack machine would be most intuitively implemented." - Sorry, can't help you here! ;-) ... Are you writing your own interpreter or compiler? (I assume this is all being done in Forth or something like that)

Also - the "most intuitively implemented" - Are you looking at this from a perspective of others using it, or do you need to sort this out so that you can use it?

 >> Ps - Am I understanding correctly that you have actually implemented a "CPU" (as in, ALU, registers etc) using the CPLD - as in, you have effectively designed your own processor? If so, that's really impressive!

As for me - well, I just got a PSoC 5 development kit (I am a Cypress PSoC consultant - so get kit occasionally which I must play with in order to retain my "CYPros" status) - I hadn't really explored the new PSoCs until now (been using the old PSoC 1s - saw the 3s and 5s as a bit expensive) but I am really impressed with the PSoC 5 - now that they've (almost) sorted out their development environment.

For me, it's the analogue functions (op-amps, filters, 4Q multipliers, switched capacitor stuff etc) which are neatest - can configure a full analogue subsystem with a few Rs and Cs connected to the pins - everything required to implement a quite complex (multiple waveshapes, register switching etc) analogue theremin is there.

Then there is a complete digital CPLD and an ARM processor.. One configures the device using schematic and/or Verilog entry, and can create re-usable "components" (digital, analogue, mixed subsystems complete with APIs etc) which one can then just wire together to build your system-on-chip.

Best of all (for me) is that with this small board, my laptop, and my small 'scope, I can actually develop without needing access to my lab or a soldering iron! - I cannot do the 'front-end' stuff, but can easily configure a couple of pins to produce dummy VFO and Ref signals, so I can play with everything else.

I think this will keep me busy for a few weeks ;-)

Fred.
