"Are you writing your own interpreter or compiler? (I assume this is all being done in Forth or something like that)
Also - the "most intuitively implemented" - Are you looking at this from a perspective of others using it, or do you need to sort this out so that you can use it?" - FredM
Verilog lets you specify the contents of arrays such as block RAMs. It also lets you declare parameters that represent values, so the opcode name can be the actual value. I'm hoping to keep it simple enough that I can program it in Verilog this way (I've done this before with previous processor designs, though not extensively).
I need to sort this out so that doing the above isn't overly difficult.
The opcode is 16 bits: 2 bits specify which one of the four stacks is the "primary" P operand for the operation; 2 bits specify which is the "secondary" S operand; one bit each controls the primary and secondary operand stack pop; and the remaining 10 bits specify the operation.
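To make the field split concrete, here is a rough Python sketch of packing and unpacking such an opcode. The field widths come from the description above, but the bit positions (P select in the top two bits, then S select, then the two pop bits, then the operation) and the names `encode` and `decode` are my own assumptions for illustration, not the actual layout.

```python
# Hypothetical bit positions: the post gives field widths, not where they sit.
P_SHIFT = 14      # bits 15:14 -> primary stack select (assumed)
S_SHIFT = 12      # bits 13:12 -> secondary stack select (assumed)
P_POP_BIT = 11    # bit 11     -> pop primary (assumed)
S_POP_BIT = 10    # bit 10     -> pop secondary (assumed)
OP_MASK = 0x3FF   # bits 9:0   -> 10-bit operation code


def encode(op, p, s, pop_p=False, pop_s=False):
    """Pack the fields into one 16-bit opcode word."""
    assert 0 <= op <= OP_MASK, "operation must fit in 10 bits"
    assert 0 <= p < 4 and 0 <= s < 4, "stack select must be 0..3"
    return (
        (p << P_SHIFT)
        | (s << S_SHIFT)
        | (int(pop_p) << P_POP_BIT)
        | (int(pop_s) << S_POP_BIT)
        | op
    )


def decode(word):
    """Unpack a 16-bit opcode word back into its fields."""
    return {
        "p": (word >> P_SHIFT) & 0x3,
        "s": (word >> S_SHIFT) & 0x3,
        "pop_p": bool((word >> P_POP_BIT) & 1),
        "pop_s": bool((word >> S_POP_BIT) & 1),
        "op": word & OP_MASK,
    }
```

For example, under this assumed layout `encode(op=5, p=2, s=1, pop_p=True)` gives `0x9805`, and `decode` recovers the same fields.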
By default, writes (the result of an operation) cause an automatic push, but reads (the input(s) to that operation) don't cause an automatic pop. If you want to consume inputs P and/or S during the operation you set their associated pop bits.
For a 2 operand operation, say subtraction, it currently works like this:
P <= P-S
So that P is the only thing ever written to.
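A small behavioural model may help show how the pop bits interact with the automatic push. This is only a sketch: the four stacks are plain Python lists, the function name `sub` is made up, and I am assuming that both operands are read first, then any requested pops happen, then the result is pushed onto P. The case where P and S select the same stack is not modelled.

```python
def sub(stacks, p, s, pop_p=False, pop_s=False):
    """Model of P <= P - S on a list of four stacks (top of stack = end of list)."""
    assert p != s, "same-stack case is not modelled in this sketch"
    a = stacks[p][-1]          # read the P operand (no pop by default)
    b = stacks[s][-1]          # read the S operand (no pop by default)
    if pop_p:
        stacks[p].pop()        # consume P only if its pop bit is set
    if pop_s:
        stacks[s].pop()        # consume S only if its pop bit is set
    stacks[p].append(a - b)    # the write always pushes onto P


# Without pops, both inputs stay where they were and the result lands on top of P.
stacks = [[7], [3], [], []]
sub(stacks, p=0, s=1)
print(stacks)  # [[7, 4], [3], [], []]

# With both pop bits set, the inputs are consumed and only the result remains.
stacks = [[7], [3], [], []]
sub(stacks, p=0, s=1, pop_p=True, pop_s=True)
print(stacks)  # [[4], [], [], []]
```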
For single operand operations, I have the following choices:
1. P <= P
2. P <= S
3. S <= P
4. S <= S
The first is conceptually (IMO) the most consistent with the two-operand operation and has the simplest ALU input and output decode, but it doesn't allow for a move during the operation. The second is still fairly consistent and allows for a move, but ALU input decoding is more complex. The third allows for a move, but is inconsistent in the sense that we are writing to S, and ALU output decode is more complex. The fourth is nonsense.
If I go with 1 above I lose the optional move, which seems rash. If I go with 3 I could make the two operand case S <= P-S, but this is inconsistent with reads, writes, jumps, etc. where P is the data being written/read/tested and S is the address. If I stick with 2 I feel dirty ;-)
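To compare the first three single-operand choices side by side, here is a rough Python model in the same spirit. The function `unary`, its `option` argument, and the single `pop_src` flag are hypothetical names of mine; the stacks are plain lists and the operation is passed in as an ordinary function. It only illustrates where the data is read from and written to, not the ALU decode cost.

```python
def unary(stacks, p, s, f, option, pop_src=False):
    """Apply single-operand function f under one of the first three choices.

    option 1: P <= f(P)   (no move)
    option 2: P <= f(S)   (move from S to P)
    option 3: S <= f(P)   (move from P to S)
    """
    # (source stack, destination stack) for each choice
    src, dst = {1: (p, p), 2: (s, p), 3: (p, s)}[option]
    value = stacks[src][-1]       # read the operand without popping
    if pop_src:
        stacks[src].pop()         # consume it only if requested
    stacks[dst].append(f(value))  # the write always pushes


def negate(x):
    return -x


for option in (1, 2, 3):
    stacks = [[5], [9], [], []]
    unary(stacks, p=0, s=1, f=negate, option=option)
    print(option, stacks)
# 1 [[5, -5], [9], [], []]
# 2 [[5, -9], [9], [], []]
# 3 [[5], [9, -5], [], []]
```

Choices 2 and 3 both get the result onto a different stack from the operand; they differ only in whether P or S ends up being the destination.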
This is the problem with DIY processors: a general lack of silver bullets, and a dearth of inherent beauty / symmetry - which one usually uses in order to gauge when to quit tinkering and move on already. IMO, most designers stop too early (or punt) and you end up hanging bags on the side (caches, coprocessors, etc.) that only compound the complexity.
With his "Everything should be made as simple as possible, but no simpler" adage, it's too bad Einstein wasn't a processor designer.