I found signed saturation and unsigned limiting to be extremely valuable, to the point where it needed to be done in processor HW for speed and economy. In the end I used the extra BRAM bits they give you for ECC and such to hold flags that could then be decoded by the final mux to provide min/max. This is from my multiply / shift / rotate unit:
It's a really interesting idea to load the unused 4 bits of BRAM with something useful.
I was thinking about extending the instruction and data length by 2 or 4 ECC bits. But for memory, this works only if the memory size is 2-4 BRAMs (depending on platform).
But since the register file BRAM is small anyway, the ECC bits are available.
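For anyone following along, here is a minimal Python sketch of what signed saturation and unsigned limiting do to a result, plus the "flags decoded by a final mux" idea in its simplest form. It is only a model under my own assumptions: the 32-bit width and the names sat_signed, lim_unsigned and mul_sat_signed are made up for illustration, and this is not the actual multiply / shift / rotate unit from the quoted post.

```python
def sat_signed(value, bits=32):
    """Clamp an integer to the signed two's-complement range of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def lim_unsigned(value, bits=32):
    """Clamp an integer to the unsigned range 0 .. 2**bits - 1."""
    hi = (1 << bits) - 1
    return max(0, min(hi, value))


def mul_sat_signed(a, b, bits=32):
    """Multiply, then let two flags steer a final min/max/result mux.

    The `over` and `under` flags play the role of the spare flag bits
    (an assumption about how such a design could be modeled).
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    wide = a * b              # full-width product, no truncation yet
    over = wide > hi          # flag: result too large
    under = wide < lo         # flag: result too small
    if over:                  # final mux: pick max, min, or the product
        return hi
    if under:
        return lo
    return wide
```

In hardware the comparison against lo/hi would of course be done on the upper product bits rather than with a full-width compare, but the selection logic at the end is the same three-way choice.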
Do you support byte-precise addressing and unaligned data in Hive?
BTW, I don't have stack support in my softcore architecture. The return address is saved to any register, and a jump to the value in any register may work as a return (or even a conditional return).
I'm avoiding stack instructions because a POP instruction cannot follow the rule "one instruction per cycle, only one register may be written per cycle". POP requires both loading a value from memory into a register and saving the incremented/decremented stack pointer value.
Instead of incrementing/decrementing the stack pointer on each pop/push, we can update it once per procedure call to reserve a stack frame, and then access procedure parameters and local variables by addressing relative to the stack frame pointer register.
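The frame approach can be sketched with a toy model in Python. Everything here is hypothetical (the ToyMachine class, the 64-word memory, the downward-growing stack); the point is only that each modeled instruction writes at most one register: enter/leave write the stack pointer, a load writes its destination, and a store writes no register at all.

```python
class ToyMachine:
    """Toy model: word-addressed memory and a single stack pointer register."""

    def __init__(self, mem_words=64):
        self.mem = [0] * mem_words
        self.sp = mem_words      # stack grows downward from the top of memory
        self.sp_writes = 0       # counts how often the stack pointer is written

    def enter(self, frame_words):
        """Reserve a whole frame with one stack pointer update."""
        self.sp -= frame_words
        self.sp_writes += 1

    def leave(self, frame_words):
        """Release the frame with one stack pointer update."""
        self.sp += frame_words
        self.sp_writes += 1

    def store_local(self, offset, value):
        """Store to sp + offset; no register is written."""
        self.mem[self.sp + offset] = value

    def load_local(self, offset):
        """Load from sp + offset; only the destination register is written."""
        return self.mem[self.sp + offset]


m = ToyMachine()
m.enter(3)                 # one sp write for three local slots
m.store_local(0, 11)
m.store_local(2, 33)
value = m.load_local(2)
m.leave(3)                 # one sp write to drop the frame
```

With push/pop the same three locals would need a stack pointer update per access; here the pointer is written twice in total.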
When the relation between procedure calls (the call tree) is known, and if we have a lot of registers, we may assign non-overlapping registers as link registers for different procedures.
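One possible way to do such an assignment, sketched in Python: give each procedure a link register based on its deepest position in the call graph, so that a callee never shares a register with any procedure that may still be waiting on the call chain above it. This is my own simplification, not a finished allocator: the function name assign_link_registers, the dict format and the register numbering are made up, and it assumes there is no recursion (with a cycle in the call graph the loop would never finish).

```python
def assign_link_registers(calls, root, first_reg=1):
    """Map each called procedure to a link register index.

    calls: dict mapping a procedure name to the list of procedures it calls.
    root:  entry procedure; it is never called, so it gets no link register.
    Assumes the call graph is acyclic (no direct or indirect recursion).
    """
    depth = {root: 0}
    work = [root]
    while work:
        caller = work.pop()
        for callee in calls.get(caller, ()):
            d = depth[caller] + 1
            # Keep the longest path, so a callee is always deeper than its callers.
            if depth.get(callee, -1) < d:
                depth[callee] = d
                work.append(callee)
    return {p: first_reg + d - 1 for p, d in depth.items() if p != root}


call_graph = {"main": ["f", "g"], "f": ["h"], "g": ["h"], "h": []}
link_regs = assign_link_registers(call_graph, "main")
```

Here f and g end up sharing one register, which is safe because they are never active at the same time, while h gets its own. Giving every procedure a fully distinct register is the simpler variant if registers are plentiful.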
It took me a long time to come around to the fact that byte addressing is super valuable, and that variable length instructions - which are enabled by byte addressing - are also super valuable. It's true that there are potential sync issues: you have to go back to the beginning and travel to the execution point to really know what the opcode is, but it's more than offset by the advantages IMO.
Hmm. Doesn't it require additional cycles if you support non-word-aligned reads/writes and variable length instructions? It may take more than one cycle to read an instruction.
If prefetch is used, more than one cycle may be required on conditional jumps.
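To put a number on the concern, here is a small Python sketch that counts how many memory words an instruction touches. The assumptions are mine: a 4-byte-wide memory, one word read per cycle, and no prefetch buffer, so the count is a worst-case cycle estimate rather than a statement about any particular core. The name fetch_words is hypothetical.

```python
def fetch_words(byte_addr, length_bytes, word_bytes=4):
    """Number of memory words spanned by an instruction.

    byte_addr:    byte address where the instruction starts
    length_bytes: instruction length in bytes (must be >= 1)
    word_bytes:   width of one memory word in bytes (assumed 4)
    """
    start_offset = byte_addr % word_bytes
    # Ceiling division of (offset within first word + length) by the word size.
    return (start_offset + length_bytes + word_bytes - 1) // word_bytes


aligned = fetch_words(0, 4)     # 4-byte instruction on a word boundary
straddle = fetch_words(2, 4)    # same instruction starting mid-word
short = fetch_words(3, 1)       # 1-byte instruction in the last byte of a word
```

A 4-byte instruction starting at byte offset 2 spans two words, so without some kind of prefetch buffer it needs two reads, while short instructions often fit into a word that has already been fetched.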
Meanwhile, I'm trying to find a suitable FPGA board / module to use in a digital theremin.
I found a very interesting device from the German manufacturer Trenz Electronic: TE0725.
It's a small, easy to solder Artix-7 based board.
Power supply is 3.3V.
Status is Full Production, but it's not in stock. "Possible to order, delivery time on request."
Dual 2x25 easy to solder 100mil headers. Each of them supports 42 single-ended signals or 21 LVDS pairs.
Each header is connected to its own HR I/O bank (bank 34, bank 35), and can have its own I/O voltage (3.3V from onboard regulator or other voltage from external supply).
E.g. one of the headers may be powered from 2.5V (requires an external regulator), providing full LVDS support, and 2.5V outputs can be connected to inputs of 3.3V devices.
On-board 32MB flash, 8MB HyperRAM.
JTAG header and serial interface for programming / debugging.
The TE0790 JTAG programmer board may be used for USB JTAG+Serial.
There are several configurations - with different Xilinx Artix-7 chips:
XC7A15T-1CSG324C (10K LUT6, 20K FFs, 45 DSPs, 25 BRAM, 5 PLL) - EUR 69 - useless configuration - same price as the A35T, but fewer resources and a slower speed grade (-1 instead of -2)
XC7A35T-2CSG324C (20K LUT6, 40K FFs, 90 DSPs, 50 BRAM, 5 PLL) - EUR 69 - looks optimal
XC7A100T-2CSG324C (60K LUT6, 120K FFs, 240 DSPs, 135 BRAM, 6 PLL) - EUR 134 - 3 times more resources for 2 times the price
LUT6 can often replace two smaller LUTs, with separate FF on each half.
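A quick back-of-the-envelope check of the price comparison, as a Python sketch. The LUT counts and prices are the ones listed above; the short board keys and the luts_per_eur helper are just my own labels.

```python
# (LUT6 count, price in EUR) taken from the TE0725 configuration list above
boards = {
    "XC7A15T-1": (10_000, 69),
    "XC7A35T-2": (20_000, 69),
    "XC7A100T-2": (60_000, 134),
}


def luts_per_eur(luts, price_eur):
    """LUTs obtained per euro spent."""
    return luts / price_eur


ratios = {name: luts_per_eur(*spec) for name, spec in boards.items()}
best = max(ratios, key=ratios.get)   # configuration with the most LUTs per euro
```

Per euro the A100T comes out ahead, while the A35T gives twice the A15T for the same money and stays the cheapest way to get the -2 speed grade.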
Very nice, but...
Mouser reports expected manufacturing time as Summer 2023.
Only Chinese boards may save us.
Sipeed Tang Nano 4K with 8MB of embedded SDRAM and embedded ARM hardcore, 4K LUTs. EUR 18
Sipeed Tang Nano 9K with 8MB of embedded SDRAM, 8K LUTs EUR 20
Sipeed Tang Primer 20K, with 20K LUTs and big on-board SDRAM EUR 28
QMTECH ZYJZGW Xilinx Zynq7000 Zynq XC7Z010 SoC FPGA Starter Kit Development Board on AliExpress for EUR 66
I believe it's not the same board as mentioned by Joel as having decoupling issues - link from Joel's list is outdated, and as I remember, it was different layout.
What about 10 times more LUTs, 300K?
QMTECH Xilinx FPGA Kintex7 Kintex-7 XC7K325T DDR3 Core Board from AliExpress EUR 133
30 times more resources than the D-Lev currently has.
Unfortunately, it's not supported by free version of Vivado (up to Kintex XC7K160T only)
OK, here is a 200K LUT board:
QMTECH Xilinx FPGA Artix7 Artix-7 XC7A200T DDR3 Core Board on AliExpress EUR 133
This one is supported - it is the biggest Artix chip supported for free.
Any crazy idea should fit.
These boards may disappear from market at any time.
Is there something suitable and non-Chinese in stock from known brands?
Z-Turn Board
MYS-7Z020-C-S mouser: 11 in stock EUR 175.5
A good but expensive Zynq 7010 board, with 50mil pin sockets underneath.
(or is it Chinese, too?)
Cmod A7 Artix-7 Module: In stock everywhere. Farnell price EUR 87
DIP form factor with 44 digital I/O pins and 2 analog input pins, a powerful 35K LUT FPGA, but only a 512K external SRAM chip.