32/16/8 Float
I'm going through my various floating point math subroutines and optimizing them for the unpacked float type under consideration. The float consists of a 32-bit unsigned normalized (MSb=1) magnitude with the binary point to the left of the MSb, a 16-bit signed non-offset exponent (power of 2), and an 8-bit sign (1 or -1), all stored in separate processor registers / memory slots. Forcing these subroutines to produce normalized results, plus a very well defined non-normalized zero (MAG=0, EXP=-0x8FFF, SGN=1), means a lot of the up-front processing can be removed. I'm hoping that allowing the exponent to "roam free" in the 32-bit space (in calculations between memory storage and retrieval) will reduce the cycles needed to implement the math I need to do, and I have a specific "lim_f" subroutine that reins it back in to 16 bits when desired (7 cycles max). Floating point multiplication now takes 8 cycles max, and floating point addition takes 16 cycles max. I wrote functions that convert floats to and from signed ints, and they take 9 and 8 cycles max, respectively. Onward and upward!
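To make the format above concrete, here is a minimal C sketch of the unpacked layout, a multiply, and the signed int conversions. It is only an illustration, not the actual subroutines: the names `ufloat`, `uf_zero`, `uf_from_i32`, `uf_mul` and `uf_to_i32` are made up for this example, the multiply truncates rather than rounds, and the saturation on overflow in `uf_to_i32` is an assumption. The zero exponent is copied verbatim from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked float: value = sgn * (mag / 2^32) * 2^exp. */
typedef struct {
    uint32_t mag;  /* normalized magnitude, MSb = 1, so mag/2^32 is in [0.5, 1) */
    int32_t  exp;  /* power of 2; nominally 16-bit, free to roam in 32 bits here */
    int8_t   sgn;  /* +1 or -1 */
} ufloat;

/* Zero exponent copied verbatim from the post. */
#define UF_ZERO_EXP (-0x8FFF)

/* The one well-defined non-normalized value. */
static ufloat uf_zero(void)
{
    ufloat z = { 0u, UF_ZERO_EXP, 1 };
    return z;
}

/* Signed int to float: take the magnitude, then shift left until MSb = 1. */
static ufloat uf_from_i32(int32_t x)
{
    if (x == 0)
        return uf_zero();
    ufloat r;
    r.sgn = (x < 0) ? -1 : 1;
    r.mag = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;  /* safe for INT32_MIN */
    r.exp = 32;  /* an unshifted integer is mag/2^32 * 2^32 */
    while ((r.mag & 0x80000000u) == 0u) {
        r.mag <<= 1;
        r.exp -= 1;
    }
    return r;
}

/* Multiply. Because both inputs are normalized, the 64-bit product lies in
   [2^62, 2^64), so at most one left shift renormalizes it. Low bits are
   truncated (assumption: no rounding). The exponent is not limited here. */
static ufloat uf_mul(ufloat a, ufloat b)
{
    if (a.mag == 0u || b.mag == 0u)
        return uf_zero();
    assert((a.mag & b.mag & 0x80000000u) != 0u);  /* inputs must be normalized */
    uint64_t p = (uint64_t)a.mag * b.mag;
    int32_t  e = a.exp + b.exp;
    if ((p >> 63) == 0u) {
        p <<= 1;
        e -= 1;
    }
    ufloat r = { (uint32_t)(p >> 32), e, (int8_t)(a.sgn * b.sgn) };
    return r;
}

/* Float to signed int, truncating toward zero.
   Assumption: values too large for int32_t saturate. */
static int32_t uf_to_i32(ufloat a)
{
    if (a.mag == 0u || a.exp <= 0)
        return 0;  /* zero, or magnitude below 1 */
    if (a.exp >= 32)
        return (a.sgn < 0) ? INT32_MIN : INT32_MAX;
    uint32_t v = a.mag >> (32 - a.exp);  /* shift count is 1..31 */
    return (a.sgn < 0) ? -(int32_t)v : (int32_t)v;
}
```

In this sketch, `uf_from_i32(3)` gives MAG=0xC0000000, EXP=2 (0.75 * 2^2), and multiplying it by `uf_from_i32(-5)` gives MAG=0xF0000000, EXP=4, SGN=-1, which `uf_to_i32` turns back into -15. The point of the normalized inputs shows up in `uf_mul`: there is no leading-zero search, just one conditional shift.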