CRC-32
I've been looking into error detection for software loads. The usual approach is to run a CRC over the data, particularly to check that the software load in Flash is valid before using it.
Not surprisingly, engineers, programmers, and mathematicians have each had their mutually inconsistent ways with things, so it can take a day or more to come to grips with what's going on. It's really more obfuscated than complicated, and the fact that hex dumps and C strings have a rather jumbled display order just adds to the fun. Surprisingly, the standard CRC-32 polynomial is sub-optimal: it doesn't even flag all odd numbers of bit errors, because its generator has an odd number of terms and so isn't divisible by x+1. Koopman performs exhaustive computational searches for better polynomials and apparently hasn't finished for CRC-32. I find it incredible that the world relies on CRC for so much, yet Koopman is doing this in his off time. If anything should get funding, it's this kind of stuff.
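To make the odd-error point concrete, here is a minimal sketch in C. It assumes the common reflected CRC-32 variant (polynomial 0x04C11DB7, stored bit-reversed as 0xEDB88320, init and final XOR of 0xFFFFFFFF). The names crc32_bitwise, generator_pattern, pattern_weight, and inject_generator_error are mine, purely for illustration. The idea is that the 33-bit generator itself, laid out in the LSB-first bit order, is an error pattern with 15 flipped bits that the CRC cannot see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u /* 0x04C11DB7 with its bits reversed */
#define PATTERN_LEN 5

/* Bit-at-a-time CRC-32: reflected, init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? CRC32_POLY_REFLECTED : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* The 33-bit generator in LSB-first bit order: the x^32 term sits in bit 7
 * of the first byte, followed by 0xEDB88320 stored little-endian. */
static const uint8_t generator_pattern[PATTERN_LEN] = {
    0x80, 0x20, 0x83, 0xB8, 0xED
};

/* Number of bits the pattern flips (an odd count for this generator). */
int pattern_weight(void)
{
    int weight = 0;
    for (int i = 0; i < PATTERN_LEN; i++)
        for (int bit = 0; bit < 8; bit++)
            weight += (generator_pattern[i] >> bit) & 1;
    return weight;
}

/* XOR the pattern into buf starting at byte offset. */
void inject_generator_error(uint8_t *buf, size_t len, size_t offset)
{
    assert(offset + PATTERN_LEN <= len);
    for (size_t i = 0; i < PATTERN_LEN; i++)
        buf[offset + i] ^= generator_pattern[i];
}
```

Corrupting a buffer with inject_generator_error and recomputing crc32_bitwise should give the same CRC as the clean buffer, even though an odd number of bits changed. A generator with x+1 as a factor would catch every odd-weight error, at the cost of other properties.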
The clearest explanation I could find was in Hacker's Delight, 2nd edition, though the simplest implementation is not shown in code form there. Warren's hardware view sidesteps all the endian nonsense and language ambiguity, though his diagram is of the left-shifting, non-flipped CRC and residue type. For me (granted, a HW engineer) it helped to initially approach CRC implementation as an LFSR-based serial data scrambler, rather than as a byte and table (or no table) arrangement. Parallel input such as bytes can then be pulled in later, but all the byte and/or word flipping can be confusing without an understanding of the underlying serial process, which has nothing to do with bytes, just bits and 32-bit values. The byte and table approach is just a bunch of precomputed XORing, and a hardware implementation of the table could easily be replaced with a sea of XOR gates, which conveniently factors down to something fairly manageable.
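The serial-first view can be sketched in C roughly as follows. This is not Warren's code, and it is the right-shifting (reflected) form rather than the left-shifting one in his diagram. It assumes the usual CRC-32 parameters (0xEDB88320, init and final XOR of 0xFFFFFFFF). The function names are made up for illustration. The point is that the table is nothing more than eight LFSR clocks precomputed for each possible byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u

/* One LFSR clock: feed in a single data bit, shift right, apply feedback. */
static uint32_t crc32_clock_bit(uint32_t state, unsigned bit)
{
    unsigned feedback = (state ^ bit) & 1u;
    state >>= 1;
    if (feedback)
        state ^= CRC32_POLY_REFLECTED;
    return state;
}

/* Serial view: a byte is just 8 bits presented LSB first. */
uint32_t crc32_serial(const uint8_t *data, size_t len)
{
    uint32_t state = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        for (unsigned bit = 0; bit < 8; bit++)
            state = crc32_clock_bit(state, (data[i] >> bit) & 1u);
    return state ^ 0xFFFFFFFFu;
}

static uint32_t crc32_table[256];
static int crc32_table_ready = 0;

/* Table view: precompute 8 clocks of the same LFSR for every byte value. */
void crc32_build_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t state = n;
        for (unsigned bit = 0; bit < 8; bit++)
            state = crc32_clock_bit(state, 0u);
        crc32_table[n] = state;
    }
    crc32_table_ready = 1;
}

/* Byte-at-a-time CRC using the precomputed XOR results. */
uint32_t crc32_tabled(const uint8_t *data, size_t len)
{
    assert(crc32_table_ready);
    uint32_t state = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        state = (state >> 8) ^ crc32_table[(state ^ data[i]) & 0xFFu];
    return state ^ 0xFFFFFFFFu;
}
```

Both functions should return the standard check value 0xCBF43926 for the ASCII string "123456789", which is a handy sanity test for any CRC-32 implementation of this variant.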
Excel spreadsheet: http://www.mediafire.com/download/yxfyu871wf4yb08/CRC32_2015-11-20.xls
I wrote a Hive subroutine today that does one round on 32-bit input data. Each pass through the loop takes 5 cycles, with one pass per bit. I'll probably use this with the SPI Flash device that will be holding the software load and presets:
ADDR   OC      SA  SB  IM      OP        Pseudo code            Comments
0x100  0xc11   s1  s1  .       LIT       s1 := -306674912       -- SUB : CRC32 - poly : 0xedb88320
0x101  0x8320  .   .   -31968  L
0x102  0xedb8  .   .   -4680   L
0x103  0xb1f2  s2  .   31      BYT       s2 := 31               -- loop idx : 31
0x104  0xaffa  P2  .   -1      ADD_8     P2 += -1               -- loop start, dec idx
0x105  0x1c00  s0  s0  .       SK2_O     (s0==odd) ? pc+=2
0x106  0x97f8  P0  .   -1      SHP_6U    P0 <<= -1 (u)
0x107  0x402   .   .   2       JMP_8     pc += 2
0x108  0x97f8  P0  .   -1      SHP_6U    P0 <<= -1 (u)
0x109  0x3718  P0  s1  .       XOR       P0 ^= s1
0x10a  0xff92  s2  .   -7      JMP_8NLZ  (s2!<0) ? pc += -7
0x10b  0x106   .   .   6       POP       P2 P1
0x10c  0x2df0  s0  P7  .       GTO       pc := P7
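For readers who don't speak Hive, here is a rough C model of what the listing appears to do: 32 passes of "shift right by one, XOR in 0xedb88320 if the bit shifted out was a 1". The caller shown here is hypothetical. It assumes that each 32-bit word is XORed into the running CRC before the round, that the CRC starts at 0xFFFFFFFF and is inverted at the end, and that bytes are packed little-endian within each word. None of that is in the listing itself. The names crc32_round32 and crc32_words are mine.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u /* the constant loaded into s1 */

/* Model of the subroutine: one round over a 32-bit value, one pass per bit. */
uint32_t crc32_round32(uint32_t p0)
{
    for (int idx = 0; idx < 32; idx++) {
        uint32_t odd = p0 & 1u;       /* test the LSB before shifting */
        p0 >>= 1;                     /* unsigned right shift by one  */
        if (odd)
            p0 ^= CRC32_POLY_REFLECTED;
    }
    return p0;
}

/* Hypothetical caller (an assumption, not from the listing): fold each
 * word into the running CRC, then run one round on the result. */
uint32_t crc32_words(const uint32_t *words, size_t count)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < count; i++)
        crc = crc32_round32(crc ^ words[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Under those assumptions the result should match a byte-wise CRC-32 over the same data, since the feedback in the first 8 passes depends only on the low byte, the next 8 only on the second byte, and so on. With big-endian packing the bytes would need swapping first.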