Implementation of Floating-Point Arithmetic Coding Using x86-64 AVX-256 Assembly Language
Abstract
Mike H.B. Gray
Bit manipulations, especially those executed on multiple strings in parallel, e.g., on Intel® processors equipped with Advanced Vector Extensions (AVX), can be a powerful way to speed up unoptimized high-level sequentially executed code. A case in point is made for floating-point arithmetic coding (FPAC), implemented herein as a non-adaptive, lossless data compression algorithm using x86 AVX-256 stand-alone assembly language under 64-bit MASM assembler in Visual Studio 2022. Apart from writing and reading bit strings to and from file, FPAC can become fully vectorized and be improved in performance (relative to unoptimized integer versions) by orders of magnitude by blocking short sequences of symbols and bypassing interval renormalization. For an alphabet size, up to 53—the limiting case made for 0.474MB Protein Data Bank entry 4HHB, referred to as oxygen transport file (OTF)—it can also strongly outperform a commercially available, unoptimized C++ Huffman encoder by over a factor of 10 and beat the decoder by roughly a factor of 2. Disadvantageous but necessary to this prescription of vectorizations is an additional compressed storage requirement of the length of the codeword (this binary integer is to encode a block of 5 symbols) in addition to the codeword itself; for size-5 blocks, this compromises the compression efficiency as follows: the average number of bits per symbol required to compress the input message is demonstrated to lie in the interval [H(S) + b, H(S) + 0.4 + b], where b = 1.0 for single-precision floating-point arithmetic coding (SPFPAC), b = 1.2 for a slower but more practical double-precision counterpart (DPFPAC), and H(S) is the Shannon entropy of symbol frequencies.