Design of Scalable Architecture for Real-Time Parallel Computation of Long to Ultra-Long Real-Data DFTs
Abstract
Keith Jones
With the current trend in large scale, big data applications, there is an increasing need for the design and efficient implementation of long to ultra-long Fourier-based transform algorithms, such as with fast Fourier transforms (FFTs) where the transform length varies from long (of order one million) up to ultra-long (of order one billion). To address such problems, the paper shows how the applicability of the scalable, memory-based architecture of the regularized fast Hartley transform (FHT) or RFHT – which has proved an attractive alternative to the FFT for the computation of the discrete Fourier transform (DFT) for when the data is real-valued, as is the case with many real-world applications – may be straightforwardly extended to enable the efficient parallel computation of long to ultra long transforms to be achieved and maintained in a continuous real-time fashion. In order to implement such algorithms, however, when using the memory-based architecture of the RFHT, a timing constraint (and hence transform size limitation) arising from the combined effects of the update period and the I/O rate needs to be overcome and the formidable data and coefficient (or twiddle factor) memory requirement minimized. With this in mind and with a processing element (PE) defined as comprising one complete RFHT module, it is seen how the design of a scalable dual-PE architecture may be derived as a simple extension of the single - PE case – thus possessing a number of attractive properties, as held by the RFHT, but not by pipelined real-data FFT implementations – this being achieved in such a way that the transform size limitation is overcome and, when combined with the use of memory-efficient multi level look-up table (LUT) techniques for the coefficient generation and storage, offers the ‘potential’ for achieving and maintaining the real - time parallel computation of real-data transforms where the transform length may now range up to one billion. Two hypothetical implementations are briefly discussed which illustrate how the dual-PE solutions for the computation of both the one and four million-point real-data transforms may each be mapped onto a single commercially-available field programmable gate array (FPGA) device using only fast on - chip random access memory (RAM) for the data and coefficient storage.