## Encyclopedia > Fast Fourier Transform

Article Content

# Fast Fourier Transform

A Fast Fourier Transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. It is of great importance to a wide variety of applications, from digital signal processing to solving partial differential equations to algorithms for quickly multiplying large integers. This article describes the algorithms; see discrete Fourier transform for properties and applications of the transform.

Let x0, ...., xn-1 be complex numbers. The DFT is defined by the formula

$f_j = \sum_{k=0}^{n-1} x_k e^{-\frac{2\pi i}{n} jk } \qquad j = 0,\dots,n-1.$

Evaluating these sums directly would take O(n2) arithmetical operations (see Big O notation). An FFT is an algorithm to compute the same result in only O(n log n) operations.

Since the inverse DFT is the same as the DFT, but with the sign of the exponent flipped and a 1/n factor, any FFT algorithm can easily be adapted for it as well.

The most common FFT is the Cooley-Tukey algorithm. In its most basic form, this method first computes the Fourier transforms of the even-indexed numbers x0, x2,...,xn-2, and of the odd-indexed numbers x1, x3,....,xn-1, and then combines those two results to produce the Fourier transform of the whole sequence. This idea can then be performed recursively to reduce the overall runtime to O(n log n). This simplified form assumes that n is a power of two; since the number of sample points n can usually be chosen freely by the application, this is often not an important restriction.

We write n' = n/2 and denote the discrete Fourier transform of the even indexed numbers x'0 = x0, x'1 = x2,....,x'n'-1 = xn-2 by f0',...,f 'n'-1, and the transform of the odd indexed numbers x''0 = x1,x''1 = x3,...,x''n'-1 = xn-1 by f0'', ..., f ''n'-1. Then it follows:

$\begin{matrix} f_j & = & \sum_{k=0}^{\frac{n}{2}-1} x_{2k} e^{-\frac{2\pi i}{n} j(2k)}  + \sum_{k=0}^{\frac{n}{2}-1} x_{2k+1} e^{-\frac{2\pi i}{n} j(2k+1)} \\ \\  & = & \sum_{k=0}^{n'-1} x'_{k} e^{-\frac{2\pi i}{n'} jk}  + e^{-\frac{2\pi i}{n}j} \sum_{k=0}^{n'-1} x_k e^{-\frac{2\pi i}{n'} jk} \\ \\  & = & \left\{ \begin{matrix}  f'_j + e^{-\frac{2\pi i}{n}j} f_j & \mbox{if } j<n' \\ \\   f'_{j-n'} - e^{-\frac{2\pi i}{n}(j-n')} f_{j-n'} & \mbox{if } j \geq n' \end{matrix} \right.  \end{matrix}$

A form of this trick, including its recursive application, was already known around 1805 to Carl Friedrich Gauss, who used it to interpolate the trajectories of the asteroids Pallas and Juno, but was not widely recognized (being published only posthumously and in Latin); Gauss did not analyze the asymptotic complexity, however. Various limited forms were also rediscovered several times throughout the 19th and early 20th centuries. FFTs became popular after J. W. Cooley of IBM and J. W. Tukey of Princeton published a paper in 1965 reinventing the algorithm and describing how to perform it conveniently on a computer (including how to arrange for the output to be produced in the natural ordering).

This process is an example of the general technique of divide and conquer algorithms; in many traditional implementations, however, the explicit recursion is avoided, and instead one traverses the computational tree in breadth-first fashion.

The above re-expression of a size-n DFT as two size-n/2 DFTs is sometimes called the Danielson-Lanczos lemma, since the identity was noted by those two authors in 1942 (influenced by Runge's 1903 work). They applied their lemma in a "backwards" recursive fashion, repeatedly doubling the DFT size until the transform spectrum converged (although they apparently didn't realize the logarithmic asymptotic complexity they had achieved). The Danielson-Lanczos work predated widespread availability of computing machines and required hand calculation; they reported a computation time of 140 minutes for a size-64 DFT operating on real inputs (see below) to 3-5 significant digits. Cooley and Tukey's 1965 paper reported a running time of 0.02 minutes for a size-2048 complex DFT on an IBM 7094 (probably in 36-bit single precision, ~8 digits). Rescaling the time by n log n, this corresponds roughly to a speedup factor of around 800,000. The more modern FFT library FFTW, on a 2GHz Pentium-IV in 64-bit double precision (~16 digits), can compute a size-64 real-input DFT in 1μs and a size-2048 complex DFT in 100μs, speedups of about 8,000,000,000 and 10,000 over Danielson & Lanczos and Cooley & Tukey, respectively, not even including the considerable improvements in accuracy.

(140 minutes for size 64 may sound like a long time, but it corresponds to an average of at most 16 seconds per floating-point operation, around 20% of which are multiplications...this is a fairly impressive rate for a human being to sustain for almost two and a half hours, especially when you consider the bookkeeping overhead.)

#### Variations on the Cooley-Tukey algorithm

More generally, Cooley-Tukey algorithms recursively re-express a DFT of a composite size n = n1n2 as n1 DFTs of size n2, followed by multiplication by complex roots of unity called twiddle factors, followed by n2 DFTs of size n1. Typically, either n1 or n2 is a small factor, called the radix. If n1 is the radix, it is called a decimation in time (DIT) algorithm, whereas if n2 is the radix, it is decimation in frequency (DIF, also called the Sande-Tukey algorithm). The version presented above is a radix-2 DIT algorithm; in the final expression, the phase multiplying the odd transform is the twiddle factor, and the +/- combination (butterfly) of the even and odd transforms is a size-2 DFT. (The radix's small DFT is sometimes known as a butterfly, so-called because of the shape of the data-flow diagram for the radix-2 case.) Gauss used a radix-3 DIT (or radix-4 DIF) step in a 12-point DFT example.

There are many other variations on the Cooley-Tukey algorithm. Mixed-radix implementations handle composite sizes with a variety of (typically small) factors in addition to two, usually (but not always) employing the O(n2) algorithm for the prime base cases of the recursion. Split-radix merges radices 2 and 4, exploiting the fact that the first transform of radix 2 requires no twiddle factor, in order to achieve a minimal operation count for power-of-two sizes. (On present-day computers, performance is determined more by cache and CPU pipeline considerations than by strict operation counts; well optimized FFT implementations often employ larger radices and/or hard-coded base-case transforms of significant size.) Another way of looking at the Cooley-Tukey algorithm is that it re-expresses a size n one-dimensional DFT as an n1 by n2 two-dimensional DFT (plus twiddles), where the output matrix is transposed. The net result of all of these transpositions, for a radix-2 algorithm, corresponds to a bit reversal of the input (DIF) or output (DIT) indices. If, instead of using a small radix, one employs a radix of roughly √n and explicit input/output matrix transpositions, it is called a four-step (also six-step) algorithm, initially proposed for cache/locality optimization (Gentleman and Sande, 1966).

The general Cooley-Tukey factorization rewrites the indices j and k as j = n2 j1 + j2 and k = n1 k2 + k1, respectively, where the indices ja and ka run from 0..na-1 (for a of 1 or 2). That is, it re-indexes the input (k) and output (j) as n1 by n2 two-dimensional arrays in column-major and row-major order, respectively. When this reindexing is substituted into the DFT formula for jk, the n2j1n1k2 cross term vanishes (its exponential is unity), and the remaining terms give

$f_{n_2 j_1 + j_2} =  \sum_{k_1=0}^{n_1-1} \left[ e^{-\frac{2\pi i}{n} j_2 k_1 } \right] \left( \sum_{k_2=0}^{n_2-1} x_{n_1 k_2 + k_1} e^{-\frac{2\pi i}{n_2} j_2 k_2 } \right) e^{-\frac{2\pi i}{n_1} j_1 k_1 } $

where the inner sum is a DFT of size n2, the outer sum is a DFT of size n1, and the [...] bracketed term is the twiddle factor.

The 1965 Cooley-Tukey paper noted that one can employ an arbitrary radix r (as well as mixed radices), but failed to realize that the radix butterfly is itself a DFT that can use FFT algorithms. Hence, they reckoned the complexity to be O(r2 n/r logrn), and erroneously concluded that the optimal radix was 3 (the closest integer to e). (Gauss also derived the algorithm for arbitrary radices, and gave explicit examples of both radix-3 and radix-6 steps.)

In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry

$f_{n-j} = f_j^*,$
and efficient FFT algorithms have been designed for this situation. One approach consists of taking the ordinary Cooley-Tukey algorithm and removing the redundant parts of the computation, saving roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real data), followed by O(n) post-processing operations.

It was once believed that real-input DFTs could be more efficiently computed by means of the Discrete Hartley transform (DHT), but this was subsequently disproved: a specialized real-input DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs.

There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of ~2 in time/space and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with O(n) pre/post processing.

All of the FFT algorithms discussed so far compute the DFT exactly (in exact arithmetic, i.e. neglecting floating-point errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999) achieves lower communication requirements for parallel computing with the help of a fast-multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996) takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). Only the Edelman algorithm works equally well for sparse and non-sparse data, however, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data.

Even the "exact" FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley-Tukey, have excellent numerical properties. The upper bound on the relative error for the Cooley-Tukey algorithm is O(ε log n), compared to O(ε n3/2) for the naive DFT formula (Gentleman and Sande, 1966), where ε is the machine floating-point relative error. In fact, the average errors are much better than these upper bounds, being only O(&epsilon √log n) for Cooley-Tukey and O(ε √n) for the naive DFT (Schatzman, 1996). These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use unstable trigonometric recurrence formulas. Some FFTs other than Cooley-Tukey, such as the Rader-Brenner algorithm, are intrinsically less stable.

#### References

• James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19, 297-301 (1965).
• Carl Friedrich Gauss, "Nachlass: Theoria interpolationis methodo nova tractata," Werke band 3, 265-327 (Königliche Gesellschaft der Wissenschaften, Göttingen, 1866). See also M. T. Heideman, D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform," IEEE ASSP Magazine 1 (4), 14-21 (1984).
• G. C. Danielson and C. Lanczos, "Some improvements in practical Fourier analysis and their application to X-ray scattering from liquids," J. Franklin Inst. 233, 365-380 and 435-452 (1942).
• P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Processing 19, 259-299 (1990).
• W. M. Gentleman and G. Sande, "Fast Fourier transforms—for fun and profit," Proc. AFIPS 29, 563-578 (1966).
• H. Guo, G. A. Sitton, and C. S. Burrus, "The Quick Discrete Fourier Transform," Proc. IEEE Conf. Acoust. Speech and Sig. Processing (ICASSP) 3, 445-448 (1994).
• G. Bruun, "z-transform DFT filters and FFTs," IEEE Trans. Acoust. Speech Sig. Processing ASSP-26, 56-63 (1978). Also H. Murakami, "Real-valued fast discrete Fourier transform and cyclic convolution algorithms of highly composite even length," Proc. ICASSP 3, 1311-1314 (1996).
• C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," Proc. IEEE 56, 1107-1108 (1968).
• L. I. Bluestein, Northeast Electronics Res. and Eng. Meeting Record 10, 218-219 (1968). Also D. H. Bailey and P. N. Swarztrauber, "The fractional Fourier transform and applications," SIAM Review 33, 389-404 (1991).
• H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus, "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust. Speech Sig. Processing ASSP-35, 849-863 (1987).
• A. Edelman, P. McCorquodale, and S. Toledo, "The Future Fast Fourier Transform?" SIAM J. Sci. Computing 20, 1094-1114 (1999).
• H. Guo and C. S. Burrus, "Fast approximate Fourier transform via wavelets transform," Proc. SPIE Intl. Soc. Opt. Eng. 2825, 250-259 (1996).
• O. V. Shentov, S. K. Mitra, U. Heute, and A. N. Hossen, "Subband DFT. I. Definition, interpretations and extensions," Signal Processing 41 (3), 261-277 (1995).
• James C. Schatzman, "Accuracy of the discrete Fourier transform and the fast Fourier transform," SIAM J. Sci. Comput. 17 (5), 1150-1166 (1996).
• Matteo Frigo and Steven G. Johnson: FFTW, http://www.fftw.org/. A free (GPL) C library for computing discrete Fourier transforms in one or more dimensions, of arbitrary size, using Fast Fourier Transforms. Also M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," Proc. ICASSP 3, 1381-1384 (1998).

All Wikipedia text is available under the terms of the GNU Free Documentation License

Search Encyclopedia
 Search over one million articles, find something about almost anything!

Featured Article
 Fahrenheit ... His measurements were not entirely accurate, though; by his original scale, the actual freezing and boiling points would have been slightly different than 32 and 212. ...