Encyclopedia > Fast Fourier Transform

Article Content

Fast Fourier Transform

A Fast Fourier Transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. It is of great importance to a wide variety of applications, from digital signal processing to solving partial differential equations to algorithms for quickly multiplying large integers. This article describes the algorithms; see discrete Fourier transform for properties and applications of the transform.

Let x₀, ...., x_n-1 be complex numbers. The DFT is defined by the formula

<math> f_j = \sum_{k=0}^{n-1} x_k e^{-\frac{2\pi i}{n} jk }

\qquad j = 0,\dots,n-1. </math>

Evaluating these sums directly would take O(n²) arithmetical operations (see Big O notation). An FFT is an algorithm to compute the same result in only O(n log n) operations.

Since the inverse DFT is the same as the DFT, but with the sign of the exponent flipped and a 1/n factor, any FFT algorithm can easily be adapted for it as well.

Table of contents

1 The Cooley-Tukey algorithm

1.1 Variations on the Cooley-Tukey algorithm

2 Other FFT algorithms

3 FFT algorithms specialized for real and/or symmetric data

4 Accuracy and approximations

4.1 References

The Cooley-Tukey algorithm

The most common FFT is the Cooley-Tukey algorithm. In its most basic form, this method first computes the Fourier transforms of the even-indexed numbers x₀, x₂,...,x_n-2, and of the odd-indexed numbers x₁, x₃,....,x_n-1, and then combines those two results to produce the Fourier transform of the whole sequence. This idea can then be performed recursively to reduce the overall runtime to O(n log n). This simplified form assumes that n is a power of two; since the number of sample points n can usually be chosen freely by the application, this is often not an important restriction.

We write n' = n/2 and denote the discrete Fourier transform of the even indexed numbers x'₀ = x₀, x'₁ = x₂,....,x'_n'-1 = x_n-2 by f₀',...,f '_n'-1, and the transform of the odd indexed numbers x''₀ = x₁,x''₁ = x₃,...,x''_n'-1 = x_n-1 by f₀'', ..., f ''_n'-1. Then it follows:

<math> \begin{matrix}

f_j & = & \sum_{k=0}^{\frac{n}{2}-1} x_{2k} e^{-\frac{2\pi i}{n} j(2k)}

 +  \sum_{k=0}^{\frac{n}{2}-1} x_{2k+1} e^{-\frac{2\pi i}{n} j(2k+1)} \\ \\

& = & \sum_{k=0}^{n'-1} x'_{k} e^{-\frac{2\pi i}{n'} jk}

 +  e^{-\frac{2\pi i}{n}j} \sum_{k=0}^{n'-1} x_k e^{-\frac{2\pi i}{n'} jk} \\ \\

& = & \left\{ \begin{matrix}

 f'_j +  e^{-\frac{2\pi i}{n}j} f_j & \mbox{if } j<n' \\ \\

 f'_{j-n'} -  e^{-\frac{2\pi i}{n}(j-n')} f_{j-n'} & \mbox{if } j \geq n' \end{matrix} \right.

\end{matrix} </math>

A form of this trick, including its recursive application, was already known around 1805 to Carl Friedrich Gauss, who used it to interpolate the trajectories of the asteroids Pallas and Juno, but was not widely recognized (being published only posthumously and in Latin); Gauss did not analyze the asymptotic complexity, however. Various limited forms were also rediscovered several times throughout the 19th and early 20th centuries. FFTs became popular after J. W. Cooley of IBM and J. W. Tukey of Princeton published a paper in 1965 reinventing the algorithm and describing how to perform it conveniently on a computer (including how to arrange for the output to be produced in the natural ordering).

This process is an example of the general technique of divide and conquer algorithms; in many traditional implementations, however, the explicit recursion is avoided, and instead one traverses the computational tree in breadth-first fashion.

The above re-expression of a size-n DFT as two size-n/2 DFTs is sometimes called the Danielson-Lanczos lemma, since the identity was noted by those two authors in 1942 (influenced by Runge's 1903 work). They applied their lemma in a "backwards" recursive fashion, repeatedly doubling the DFT size until the transform spectrum converged (although they apparently didn't realize the logarithmic asymptotic complexity they had achieved). The Danielson-Lanczos work predated widespread availability of computing machines and required hand calculation; they reported a computation time of 140 minutes for a size-64 DFT operating on real inputs (see below) to 3-5 significant digits. Cooley and Tukey's 1965 paper reported a running time of 0.02 minutes for a size-2048 complex DFT on an IBM 7094 (probably in 36-bit single precision, ~8 digits). Rescaling the time by n log n, this corresponds roughly to a speedup factor of around 800,000. The more modern FFT library FFTW, on a 2GHz Pentium-IV in 64-bit double precision (~16 digits), can compute a size-64 real-input DFT in 1μs and a size-2048 complex DFT in 100μs, speedups of about 8,000,000,000 and 10,000 over Danielson & Lanczos and Cooley & Tukey, respectively, not even including the considerable improvements in accuracy.

(140 minutes for size 64 may sound like a long time, but it corresponds to an average of at most 16 seconds per floating-point operation, around 20% of which are multiplications...this is a fairly impressive rate for a human being to sustain for almost two and a half hours, especially when you consider the bookkeeping overhead.)

Variations on the Cooley-Tukey algorithm

More generally, Cooley-Tukey algorithms recursively re-express a DFT of a composite size n = n₁n₂ as n₁ DFTs of size n₂, followed by multiplication by complex roots of unity called twiddle factors, followed by n₂ DFTs of size n₁. Typically, either n₁ or n₂ is a small factor, called the radix. If n₁ is the radix, it is called a decimation in time (DIT) algorithm, whereas if n₂ is the radix, it is decimation in frequency (DIF, also called the Sande-Tukey algorithm). The version presented above is a radix-2 DIT algorithm; in the final expression, the phase multiplying the odd transform is the twiddle factor, and the +/- combination (butterfly) of the even and odd transforms is a size-2 DFT. (The radix's small DFT is sometimes known as a butterfly, so-called because of the shape of the data-flow diagram for the radix-2 case.) Gauss used a radix-3 DIT (or radix-4 DIF) step in a 12-point DFT example.

There are many other variations on the Cooley-Tukey algorithm. Mixed-radix implementations handle composite sizes with a variety of (typically small) factors in addition to two, usually (but not always) employing the O(n²) algorithm for the prime base cases of the recursion. Split-radix merges radices 2 and 4, exploiting the fact that the first transform of radix 2 requires no twiddle factor, in order to achieve a minimal operation count for power-of-two sizes. (On present-day computers, performance is determined more by cache and CPU pipeline considerations than by strict operation counts; well optimized FFT implementations often employ larger radices and/or hard-coded base-case transforms of significant size.) Another way of looking at the Cooley-Tukey algorithm is that it re-expresses a size n one-dimensional DFT as an n₁ by n₂ two-dimensional DFT (plus twiddles), where the output matrix is transposed. The net result of all of these transpositions, for a radix-2 algorithm, corresponds to a bit reversal of the input (DIF) or output (DIT) indices. If, instead of using a small radix, one employs a radix of roughly √n and explicit input/output matrix transpositions, it is called a four-step (also six-step) algorithm, initially proposed for cache/locality optimization (Gentleman and Sande, 1966).

The general Cooley-Tukey factorization rewrites the indices j and k as j = n₂ j₁ + j₂ and k = n₁ k₂ + k₁, respectively, where the indices j_a and k_a run from 0..n_a-1 (for a of 1 or 2). That is, it re-indexes the input (k) and output (j) as n₁ by n₂ two-dimensional arrays in column-major and row-major order, respectively. When this reindexing is substituted into the DFT formula for jk, the n₂j₁n₁k₂ cross term vanishes (its exponential is unity), and the remaining terms give

<math>f_{n_2 j_1 + j_2} =

    \sum_{k_1=0}^{n_1-1} 
      \left[ e^{-\frac{2\pi i}{n} j_2 k_1 } \right]
      \left( \sum_{k_2=0}^{n_2-1} x_{n_1 k_2 + k_1}  
              e^{-\frac{2\pi i}{n_2} j_2 k_2 } \right)
      e^{-\frac{2\pi i}{n_1} j_1 k_1 }

</math>

where the inner sum is a DFT of size n₂, the outer sum is a DFT of size n₁, and the [...] bracketed term is the twiddle factor.

The 1965 Cooley-Tukey paper noted that one can employ an arbitrary radix r (as well as mixed radices), but failed to realize that the radix butterfly is itself a DFT that can use FFT algorithms. Hence, they reckoned the complexity to be O(r² n/r log_rn), and erroneously concluded that the optimal radix was 3 (the closest integer to e). (Gauss also derived the algorithm for arbitrary radices, and gave explicit examples of both radix-3 and radix-6 steps.)

Other FFT algorithms

There are other FFT algorithms distinct from Cooley-Tukey. For relatively prime n₁ and n₂, one can use the Prime-Factor (Good-Thomas) algorithm, based on the Chinese Remainder Theorem, to factorize the DFT similarly to Cooley-Tukey but without the twiddle factors. The Rader-Brenner algorithm is a Cooley-Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability. Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the Bruun and QFT algorithms. (The Rader-Brenner and QFT algorithms were proposed for power-of-two sizes, but it is possible that they could be adapted to general composite n. Bruun's algorithm was adapted to the mixed-radix case for even n by H. Murakami.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the polynomial zⁿ - 1, here into real-coefficient polynomials of the form z^m - 1 and z^2m + az^m + 1. A similar polynomial viewpoint is also exploited by the Winograd algorithm, which factorizes zⁿ - 1 into cyclotomic polynomials—these often have coefficients of 1, 0, or -1, and therefore require few (if any) multiplications, so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small prime factors. In particular, Winograd also makes use of a general algorithm by Rader for FFTs of prime sizes. Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime n, expresses a DFT of prime size n as a cyclic convolution of (composite) size n - 1, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I. Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley-Tukey FFTs, for example), via the identity jk = -(j-k)²/2 + j²/2 + k²/2.

FFT algorithms specialized for real and/or symmetric data

In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry

and efficient FFT algorithms have been designed for this situation. One approach consists of taking the ordinary Cooley-Tukey algorithm and removing the redundant parts of the computation, saving roughly a factor of two in time and memory. Alternatively, it is possible to express an even-length real-input DFT as a complex DFT of half the length (whose real and imaginary parts are the even/odd elements of the original real data), followed by O(n) post-processing operations.

It was once believed that real-input DFTs could be more efficiently computed by means of the Discrete Hartley transform (DHT), but this was subsequently disproved: a specialized real-input DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs.

There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of ~2 in time/space and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with O(n) pre/post processing.

Accuracy and approximations

All of the FFT algorithms discussed so far compute the DFT exactly (in exact arithmetic, i.e. neglecting floating-point errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999) achieves lower communication requirements for parallel computing with the help of a fast-multipole method. A wavelet-based approximate FFT by Guo and Burrus (1996) takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). Only the Edelman algorithm works equally well for sparse and non-sparse data, however, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data.

Even the "exact" FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley-Tukey, have excellent numerical properties. The upper bound on the relative error for the Cooley-Tukey algorithm is O(ε log n), compared to O(ε n^3/2) for the naive DFT formula (Gentleman and Sande, 1966), where ε is the machine floating-point relative error. In fact, the average errors are much better than these upper bounds, being only O(&epsilon √log n) for Cooley-Tukey and O(ε √n) for the naive DFT (Schatzman, 1996). These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use unstable trigonometric recurrence formulas. Some FFTs other than Cooley-Tukey, such as the Rader-Brenner algorithm, are intrinsically less stable.

References

James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19, 297-301 (1965).
Carl Friedrich Gauss, "Nachlass: Theoria interpolationis methodo nova tractata," Werke band 3, 265-327 (Königliche Gesellschaft der Wissenschaften, Göttingen, 1866). See also M. T. Heideman, D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform," IEEE ASSP Magazine 1 (4), 14-21 (1984).
G. C. Danielson and C. Lanczos, "Some improvements in practical Fourier analysis and their application to X-ray scattering from liquids," J. Franklin Inst. 233, 365-380 and 435-452 (1942).
P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Processing 19, 259-299 (1990).
W. M. Gentleman and G. Sande, "Fast Fourier transforms—for fun and profit," Proc. AFIPS 29, 563-578 (1966).
H. Guo, G. A. Sitton, and C. S. Burrus, "The Quick Discrete Fourier Transform," Proc. IEEE Conf. Acoust. Speech and Sig. Processing (ICASSP) 3, 445-448 (1994).
G. Bruun, "z-transform DFT filters and FFTs," IEEE Trans. Acoust. Speech Sig. Processing ASSP-26, 56-63 (1978). Also H. Murakami, "Real-valued fast discrete Fourier transform and cyclic convolution algorithms of highly composite even length," Proc. ICASSP 3, 1311-1314 (1996).
C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," Proc. IEEE 56, 1107-1108 (1968).
L. I. Bluestein, Northeast Electronics Res. and Eng. Meeting Record 10, 218-219 (1968). Also D. H. Bailey and P. N. Swarztrauber, "The fractional Fourier transform and applications," SIAM Review 33, 389-404 (1991).
H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus, "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust. Speech Sig. Processing ASSP-35, 849-863 (1987).
A. Edelman, P. McCorquodale, and S. Toledo, "The Future Fast Fourier Transform?" SIAM J. Sci. Computing 20, 1094-1114 (1999).
H. Guo and C. S. Burrus, "Fast approximate Fourier transform via wavelets transform," Proc. SPIE Intl. Soc. Opt. Eng. 2825, 250-259 (1996).
O. V. Shentov, S. K. Mitra, U. Heute, and A. N. Hossen, "Subband DFT. I. Definition, interpretations and extensions," Signal Processing 41 (3), 261-277 (1995).
James C. Schatzman, "Accuracy of the discrete Fourier transform and the fast Fourier transform," SIAM J. Sci. Comput. 17 (5), 1150-1166 (1996).
Matteo Frigo and Steven G. Johnson: FFTW, http://www.fftw.org/. A free (GPL) C library for computing discrete Fourier transforms in one or more dimensions, of arbitrary size, using Fast Fourier Transforms. Also M. Frigo and S. G. Johnson, "FFTW: An adaptive software architecture for the FFT," Proc. ICASSP 3, 1381-1384 (1998).

All Wikipedia text is available under the terms of the GNU Free Documentation License

Search Encyclopedia

Search over one million articles, find something about almost anything!