Let x_{0}, ..., x_{n−1} be complex numbers. The DFT is defined by the formula

<math>f_j = \sum_{k=0}^{n-1} x_k e^{-\frac{2\pi i}{n} jk} \qquad j = 0, \dots, n-1.</math>
Evaluating these sums directly would take O(n^{2}) arithmetical operations (see Big O notation). An FFT is an algorithm to compute the same result in only O(n log n) operations.
Since the inverse DFT is the same as the DFT, but with the sign of the exponent flipped and a 1/n factor, any FFT algorithm can easily be adapted for it as well.
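As an illustration of the definition and of the exponent-flip relation between the forward and inverse transforms, here is a minimal sketch in Python (the function names `dft` and `idft` are ours, not from any particular library):

```python
import cmath

def dft(x):
    """Naive DFT straight from the definition: O(n^2) operations."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(f):
    """Inverse DFT: the same sum with the exponent sign flipped and a 1/n factor."""
    n = len(f)
    return [sum(f[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]
```

Round-tripping any input through `dft` and then `idft` recovers it up to floating-point error.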

The most common FFT is the Cooley–Tukey algorithm. In its most basic form, this method first computes the Fourier transforms of the even-indexed numbers x_{0}, x_{2}, ..., x_{n−2}, and of the odd-indexed numbers x_{1}, x_{3}, ..., x_{n−1}, and then combines those two results to produce the Fourier transform of the whole sequence. This idea can then be applied recursively to reduce the overall runtime to O(n log n). This simplified form assumes that n is a power of two; since the number of sample points n can usually be chosen freely by the application, this is often not an important restriction.
We write n' = n/2 and denote the discrete Fourier transform of the even-indexed numbers x'_{0} = x_{0}, x'_{1} = x_{2}, ..., x'_{n'−1} = x_{n−2} by f'_{0}, ..., f'_{n'−1}, and the transform of the odd-indexed numbers x''_{0} = x_{1}, x''_{1} = x_{3}, ..., x''_{n'−1} = x_{n−1} by f''_{0}, ..., f''_{n'−1}. Then it follows:
<math>\begin{matrix}
f_j & = & \sum_{k=0}^{\frac{n}{2}-1} x_{2k} e^{-\frac{2\pi i}{n} j(2k)}
+ \sum_{k=0}^{\frac{n}{2}-1} x_{2k+1} e^{-\frac{2\pi i}{n} j(2k+1)} \\ \\
& = & \sum_{k=0}^{n'-1} x'_{k} e^{-\frac{2\pi i}{n'} jk}
+ e^{-\frac{2\pi i}{n}j} \sum_{k=0}^{n'-1} x''_{k} e^{-\frac{2\pi i}{n'} jk} \\ \\
& = & \left\{ \begin{matrix}
f'_j + e^{-\frac{2\pi i}{n}j} f''_j & \mbox{if } j<n' \\ \\
f'_{j-n'} - e^{-\frac{2\pi i}{n}(j-n')} f''_{j-n'} & \mbox{if } j \geq n' \end{matrix} \right.
\end{matrix}</math>
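The case split above translates directly into a recursive program. Here is a minimal radix-2 decimation-in-time sketch in Python (the name `fft` is ours; it illustrates the recursion, not an optimized implementation):

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    f_even = fft(x[0::2])      # f'  : transform of the even-indexed inputs
    f_odd = fft(x[1::2])       # f'' : transform of the odd-indexed inputs
    half = n // 2              # n' = n/2
    out = [0j] * n
    for j in range(half):
        t = cmath.exp(-2j * cmath.pi * j / n) * f_odd[j]  # twiddle factor times f''_j
        out[j] = f_even[j] + t         # case j < n'
        out[j + half] = f_even[j] - t  # case j >= n'
    return out
```

Each level of recursion does O(n) work and there are log2(n) levels, giving the O(n log n) total.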
A form of this trick, including its recursive application, was already known around 1805 to Carl Friedrich Gauss, who used it to interpolate the trajectories of the asteroids Pallas and Juno, but it was not widely recognized (being published only posthumously and in Latin); Gauss did not, however, analyze the asymptotic complexity. Various limited forms were also rediscovered several times throughout the 19th and early 20th centuries. FFTs became popular after J. W. Cooley of IBM and J. W. Tukey of Princeton published a paper in 1965 reinventing the algorithm and describing how to perform it conveniently on a computer (including how to arrange for the output to be produced in the natural ordering).
This process is an example of the general technique of divide-and-conquer algorithms; in many traditional implementations, however, the explicit recursion is avoided, and instead one traverses the computational tree in breadth-first fashion.
The above re-expression of a size-n DFT as two size-n/2 DFTs is sometimes called the Danielson–Lanczos lemma, since the identity was noted by those two authors in 1942 (influenced by Runge's 1903 work). They applied their lemma in a "backwards" recursive fashion, repeatedly doubling the DFT size until the transform spectrum converged (although they apparently didn't realize the logarithmic asymptotic complexity they had achieved). The Danielson–Lanczos work predated widespread availability of computing machines and required hand calculation; they reported a computation time of 140 minutes for a size-64 DFT operating on real inputs (see below) to 3–5 significant digits. Cooley and Tukey's 1965 paper reported a running time of 0.02 minutes for a size-2048 complex DFT on an IBM 7094 (probably in 36-bit single precision, ~8 digits). Rescaling the time by n log n, this corresponds roughly to a speedup factor of around 800,000. The more modern FFT library FFTW, on a 2-GHz Pentium IV in 64-bit double precision (~16 digits), can compute a size-64 real-input DFT in 1 μs and a size-2048 complex DFT in 100 μs, speedups of about 8,000,000,000 and 10,000 over Danielson & Lanczos and Cooley & Tukey, respectively, not even including the considerable improvements in accuracy.
(140 minutes for size 64 may sound like a long time, but it corresponds to an average of at most 16 seconds per floating-point operation, around 20% of which are multiplications... this is a fairly impressive rate for a human being to sustain for almost two and a half hours, especially when you consider the bookkeeping overhead.)
More generally, Cooley–Tukey algorithms recursively re-express a DFT of a composite size n = n_{1}n_{2} as n_{1} DFTs of size n_{2}, followed by multiplication by complex roots of unity called twiddle factors, followed by n_{2} DFTs of size n_{1}. Typically, either n_{1} or n_{2} is a small factor, called the radix. If n_{1} is the radix, it is called a decimation in time (DIT) algorithm, whereas if n_{2} is the radix, it is decimation in frequency (DIF, also called the Sande–Tukey algorithm). The version presented above is a radix-2 DIT algorithm; in the final expression, the phase multiplying the odd transform is the twiddle factor, and the +/− combination (butterfly) of the even and odd transforms is a size-2 DFT. (The radix's small DFT is sometimes known as a butterfly, so called because of the shape of the dataflow diagram for the radix-2 case.) Gauss used a radix-3 DIT (or radix-4 DIF) step in a 12-point DFT example.
There are many other variations on the Cooley–Tukey algorithm. Mixed-radix implementations handle composite sizes with a variety of (typically small) factors in addition to two, usually (but not always) employing the O(n^{2}) algorithm for the prime base cases of the recursion. Split-radix merges radices 2 and 4, exploiting the fact that the first transform of radix 2 requires no twiddle factor, in order to achieve a minimal operation count for power-of-two sizes. (On present-day computers, performance is determined more by cache and CPU pipeline considerations than by strict operation counts; well-optimized FFT implementations often employ larger radices and/or hardcoded base-case transforms of significant size.) Another way of looking at the Cooley–Tukey algorithm is that it re-expresses a size-n one-dimensional DFT as an n_{1} by n_{2} two-dimensional DFT (plus twiddles), where the output matrix is transposed. The net result of all of these transpositions, for a radix-2 algorithm, corresponds to a bit reversal of the input (DIF) or output (DIT) indices. If, instead of using a small radix, one employs a radix of roughly √n and explicit input/output matrix transpositions, it is called a four-step (or six-step) algorithm, initially proposed for cache/locality optimization (Gentleman and Sande, 1966).
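The bit-reversal permutation mentioned above can be made concrete with a small sketch (the helper name is ours): reading each log2(n)-bit index backwards gives the order in which a radix-2 algorithm consumes (DIF) or produces (DIT) the data.

```python
def bit_reversal(n):
    """Permutation p with p[j] = the log2(n)-bit binary reversal of j.

    n is assumed to be a power of two."""
    bits = n.bit_length() - 1
    # format j with `bits` binary digits, reverse the string, read it back
    return [int(format(j, f'0{bits}b')[::-1], 2) for j in range(n)]
```

For n = 8, for example, index 1 = 001 maps to 100 = 4, while 0, 2, 5, and 7 are fixed points.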
The general Cooley–Tukey factorization rewrites the indices j and k as j = n_{2} j_{1} + j_{2} and k = n_{1} k_{2} + k_{1}, respectively, where the indices j_{a} and k_{a} run from 0 to n_{a}−1 (for a of 1 or 2). That is, it re-indexes the input (k) and output (j) as n_{1} by n_{2} two-dimensional arrays in column-major and row-major order, respectively. When this re-indexing is substituted into the DFT formula for jk, the n_{2}j_{1}n_{1}k_{2} cross term vanishes (its exponential is unity), and the remaining terms give
<math>f_{n_2 j_1 + j_2} = \sum_{k_1=0}^{n_1-1} \left[ e^{-\frac{2\pi i}{n} j_2 k_1 } \right] \left( \sum_{k_2=0}^{n_2-1} x_{n_1 k_2 + k_1} e^{-\frac{2\pi i}{n_2} j_2 k_2 } \right) e^{-\frac{2\pi i}{n_1} j_1 k_1 }</math>
where the inner sum is a DFT of size n_{2}, the outer sum is a DFT of size n_{1}, and the [...] bracketed term is the twiddle factor.
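This factorization can be checked numerically against the naive definition. The sketch below (function names are ours; written for clarity, not speed) follows the formula term by term: input read as k = n_1 k_2 + k_1, inner size-n_2 DFT, bracketed twiddle, outer size-n_1 DFT, output written as j = n_2 j_1 + j_2.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, used as the reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def cooley_tukey_step(x, n1, n2):
    """One general Cooley-Tukey step for n = n1 * n2."""
    n = n1 * n2
    f = [0j] * n
    for j1 in range(n1):
        for j2 in range(n2):
            acc = 0j
            for k1 in range(n1):
                # inner sum: size-n2 DFT over k2, for fixed k1
                inner = sum(x[n1 * k2 + k1] * cmath.exp(-2j * cmath.pi * j2 * k2 / n2)
                            for k2 in range(n2))
                twiddle = cmath.exp(-2j * cmath.pi * j2 * k1 / n)  # bracketed term
                # outer sum: size-n1 DFT over k1
                acc += twiddle * inner * cmath.exp(-2j * cmath.pi * j1 * k1 / n1)
            f[n2 * j1 + j2] = acc
    return f
```

A real implementation would of course hoist the inner DFTs out of the j_1 loop and recurse on them; this version just mirrors the algebra.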
The 1965 Cooley–Tukey paper noted that one can employ an arbitrary radix r (as well as mixed radices), but failed to realize that the radix butterfly is itself a DFT that can use FFT algorithms. Hence, they reckoned the complexity to be O(r^{2} (n/r) log_{r}n), and erroneously concluded that the optimal radix was 3 (the closest integer to e). (Gauss also derived the algorithm for arbitrary radices, and gave explicit examples of both radix-3 and radix-6 steps.)
There are other FFT algorithms distinct from Cooley–Tukey. For relatively prime n_{1} and n_{2}, one can use the Prime-Factor (Good–Thomas) algorithm, based on the Chinese Remainder Theorem, to factorize the DFT similarly to Cooley–Tukey but without the twiddle factors. The Rader–Brenner algorithm is a Cooley–Tukey-like factorization but with purely imaginary twiddle factors, reducing multiplications at the cost of increased additions and reduced numerical stability. Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the Bruun and QFT algorithms. (The Rader–Brenner and QFT algorithms were proposed for power-of-two sizes, but it is possible that they could be adapted to general composite n. Bruun's algorithm was adapted to the mixed-radix case for even n by H. Murakami.) Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of the polynomial z^{n} − 1, here into real-coefficient polynomials of the form z^{m} − 1 and z^{2m} + az^{m} + 1. A similar polynomial viewpoint is also exploited by the Winograd algorithm, which factorizes z^{n} − 1 into cyclotomic polynomials; these often have coefficients of −1, 0, or 1, and therefore require few (if any) multiplications, so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small prime factors. In particular, Winograd also makes use of a general algorithm by Rader for FFTs of prime sizes. Rader's algorithm, exploiting the existence of a generator for the multiplicative group modulo prime n, expresses a DFT of prime size n as a cyclic convolution of (composite) size n − 1, which can then be computed by a pair of ordinary FFTs via the convolution theorem (although Winograd uses other convolution methods). Another prime-size FFT is due to L. I.
Bluestein, and is sometimes called the chirp-z algorithm; it also re-expresses a DFT as a convolution, but this time of the same size (which can be zero-padded to a power of two and evaluated by radix-2 Cooley–Tukey FFTs, for example), via the identity jk = −(j−k)^{2}/2 + j^{2}/2 + k^{2}/2.
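A sketch of Bluestein's re-expression (the function name is ours): the chirps e^{−πi k²/n} come straight from the identity, and the convolution is done naively here for clarity, where in practice it would be zero-padded to a power of two and computed with radix-2 FFTs.

```python
import cmath

def bluestein_dft(x):
    """DFT via the identity jk = -(j-k)^2/2 + j^2/2 + k^2/2.

    Works for any size n, including primes; the middle sum is a
    convolution of the chirped input with the conjugate chirp."""
    n = len(x)
    a = [x[k] * cmath.exp(-1j * cmath.pi * k * k / n) for k in range(n)]
    return [cmath.exp(-1j * cmath.pi * j * j / n) *
            sum(a[k] * cmath.exp(1j * cmath.pi * (j - k) ** 2 / n) for k in range(n))
            for j in range(n)]
```

Splitting −2πi jk/n by the identity gives exactly the three exponential factors above: an output chirp, an input chirp, and a term depending only on j−k.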
FFT algorithms specialized for real and/or symmetric data
In many applications, the input data for the DFT are purely real, in which case the outputs satisfy the symmetry

<math>f_{n-j} = f_j^*</math>

(where ^* denotes complex conjugation).
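This conjugate symmetry, f_{n−j} = f_j^*, can be checked directly against the definition; a minimal sketch (the input values here are arbitrary):

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT from the definition."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

x = [0.5, 1.0, -2.0, 3.0, 0.0, -1.5]  # purely real inputs
f = dft(x)
# real inputs give conjugate-symmetric outputs: f[(n - j) % n] == conj(f[j])
```

Because of this redundancy, roughly half of the outputs determine the rest, which is what specialized real-input algorithms exploit.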
It was once believed that realinput DFTs could be more efficiently computed by means of the Discrete Hartley transform (DHT), but this was subsequently disproved: a specialized realinput DFT algorithm (FFT) can typically be found that requires fewer operations than the corresponding DHT algorithm (FHT) for the same number of inputs.
There are further FFT specializations for the cases of real data that have even/odd symmetry, in which case one can gain another factor of ~2 in time/space and the DFT becomes the discrete cosine/sine transform(s) (DCT/DST). Instead of directly modifying an FFT algorithm for these cases, DCTs/DSTs can also be computed via FFTs of real data combined with O(n) pre/post processing.
All of the FFT algorithms discussed so far compute the DFT exactly (in exact arithmetic, i.e. neglecting floatingpoint errors). A few "FFT" algorithms have been proposed, however, that compute the DFT approximately, with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999) achieves lower communication requirements for parallel computing with the help of a fastmultipole method. A waveletbased approximate FFT by Guo and Burrus (1996) takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995). Only the Edelman algorithm works equally well for sparse and nonsparse data, however, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data.
Even the "exact" FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley–Tukey, have excellent numerical properties. The upper bound on the relative error for the Cooley–Tukey algorithm is O(ε log n), compared to O(ε n^{3/2}) for the naive DFT formula (Gentleman and Sande, 1966), where ε is the machine floating-point relative error. In fact, the average errors are much better than these upper bounds, being only O(ε √log n) for Cooley–Tukey and O(ε √n) for the naive DFT (Schatzman, 1996). These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the trigonometric function values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use unstable trigonometric recurrence formulas. Some FFTs other than Cooley–Tukey, such as the Rader–Brenner algorithm, are intrinsically less stable.