The popular Cooley-Tukey algorithm also subdivides a DFT of size n into smaller transforms of size n1 and n2, but it has the disadvantage that it also requires extra multiplications by roots of unity called twiddle factors, in addition to the smaller transforms. On the other hand, the PFA has the disadvantages that it only works for relatively prime factors and that it requires a more complicated re-indexing of the data based on the Chinese Remainder Theorem (CRT).
The PFA is also closely related to the nested Winograd FFT algorithm; the latter performs the decomposed n1 by n2 transform via more sophisticated two-dimensional convolution techniques. Some older papers therefore also call Winograd's algorithm a PFA FFT. (Outside of the FFT literature, a few people confusingly refer to the mixed-radix Cooley-Tukey algorithm as a "prime-factor" FFT.)
(Although the PFA is distinct from the Cooley-Tukey algorithm, it is interesting to note that Good's 1958 work on the PFA was cited as inspiration by Cooley and Tukey in their famous 1965 paper. In fact, it was the only prior FFT work cited by them, as they were not then aware of the earlier research by Gauss and others.)
Recall that the DFT is defined by the formula:
<math>f_j = \sum_{k=0}^{n-1} x_k e^{-\frac{2\pi i}{n} jk } \qquad j = 0,\dots,n-1.</math>
The PFA involves a re-indexing of the input and output arrays, which when substituted into the DFT formula transforms it into two nested DFTs (a two-dimensional DFT).
Suppose that n = n1 n2, where n1 and n2 are relatively prime. In this case, we can define a one-to-one re-indexing of the input k and output j by:
<math>k = k_1 n_2 + k_2 n_1 \mod n,</math>
<math>j = j_1 n_2^{-1} n_2 + j_2 n_1^{-1} n_1 \mod n,</math>
where <math>n_1^{-1}</math> is the multiplicative inverse of n1 modulo n2, and vice versa for <math>n_2^{-1}</math>; the indices <math>j_a</math> and <math>k_a</math> run from <math>0, \dots, n_a - 1</math> (for a = 1, 2). These inverses exist only for relatively prime n1 and n2, and that condition is also required for the mappings to be one-to-one.
This re-indexing of k is called the Ruritanian mapping, while this re-indexing of j is called the CRT mapping. The latter refers to the fact that j is the solution to the Chinese remainder problem j = j1 mod n1 and j = j2 mod n2.
(One could instead use the Ruritanian mapping for the output j and the CRT mapping for the input k, or various intermediate choices.)
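The two index maps can be checked numerically. The following is a minimal sketch, assuming the example sizes n1 = 3 and n2 = 5; the function names `ruritanian_index` and `crt_index` are illustrative and do not come from the literature. It relies on Python's three-argument `pow` with exponent -1 (available from Python 3.8) to compute the modular inverses.

```python
from math import gcd


def ruritanian_index(k1, k2, n1, n2):
    """Ruritanian map: k = k1*n2 + k2*n1 (mod n1*n2)."""
    return (k1 * n2 + k2 * n1) % (n1 * n2)


def crt_index(j1, j2, n1, n2):
    """CRT map: j = j1*inv(n2)*n2 + j2*inv(n1)*n1 (mod n1*n2)."""
    n2_inv = pow(n2, -1, n1)  # inverse of n2 modulo n1
    n1_inv = pow(n1, -1, n2)  # inverse of n1 modulo n2
    return (j1 * n2_inv * n2 + j2 * n1_inv * n1) % (n1 * n2)


# Example sizes (an assumption for illustration); they must be coprime.
n1, n2 = 3, 5
assert gcd(n1, n2) == 1
n = n1 * n2

pairs = [(a, b) for a in range(n1) for b in range(n2)]
ks = [ruritanian_index(k1, k2, n1, n2) for k1, k2 in pairs]
js = [crt_index(j1, j2, n1, n2) for j1, j2 in pairs]

# Both maps are one-to-one: every index 0..n-1 is hit exactly once.
assert sorted(ks) == list(range(n))
assert sorted(js) == list(range(n))

# The CRT map solves the Chinese remainder problem for (j1, j2).
for (j1, j2), j in zip(pairs, js):
    assert j % n1 == j1 and j % n2 == j2
```

If n1 and n2 share a common factor, `pow(n2, -1, n1)` raises `ValueError`, which mirrors the requirement that the inverses exist only for relatively prime factors.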
A great deal of research has been devoted to schemes for evaluating this re-indexing efficiently, ideally in-place, while minimizing the number of costly modulo operations (Chan, 1991, and references).
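As one simple illustration of avoiding modulo operations (a sketch only, not a scheme taken from the cited work), the Ruritanian index can be updated incrementally: stepping k2 adds n1 to k, and a single conditional subtraction of n replaces the modulo. The generator name `ruritanian_indices` is hypothetical.

```python
def ruritanian_indices(n1, n2):
    """Yield (k1, k2, k) with k = (k1*n2 + k2*n1) mod n, using only adds and compares."""
    n = n1 * n2
    row_start = 0  # value of k for the pair (k1, 0), i.e. k1*n2
    for k1 in range(n1):
        k = row_start
        for k2 in range(n2):
            yield k1, k2, k
            k += n1
            if k >= n:  # k < 2n here, so one subtraction is enough
                k -= n
        row_start += n2  # k1*n2 stays below n for k1 < n1, so no wrap is needed


# Check against the direct formula for the example sizes 3 and 5.
for k1, k2, k in ruritanian_indices(3, 5):
    assert k == (k1 * 5 + k2 * 3) % 15
```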
The above re-indexing is then substituted into the formula for the DFT, and in particular into the product jk in the exponent. Because <math>e^{2\pi i} = 1</math>, this exponent is evaluated modulo n: any cross term in the jk product that contains the factor n1 n2 = n can be set to zero. (Similarly, <math>f_j</math> and <math>x_k</math> are implicitly periodic in n, so their subscripts are evaluated modulo n.) The remaining terms give:
<math>f_{j_1 n_2^{-1} n_2 + j_2 n_1^{-1} n_1} =
\sum_{k_1=0}^{n_1-1} \left( \sum_{k_2=0}^{n_2-1} x_{k_1 n_2 + k_2 n_1} e^{-\frac{2\pi i}{n_2} j_2 k_2 } \right) e^{-\frac{2\pi i}{n_1} j_1 k_1 }.
</math>
The inner and outer sums are simply DFTs of size n2 and n1, respectively.
(Here, we have used the fact that <math>n_1^{-1} n_1</math> is unity when evaluated modulo n2 in the inner sum's exponent, and vice versa for the outer sum's exponent.)
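The nested-sum form can be sketched in code and compared against a direct evaluation of the DFT. This is a minimal illustration under stated assumptions, not an optimized implementation: the helper names `dft` and `pfa_dft` are hypothetical, the small transforms are evaluated naively in O(n²) time, and the modular inverses use `pow(a, -1, m)` (Python 3.8 or later).

```python
import cmath


def dft(x):
    """Direct DFT: f_j = sum_k x_k * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def pfa_dft(x, n1, n2):
    """DFT of size n1*n2 via the prime-factor re-indexing (n1, n2 coprime)."""
    n = n1 * n2
    assert len(x) == n
    n1_inv = pow(n1, -1, n2)  # raises ValueError if n1, n2 are not coprime
    n2_inv = pow(n2, -1, n1)

    # Gather the input into an n1-by-n2 grid using the Ruritanian map.
    grid = [[x[(k1 * n2 + k2 * n1) % n] for k2 in range(n2)] for k1 in range(n1)]

    # Inner sums: a size-n2 DFT along each row, giving rows[k1][j2].
    rows = [dft(row) for row in grid]

    # Outer sums: a size-n1 DFT along each column; scatter with the CRT map.
    out = [0j] * n
    for j2 in range(n2):
        col = dft([rows[k1][j2] for k1 in range(n1)])  # col[j1]
        for j1 in range(n1):
            out[(j1 * n2_inv * n2 + j2 * n1_inv * n1) % n] = col[j1]
    return out


# Compare with the direct DFT on an arbitrary length-15 test signal.
x = [complex(k % 4, k % 3) for k in range(15)]
assert all(abs(a - b) < 1e-9 for a, b in zip(pfa_dft(x, 3, 5), dft(x)))
```

Note that no twiddle-factor multiplications appear between the row and column transforms; all of the extra work is in the gather and scatter index maps.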