I recently posted a few articles about Fourier decomposition, and I often get asked–and ask myself–“How do I calculate that again?” using a tool like Matlab or (my preference) its open-source cousin, Scilab. Just how we change a data set sampled in time to one sampled in frequency is straightforward. We run the Fast Fourier Transform algorithm:
Easy. But what if you want to actually understand the data and do something with it? The problems with not knowing how to tailor the output of this calculation are best illustrated with some plots. In Scilab, let’s generate a basic audio signal with two tones sampled at 44.1kHz. First we create an axis consisting of 1024 evenly spaced points in time. A sample rate of 44.1kHz–the CD audio rate–means the sample times are separated by a bit less than 23 microseconds. (Note: this example code works with Scilab 5.4.0 on OSX, your mileage may vary.)
--> N = 1024; --> fs = 44100; --> t = (1/fs)*(0:1:(N-1));
Then we create two sine waves with frequencies of 5000Hz and 10000Hz and different amplitudes (1 and 2), add them together, and plot the result.
--> f1 = 10000; --> a1 = 1; --> y1 = a1*sin(2*%pi*f1*t); --> f2 = 5000; --> a2 = 2; --> y2 = a2*sin(2*%pi*f2*t); --> y = y1 + y2; --> plot(t, y);
After zooming in on the plot, we get something that looks like this:
Taking the Fourier transform and plotting:
--> Y = fft(y); --> plot(Y);
which looks interesting, but it’s not terribly useful. First, we don’t have a frequency axis, just numbered samples. Second, we expect a delta function–a spike (technical term)–for each sine wave, but we see four spikes instead of two plus some negative-going add-ons. And the amplitudes are all messed up; the spikes are too tall.
Second thing first. The result of the FFT is a series of complex numbers, and if you give that to the plot function, Scilab plots the real part by default. In most applications, we’re not interested in real and imaginary parts but rather amplitude and phase. Antennas and image sensors (eyes) and microphones (ears) and speakers and… often only care about intensity, or amplitude-squared; even the phase information is ignored. So for now, let’s plot the amplitude of the FFT using the complex absolute value function.
At the same time, let’s add a proper frequency-axis to the plot. The nature of the FFT is such that the 0th sample is DC or a frequency of 0, and the highest frequency is just less than the sampling frequency. What does “just less” mean? Well, the sample spacing is such that if you extended the sequence by one sample, the Nth point would be the sampling frequency (but because we’re zero referenced, the (N-1)th point is the last one.) An easy way to handle this is:
--> f = linspace(0, fs, N+1); // one sample too many --> f = f(1:$-1); // drop the last sample
The final trick is to divide by N to scale the amplitude correctly–I always forget that bit, so now I’ve written it down. Now if we plot the scaled amplitude of the FFT vs. our frequency axis
--> plot(f, abs(Y)/N);
Hmmm, still four spikes instead of two. And we started with amplitudes of 1 and 2, not 0.5 and 1, so the scaling is still wrong. Or is it?
There are two ways to look at this. First, from a Fourier transform table, the transform of a sine function of frequency f is a delta function at f and one at -f each with half the amplitude. So if we move the right half of our plot to the left and shift the axis to include negative frequencies:
--> plot(f-(fs/2), fftshift(abs(Y)/N));
the plot looks like this:
Now we have spikes of amplitude 1–half the original 2–at 5000Hz and -5000Hz, and spikes of amplitude 0.5 at 10000Hz and -10000Hz, just like we expect. (Note that the amplitudes are slightly less than these values, and the spikes flare at their bases. This is a topic for another time, but it’s related to windowing with a rectangle…or convolving with a sinc.)
The other way of looking at the result is to re-examine the unshifted plot and note that the high-frequency spikes are at 39100Hz and 34100Hz. Not coincidentally, these frequencies are the sampling frequency (44100Hz) minus the frequencies of interest (e.g. 44100-5000=39100). The FFT plot is a mirror image folded–origami, get it?–at one-half the sampling frequency. Half the sampling frequency is often called the Nyquist frequency (Harry Nyquist was Swedish, and with folding we’ve resolved this post’s title). The theory says that this is the highest frequency that can be reconstructed from the sampled signal without aliasing. This is also a topic for another time, but here’s a taste.
Let’s keep the 44100Hz sampling rate, but feed it a 30000Hz signal.
--> z = sin(2*%pi*30000*t); --> plot(f-(fs/2), fftshift(abs(fft(z))/N));
Again we see the characteristic spikes at positive and negative frequencies, but the frequency is not 30000Hz. In fact, it is 44100-30000=14100Hz. So if you record a 30kHz tone at a 44.1kHz rate and play it back, the signal will manifest itself as a tone with a bit less than half the original frequency!
This is fundamental theory that is applied regularly to countless systems: audio, video and imaging, communications, biometric sensors, … While the math and the tools are relatively simple, reality makes the engineering more difficult. Keep on eye on this blog for more.