There's actually something subtle going on here which isn't quite heterodyning in the same sense as a Theremin (though it's similar).
Within the bandwidth your sound system is capable of producing (let's say, up to about 24 kHz), audio reproduction is ideally linear in the time domain. This means that if you play two or more pure tones simultaneously on your computer, the audio output should contain only those frequencies, without any harmonics or other added tones. Of course your sound system isn't perfect and there are nonlinearities, but with reasonable equipment they are exceedingly small and mostly imperceptible.
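Linearity here means superposition: feeding a system the sum of two signals gives the same output as feeding it each signal separately and adding the results. The sketch below checks that property numerically for two simplified stand-ins, a plain gain stage (linear) and a `tanh` soft clipper (nonlinear, loosely like an overdriven stage). The function names, gain value, and 48 kHz sample rate are illustrative assumptions, not models of any real sound card.

```python
import math

RATE = 48_000  # samples per second (assumed)

def tone(freq, n=480, amp=0.5, rate=RATE):
    """n samples of a sine at freq Hz (10 ms at the default settings)."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def gain_stage(x, gain=2.0):
    """Hypothetical idealized linear system: every sample scaled by the same factor."""
    return [gain * s for s in x]

def soft_clipper(x, gain=2.0):
    """Hypothetical nonlinear system: tanh saturation, loosely like an overdriven amp."""
    return [math.tanh(gain * s) for s in x]

def superposition_error(system, a, b):
    """Largest difference between system(a + b) and system(a) + system(b)."""
    together = system([x + y for x, y in zip(a, b)])
    separately = [x + y for x, y in zip(system(a), system(b))]
    return max(abs(t - s) for t, s in zip(together, separately))

a, b = tone(3000), tone(4000)
print("gain stage:  ", superposition_error(gain_stage, a, b))
print("soft clipper:", superposition_error(soft_clipper, a, b))
```

The gain stage's error is zero (apart from possible floating-point rounding), while the soft clipper's error is large; that failure of superposition is what lets a nonlinear stage add frequencies that were not in its input.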
You can see what linear reproduction means by mixing two tones together in some audio software and observing the results. Here is an example (http://i26.tinypic.com/34eqjxw.jpg). The top track is the linear combination of a 3 kHz and a 4 kHz tone, and the bottom tracks are the two pure tones on their own. Here is a Fourier transform (http://i29.tinypic.com/eip6wh.jpg) of the top track, showing the frequencies it contains. Notice that there are peaks at 3 and 4 kHz, but no peaks at the sum or difference frequencies (7 and 1 kHz). This is linear mixing, and it is how the audio system in your computer is supposed to work. There is, however, a volume envelope that repeats at 1 kHz. This is typically called a beat frequency and, at slow enough beat rates, is perceived as tremolo, but it isn't truly a 1 kHz tone.
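The same experiment can be reproduced numerically. This sketch sums a 3 kHz and a 4 kHz cosine and evaluates the DFT magnitude at a handful of frequencies. The sample rate, duration, and helper names are assumptions, chosen so that every frequency of interest falls exactly on a DFT bin.

```python
import math

RATE = 48_000  # samples per second (assumed)
N = 4_800      # 0.1 s of audio: 10 Hz bin spacing, so whole-kHz tones land exactly on bins

def mix(freqs, rate=RATE, n=N):
    """Linear mixing: the plain sum of unit-amplitude cosines."""
    return [sum(math.cos(2 * math.pi * f * i / rate) for f in freqs) for i in range(n)]

def magnitude_at(signal, freq, rate=RATE):
    """Single-bin DFT magnitude, scaled so a unit-amplitude cosine reads 1.0."""
    re = im = 0.0
    for i, s in enumerate(signal):
        phase = 2 * math.pi * freq * i / rate
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return 2 * math.hypot(re, im) / len(signal)

x = mix([3000, 4000])
for f in (1000, 3000, 4000, 7000):  # difference, the two tones, sum
    print(f"{f:5d} Hz: {magnitude_at(x, f):.3f}")

# Beat envelope in the time domain: 24 samples = 0.5 ms, 48 samples = 1 ms at 48 kHz.
print(x[0], x[24], x[48])  # envelope peak, null (about 0), peak
```

Only the 3 and 4 kHz entries come out at 1.000; the 1 kHz and 7 kHz entries are 0.000. Yet x[0] and x[48] are (to rounding) 2 while x[24] is essentially zero: the envelope swells and vanishes once per millisecond, which is the 1 kHz beat, with no energy at 1 kHz.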
However, it is true that you can sometimes hear a difference frequency of two pure tones, even if the output of your speaker doesn't contain that frequency. What's actually at play (assuming your speakers aren't to blame*) is the nonlinear response of your ear and brain, that is, your physical and neurological interpretation of sound. Although linear mixing produces no difference frequency, for certain frequency combinations you can still hear sum and difference tones because of the way your hearing works. I've usually heard this phenomenon referred to as "combination tones", but I think I've at least once seen it called heterodyning (in general, heterodyning is any nonlinear mixing of waves, whether they be mechanical, electromagnetic, or otherwise). Psychoacoustic effects like these are heavily studied, and psychoacoustic models are often applied in lossy compression algorithms like MP3 to eliminate data that may be redundant given the way we perceive sound.
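To see how a nonlinearity creates tones that are not in its input, this sketch passes the same linear 3 kHz + 4 kHz mix through a hypothetical weakly nonlinear transfer function, y = x + a2*x^2 + a3*x^3. This is only a toy stand-in for the ear: real cochlear mechanics are not a memoryless polynomial, and the coefficients a2 and a3 are arbitrary assumptions.

```python
import math

RATE = 48_000  # samples per second (assumed)
N = 4_800      # 0.1 s: 10 Hz bin spacing, so every product below lands on a bin

def mix(freqs, rate=RATE, n=N):
    """Linear mixing: the plain sum of unit-amplitude cosines."""
    return [sum(math.cos(2 * math.pi * f * i / rate) for f in freqs) for i in range(n)]

def magnitude_at(signal, freq, rate=RATE):
    """Single-bin DFT magnitude, scaled so a unit-amplitude cosine reads 1.0."""
    re = im = 0.0
    for i, s in enumerate(signal):
        phase = 2 * math.pi * freq * i / rate
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return 2 * math.hypot(re, im) / len(signal)

def toy_ear(signal, a2=0.1, a3=0.05):
    """Hypothetical weakly nonlinear response: y = x + a2*x**2 + a3*x**3."""
    return [s + a2 * s ** 2 + a3 * s ** 3 for s in signal]

x = mix([3000, 4000])  # f1 = 3 kHz, f2 = 4 kHz
y = toy_ear(x)
for f in (1000, 2000, 7000):  # f2 - f1, 2*f1 - f2, f1 + f2
    print(f"{f:5d} Hz  in: {magnitude_at(x, f):.4f}  out: {magnitude_at(y, f):.4f}")
```

The input has nothing at these frequencies. In the output, the squared term produces the difference (1 kHz) and sum (7 kHz) tones, each at amplitude a2, and the cubic term produces 2*f1 - f2 = 2 kHz at 0.75*a3, among other products.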
The heterodyning that occurs in a typical Theremin setup happens in a nonlinear electrical device, for example a diode, a transistor biased near cut-off, or a vacuum tube. These devices inherently respond nonlinearly to input signals and can approximate multiplication. Therefore, the output of a Theremin's mixer circuitry contains true sum and difference tones of the RF oscillator frequencies. If this were not the case and the mixer acted linearly, the filters that remove the inaudible RF signals would leave no audible output at all!
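An idealized version of the Theremin mixer can be sketched by treating it as a perfect multiplier (a real diode or tube only approximates this). Because cos(A)cos(B) = [cos(A - B) + cos(A + B)] / 2, the product contains exactly the difference and sum frequencies. The two oscillator frequencies below are hypothetical values, detuned by 1 kHz as if by the player's hand; the sample rate and helper names are also assumptions.

```python
import math

RATE = 1_000_000      # 1 MHz sample rate (assumed), above twice the 341 kHz sum tone
N = 10_000            # 10 ms: 100 Hz bin spacing
F_FIXED = 170_000     # hypothetical fixed oscillator, Hz
F_VARIABLE = 171_000  # hypothetical variable oscillator, detuned 1 kHz by the hand

def osc(freq, rate=RATE, n=N):
    """n samples of a unit-amplitude cosine oscillator at freq Hz."""
    return [math.cos(2 * math.pi * freq * i / rate) for i in range(n)]

def magnitude_at(signal, freq, rate=RATE):
    """Single-bin DFT magnitude, scaled so a unit-amplitude cosine reads 1.0."""
    re = im = 0.0
    for i, s in enumerate(signal):
        phase = 2 * math.pi * freq * i / rate
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return 2 * math.hypot(re, im) / len(signal)

a, b = osc(F_FIXED), osc(F_VARIABLE)
linear_mix = [x + y for x, y in zip(a, b)]   # what a linear "mixer" would output
product_mix = [x * y for x, y in zip(a, b)]  # ideal multiplying mixer

diff = F_VARIABLE - F_FIXED   # 1 kHz: the audible difference tone
total = F_VARIABLE + F_FIXED  # 341 kHz: removed later by the low-pass filter
for name, sig in (("linear", linear_mix), ("product", product_mix)):
    print(f"{name:8s} {diff} Hz: {magnitude_at(sig, diff):.3f}  "
          f"{total} Hz: {magnitude_at(sig, total):.3f}")
```

The product shows 0.500 at both 1 kHz and 341 kHz, while the linear sum shows nothing at either. Once a low-pass filter strips out everything in the RF range, only the product's 1 kHz term survives; the linear mix would leave nothing to hear.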
So, the effect you're experiencing with these test tones is heterodyning of a sort, but it's your ear and brain which are producing the difference notes you perceive, not your audio system! *
* Disclaimer: Non-idealities in your speakers, D/A converters, and amplifiers can indeed produce some nonlinear effects in your sound. However, these effects are usually avoided as much as possible in audio system design except where specifically desired (like in an overdriven guitar amp).