A good friend of mine released this at Syntax Demoparty last year. http://www.youtube.com/watch?v=0dRCPmoNd9s
Think of it as a "vectorised voice" system which operates on two of the SID channels and defies what a C64 can do. The irony is that it takes a multi-processor PC ages to actually generate the data before it hits the C64! (Good old brute force.) I'm talking to ALIH about actually doing a recording in a studio with proper mics and compressors to see how good we can get it to sound.
In his notes:
( see: http://noname.c64.org/csdb/release/?id=
show=notes )
"The music is a cover of Hoobastank's "My Turn" from the album For(n)ever. ... Speech sounds on a c64 - one channel is the carrier, which is the frequency of the speaking voice, and one is the modulator, which is a ring modulation, sync, or both. If you're Agemixer, filter banks are also involved. In any case, the codec used here works kind of like that. I use a phase vocoder (that I suspect has an occasional bug!) to get the base pitch of the tone, and then try different modulation frequencies until I find the closest match (based on the magnitude portion of an FFT calculation). If people are interested, I will release this tool at some point.
I have a parameter in that which controls how often the sid is updated. I did 25fps to make everything fit into ram. .... So in order to get intelligible output, I ended up dealing with five parameters - two voice frequencies of eight bits each, two volumes of four bits each, a waveform of three bits ($11,$21,$41,$81) for the carrier channel and a modulation type of two bits (1 bit ring, 1 bit sync) for the modulator channel. This gets compressed into four bytes per updated frame. The instrumental part of the song was done in a single track in ninjatracker 1.02 (I think 1.02), then I patched the player to only play a sid channel, mostly because the rotozoom part relies on a music player of no more than 12 rasterlines to achieve 50fps. Ninja takes 4 rasterlines, and the voice player takes 7. There really is no excuse for it taking 7 rasterlines other than some shitty, shitty code on my part.
The replayer code has a couple of major points of shittiness, the most obvious one being the classic problem for anyone who does sid programming - you'll notice that I said that I needed to set two volumes.... so obviously you can't fucking do that. Quick sid theory lesson for those not initiated: there is a "feature" in the sid where you can change the sustain _down_, but not _up_ without a retrigger. You can do this after having the ADSR at zero for three cycles or so (might have been five, I forget now), which is pretty inaudible. But the fastest attack is 12ms or so, which is half a frame, which causes a nice audible click. I originally had the filters turned on to attempt to deal with that, but I actually _forgot_ to turn them back on after turning them off for testing reasons. But the net result is that you end up with something that is vaguely in the right direction, but obviously not actually totally brilliant, due to a combination of bugs and stupidity on my part. But I still thought it sounded cool, and some of the other people I showed it to thought that it sounded cool, and besides, I was already committed. I think it's one of those like/hate things again.
But back on topic, once I've turned the sample data into sid data, I cut it up into blocks and stream it from the disk into a 4k ring buffer from the loader routine. So you end up with code that looks like:
loadPart:
while ringBuffer needs data:
    loadMusic
loadPart
while ringBuffer needs data:
    loadMusic
in an effort to try and keep that buffer nice and full. And it mostly works... I mean, for obviously small values of "works". But it also puts some stress on the loading times, which is not something I considered until I wrote the ringbuffer code at 5am on Saturday morning, and realised that all of my loading times doubled."
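For anyone who wants to picture the ring buffer ALIH describes, here is a rough Python sketch of the idea (the real thing is 6502 code in the loader, which I haven't seen). The names `RingBuffer` and `top_up`, the 256-byte block size and the file-like `stream` are all my own assumptions; only the 4K buffer size and the four bytes per update come from his notes.

```python
import io


class RingBuffer:
    """Fixed-size byte ring buffer: the loader writes, the player reads."""

    def __init__(self, size=4096):  # 4K, as in the notes
        self.data = bytearray(size)
        self.size = size
        self.head = 0   # next write position (loader side)
        self.tail = 0   # next read position (player side)
        self.count = 0  # bytes currently buffered

    def free(self):
        return self.size - self.count

    def write(self, chunk):
        """Store as much of chunk as fits; return the number of bytes taken."""
        n = min(len(chunk), self.free())
        for b in chunk[:n]:
            self.data[self.head] = b
            self.head = (self.head + 1) % self.size  # wrap around at the end
        self.count += n
        return n

    def read(self, n):
        """Remove and return up to n bytes."""
        n = min(n, self.count)
        out = bytearray()
        for _ in range(n):
            out.append(self.data[self.tail])
            self.tail = (self.tail + 1) % self.size
        self.count -= n
        return bytes(out)


def top_up(ring, stream, block=256):
    """The 'while ringBuffer needs data: loadMusic' loop (block size is assumed)."""
    while ring.free() >= block:
        chunk = stream.read(block)
        if not chunk:
            break  # no more music data on "disk"
        ring.write(chunk)


# Stand-in for the music file on disk: 10240 bytes of dummy data.
music = io.BytesIO(bytes(range(256)) * 40)
ring = RingBuffer()
top_up(ring, music)         # fill the buffer before playback starts
first_frame = ring.read(4)  # the player consumes four bytes per update
```

In the demo, `top_up` would be called between loading each part, which is presumably where the doubled loading times come from: every pause in part loading is spent refilling the music buffer.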
So there we go - something of a breakthrough in speech synth on C64 ;-) From here in Australia!
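P.S. On the "four bytes per updated frame" figure in the notes: the five parameters add up to 8 + 8 + 4 + 4 + 3 + 2 = 29 bits, so they fit in 32 bits with three to spare, and at 25 updates a second that is only 100 bytes per second of speech. Below is a rough Python sketch of how such a frame could be packed. The field order, the big-endian layout and the names `pack_frame` / `unpack_frame` are purely my guesses for illustration, not ALIH's actual format.

```python
# Carrier waveform values from the notes: SID control register bytes with the
# gate bit set (triangle, sawtooth, pulse, noise).
WAVEFORMS = (0x11, 0x21, 0x41, 0x81)


def pack_frame(carrier_freq, mod_freq, carrier_vol, mod_vol, wave_index, ring, sync):
    """Pack one update into 4 bytes (29 bits used, top 3 bits left as zero)."""
    assert 0 <= carrier_freq <= 0xFF and 0 <= mod_freq <= 0xFF
    assert 0 <= carrier_vol <= 0xF and 0 <= mod_vol <= 0xF
    assert 0 <= wave_index < len(WAVEFORMS)
    word = carrier_freq                      # 8 bits
    word = (word << 8) | mod_freq            # 8 bits
    word = (word << 4) | carrier_vol         # 4 bits
    word = (word << 4) | mod_vol             # 4 bits
    word = (word << 3) | wave_index          # 3 bits (assumed: index into WAVEFORMS)
    word = (word << 1) | (1 if ring else 0)  # 1 bit ring modulation
    word = (word << 1) | (1 if sync else 0)  # 1 bit sync
    return word.to_bytes(4, "big")


def unpack_frame(data):
    """Reverse of pack_frame: peel the fields off from the low bits upward."""
    word = int.from_bytes(data, "big")
    sync = bool(word & 1)
    word >>= 1
    ring = bool(word & 1)
    word >>= 1
    wave_index = word & 0x7
    word >>= 3
    mod_vol = word & 0xF
    word >>= 4
    carrier_vol = word & 0xF
    word >>= 4
    mod_freq = word & 0xFF
    word >>= 8
    carrier_freq = word & 0xFF
    return carrier_freq, mod_freq, carrier_vol, mod_vol, wave_index, ring, sync


frame = pack_frame(0x12, 0x34, 0xA, 0x5, 2, True, False)
fields = unpack_frame(frame)
```

On the real machine the player would presumably do the unpacking with shifts and masks in 6502 assembly and write the results straight to the SID registers, but the bit budget is the same either way.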