ChipMusic.org - Sega SPEECH synthesis CSM Sounds

Re: Sega SPEECH synthesis CSM Sounds

2014-07-16T19:39:31Z

This thread was declared dead almost 1 year ago!

Re: Sega SPEECH synthesis CSM Sounds

2014-07-16T15:01:09Z

Awesome!

Re: Sega SPEECH synthesis CSM Sounds

2013-08-16T11:52:21Z

akira^8GB wrote:

Just noticed you posted the ROM. Thanks! I'll give it a play this weekend and hopefully sample some fear-inducing sounds

Yep ^_^ let me hear those samples when done

Re: Sega SPEECH synthesis CSM Sounds

2013-08-13T09:22:42Z

Just noticed you posted the ROM. Thanks! I'll give it a play this weekend and hopefully sample some fear-inducing sounds

Re: Sega SPEECH synthesis CSM Sounds

2013-06-01T00:34:38Z

So here are some examples of how to use the rather unique CSM speech mode.
First, a reminder:
Example of CSM speech in the MSX game Zeliard:
http://www.alyjameslab.com/tutorials/MSX_CSM.wav
Example of random CSM speech in the FMDrive VSTi:
http://www.alyjameslab.com/tutorials/FMDrive_CSM.wav

Sine-wave artifacts are noticeable in both of these examples because there is no decay on the envelope generator.

Now let's make the envelope decay very quickly, and the artifacts become less noticeable at the end.
Example of an attempt at saying "HELLO" with fast decay:
the MIDI data is first played very quickly, then slowed down.
http://www.alyjameslab.com/tutorials/hellocsm.wav

And for a visual explanation, better than words:
Here is one instance of FMDrive in Cubase, with automation lanes and MIDI channels for the special mode.
Notice the similarity between the spectrogram and the automation data.

What you see is the spectral power and the variation of frequency over time.
A sum of sine waves at different frequencies gives us what we call vocal formants.

Re: Sega SPEECH synthesis CSM Sounds

2013-05-26T18:25:31Z

Keep in mind that it is one of my test ROMs, not intended for public release and not especially user-friendly

CSM MODE ROM:
-------------------------------------------------------------------------------------
Test mode for FMDrive Vsti dev.
Works on a real MD1 and Regen Emulator.
Use at your own risk
-------------------------------------------------------------------------------------
USE 2 OPERATORS ON CH3: OP2>OP4
The ROM starts in NORMAL mode until BUTTON C is pressed
(then it will be either in CSM or SPECIAL until ROM reset)
A key on to CH3 is set on startup and basic registers set.

COMMANDS: on PAD 1
(there is also a command on PAD 2 that controls the TL of OP2;
I cannot remember which one ^^)
-------------------------------------------------------------------------------------
START : KEY on/off (OP2 + OP4)

A: Pressed: set AR of OP2 to 1F; released: set AR to 00
So if you want OP2 to modulate OP4, keep it pressed

B: Pressed Key on OP2 and Key off OP4

C: Pressed: CSM mode (auto key on/off at Timer A speed)
Released: Special Mode (independent FRQ set by RIGHT)

LEFT : ALGO change from 0 to 7 then wrap.

RIGHT: FRQ change for OP2 (change block. down then wrap)

DOWN :FRQ change for OP4 (change block. down then wrap)

UP: Timer A period (down then wrap)

DOWNLOAD: http://www.alyjameslab.com/tutorials/FM … test03.bin
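
For anyone wondering what the "Timer A period" control on UP means in terms of pitch, here is a minimal sketch. It assumes the commonly documented YM2612 relation period = 144 * (1024 - N) / clock and the NTSC Mega Drive chip clock of about 7.67 MHz; neither number comes from the post above, and the function names are made up for illustration.

```python
# Assumptions (not from the post): NTSC Mega Drive YM2612 clock and the
# commonly documented Timer A relation, period = 144 * (1024 - N) / clock.
YM2612_CLOCK_HZ = 7_670_453
TIMER_A_STEPS = 1024  # Timer A is a 10-bit counter, so N is 0..1023


def timer_a_period_s(n, clock_hz=YM2612_CLOCK_HZ):
    """Seconds between Timer A overflows for a 10-bit load value n."""
    if not 0 <= n < TIMER_A_STEPS:
        raise ValueError("Timer A value must be in 0..1023")
    return 144 * (TIMER_A_STEPS - n) / clock_hz


def timer_a_rate_hz(n, clock_hz=YM2612_CLOCK_HZ):
    """Overflow rate in Hz, i.e. the auto key-on rate in CSM mode."""
    return 1.0 / timer_a_period_s(n, clock_hz)


def timer_a_value_for_rate(rate_hz, clock_hz=YM2612_CLOCK_HZ):
    """Nearest 10-bit Timer A value for a wanted overflow rate."""
    n = round(TIMER_A_STEPS - clock_hz / (144 * rate_hz))
    return max(0, min(TIMER_A_STEPS - 1, n))


if __name__ == "__main__":
    for n in (0, 512, 1000, 1023):
        print(n, round(timer_a_rate_hz(n), 2), "Hz")
```

Under these assumptions the usable retrigger rates run from roughly 52 Hz up to about 53 kHz, which is why the timer setting is heard as a pitch in CSM mode.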

Re: Sega SPEECH synthesis CSM Sounds

2013-05-26T16:45:53Z

boomlinde wrote:

I did some encoding experiments that would fit this audio mode perfectly -- condensing a sample to its N most prominent sine components over a window of M microseconds. Then you just have to encode the partial number and its amplitude. For 4 components at 25 Hz, you'd probably be able to get it down to 800 bps. Here's an example of 8 partials at 25 Hz: https://dl.dropboxusercontent.com/u/501 … ariots.mp3. This has no amplitude quantization, though.
For speech synthesis, you could severely limit the spectrum with a pre-filter, but I'm not sure what would produce the best overall result for speech.

Great work!
A frequency range of 100 Hz-5000 Hz should be enough for speech analysis.
I would need something like that to make a nice tool for FMDrive:
// WAV vocal sample >> FFT analysis >> formant freqs + amplitude >> MIDI
4 main formants in the speech >> FRQ to operator F-number >> MIDI pitch or MIDI notes
4 levels for the power of spectrum >> dB to operator TL >> MIDI volume or TL (CC)

This should output 4 MIDI files, for OP4, 3, 2 and 1.

Can you make something like that? It would also be cool as a WAV-sample-to-MIDI tool, using one sine wave for each of the most prominent partials...
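
The two mapping steps in that chain (frequency to F-number, dB to TL) could look roughly like this. It is only a sketch: the clock value, the F-number relation freq = fnum * clock / (144 * 2^(21 - block)) and the 0.75 dB per TL step are the commonly documented YM2612 figures, not something stated in this thread, and the function names are hypothetical.

```python
# Assumptions (not from the thread): NTSC chip clock, the commonly
# documented F-number relation, and 0.75 dB of attenuation per TL step.
YM2612_CLOCK_HZ = 7_670_453
FNUM_MAX = 0x7FF   # F-number is 11 bits
TL_MAX = 127       # TL is 7 bits, 0 = loudest
TL_STEP_DB = 0.75


def freq_to_block_fnum(freq_hz, clock_hz=YM2612_CLOCK_HZ):
    """Return (block, fnum) for freq_hz, picking the lowest block that
    fits so the F-number keeps as much resolution as possible."""
    for block in range(8):
        fnum = round(freq_hz * 144 * 2 ** (21 - block) / clock_hz)
        if fnum <= FNUM_MAX:
            return block, fnum
    raise ValueError("frequency too high for an 11-bit F-number")


def block_fnum_to_freq(block, fnum, clock_hz=YM2612_CLOCK_HZ):
    """Inverse mapping, handy for checking the rounding error."""
    return fnum * clock_hz / (144 * 2 ** (21 - block))


def db_to_tl(level_db):
    """Map a level in dB relative to full scale (0 or negative) to TL."""
    tl = round(-level_db / TL_STEP_DB)
    return max(0, min(TL_MAX, tl))


if __name__ == "__main__":
    # Four made-up formant measurements: (frequency in Hz, level in dB).
    for freq, level in [(700, 0.0), (1200, -6.0), (2600, -12.0), (3300, -18.0)]:
        print(freq, freq_to_block_fnum(freq), db_to_tl(level))
```

The FFT analysis and the MIDI export are left out on purpose; the block only covers the FRQ >> F-number and dB >> TL arrows.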

Re: Sega SPEECH synthesis CSM Sounds

2013-05-26T16:36:08Z

akira^8GB wrote:

Are you going to release the ROM?

thx 4 reminding me, I have forgotten to post it..the ROM should b posted here today

Re: Sega SPEECH synthesis CSM Sounds

2013-05-26T11:39:32Z

I did some encoding experiments that would fit this audio mode perfectly -- condensing a sample to its N most prominent sine components over a window of M microseconds. Then you just have to encode the partial number and its amplitude. For 4 components at 25 Hz, you'd probably be able to get it down to 800 bps. Here's an example of 8 partials at 25 Hz: https://dl.dropboxusercontent.com/u/501 … ariots.mp3. This has no amplitude quantization, though.

For speech synthesis, you could severely limit the spectrum with a pre-filter, but I'm not sure what would produce the best overall result for speech.
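
A rough sketch of that idea, for illustration only and not the actual encoder used for the example above: take one analysis window, run a naive DFT, and keep the N strongest bins as (partial number, amplitude) pairs. The 8 bits per partial used to reach 800 bps is an assumption; the post does not say how the bits are split between partial number and amplitude.

```python
import cmath
import math


def top_partials(frame, n_partials):
    """Return the n_partials strongest DFT bins of one frame as
    (bin_index, amplitude) pairs, strongest first (naive O(n^2) DFT)."""
    size = len(frame)
    bins = []
    for k in range(1, size // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / size)
                  for i, x in enumerate(frame))
        bins.append((k, 2 * abs(acc) / size))
    bins.sort(key=lambda b: b[1], reverse=True)
    return bins[:n_partials]


def bin_to_hz(bin_index, frame_size, sample_rate):
    """Centre frequency of a DFT bin."""
    return bin_index * sample_rate / frame_size


def bitrate_bps(n_partials, frames_per_second, bits_per_partial):
    """Raw bitrate of the stream, ignoring any framing overhead."""
    return n_partials * frames_per_second * bits_per_partial


if __name__ == "__main__":
    sample_rate = 8000
    frame_size = sample_rate // 25  # one 40 ms window at 25 frames/s
    # Synthetic test frame: two sines at 200 Hz and 1000 Hz.
    frame = [math.sin(2 * math.pi * 200 * i / sample_rate)
             + 0.5 * math.sin(2 * math.pi * 1000 * i / sample_rate)
             for i in range(frame_size)]
    for k, amp in top_partials(frame, 2):
        print(bin_to_hz(k, frame_size, sample_rate), "Hz", round(amp, 3))
    print(bitrate_bps(4, 25, 8), "bps")
```

A real encoder would use a proper FFT and a window function; the naive DFT is here only to keep the sketch dependency-free.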

Re: Sega SPEECH synthesis CSM Sounds

2013-05-26T09:36:39Z

Are you going to release the ROM?

Re: Sega SPEECH synthesis CSM Sounds

2013-05-09T16:54:29Z

Aly James wrote:

and Here:

God damn it, that is TERRIFYING. I love it
Looking forward to playing around with that ROM!

Re: Sega SPEECH synthesis CSM Sounds

2013-05-09T16:21:42Z

I have updated the previous post with another example.

Re: Sega SPEECH synthesis CSM Sounds

2013-05-09T15:30:31Z

indeed

Re: Sega SPEECH synthesis CSM Sounds

2013-05-09T15:28:20Z

That is super awesome!

Re: Sega SPEECH synthesis CSM Sounds

2013-05-09T15:03:07Z

Lazerbeat wrote:

Ok, I'm really sorry but can someone explain what the fuck is going on here? Does the YM2612 have a speech mode in it that nobody used? Is it a kind of primitive formant synthesis or something?

Absolutely right
It is not really easy to program it to actually produce understandable speech, but the technology is definitely there in the YM2612...
I have made a few videos on the FMDRIVE VSTi to showcase what you can do with it.
I have found some rare uses of CSM in some Game Arts games for MSX:
The Silpheed game on the PC88 MSX computer features a chip very similar to the YM2612, with the exact same CSM feature.
It is used here to produce the robotic speech:

and Here:

Yamaha's FM sound chips have the ability to key on / key off automatically (some channels or all channels) when the built-in Timer A overflows.
This is called "CSM speech synthesis mode"; CSM stands for Composite Sinusoidal Modeling.
CSM speech synthesis is a type of speech coding: a technique that reproduces the original data of a vocal sample with a combination of multiple sine waves.

There is a theory, using the FFT, for decomposing the frequency content of a signal into a sum of different sine waves with different pitches and volumes over time.
Based on this theory, if you play more than one sine wave at the same time, each at an appropriate TL volume and frequency, you can reproduce a waveform similar to the original.
The YM2612 can output 4 sine waves with 4 different frequencies and 4 different TL volumes.
FMDRIVE VST uses that with MIDI CH 1, 11, 12 and 13 to control frequency and volume, and an additional CH 14 to control Timer A.
You can also MIDI-learn these controls to any MIDI controller and you're good for some live talking shit
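
To make the "4 sines, 4 frequencies, 4 TL volumes" idea concrete, here is a small offline sketch that renders one frame as a plain sum of four sine waves. It is not how FMDrive or the chip works internally: the 0.75 dB per TL step is the commonly documented YM2612 figure, the formant values are rough illustrative numbers rather than measurements, and on hardware this would presumably need an algorithm where all four operators are carriers.

```python
import math

TL_STEP_DB = 0.75  # assumption: attenuation per TL step on the YM2612


def tl_to_gain(tl):
    """Linear gain for a 7-bit TL value (0 = loudest)."""
    return 10 ** (-TL_STEP_DB * tl / 20)


def render_frame(partials, duration_s, sample_rate=44100):
    """Sum of sines for a list of (freq_hz, tl) pairs.

    The sum is divided by the number of partials so the result always
    stays within -1.0..1.0.
    """
    count = int(duration_s * sample_rate)
    return [
        sum(tl_to_gain(tl) * math.sin(2 * math.pi * freq * i / sample_rate)
            for freq, tl in partials) / len(partials)
        for i in range(count)
    ]


if __name__ == "__main__":
    # Illustrative "ah"-like formants, one per operator: (Hz, TL).
    vowel = [(700, 0), (1200, 8), (2600, 16), (3300, 24)]
    samples = render_frame(vowel, 0.04)
    print(len(samples), "samples, peak", round(max(map(abs, samples)), 3))
```

Changing the four (frequency, TL) pairs from frame to frame is, in spirit, what the automation lanes on the MIDI channels are doing.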

This mode is also useful for producing new types of sounds, similar to having a powerful filter on board... and that is what is very interesting in addition to the speech thing.
My testing has shown some really cool stuff