Offline
CA
Jazzmarazz wrote:

1am...time for the demoralized to go to bed....
Maybe someone can shed light on this in the meantime since you shared all of your various bits of code. The Teensy code is so "dumb" (having only a few dozen lines) it can't possibly be wrong!

good luck

It's still 10:30 on the west coast :)

I actually tend to think that this idea was doomed from the very beginning. Let's theorize a bit. In order to read the data correctly we need to wait for each clock edge and then process it. The clock ticks about 4M times per second, and the Teensy's overclocked frequency is 96MHz. 96M/4M = 24, so at the very best we have only 24 CPU cycles to handle each clock. Is that enough with everything else still going on (other interrupts etc.)? I'd say no, it is not.
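For anyone who wants to play with the numbers, here is a minimal sketch of that budget in plain C. The function names and the overhead parameter are made up for illustration; 96MHz and 4MHz are just the round figures used above, and any overhead value you pass in is an assumption, not a measurement.

```c
#include <stdio.h>

/* CPU cycles available between two consecutive pixel-clock edges. */
static unsigned cycles_per_pixel(unsigned long cpu_hz, unsigned long pixel_hz) {
    return (unsigned)(cpu_hz / pixel_hz);
}

/* Cycles left for real work once a fixed per-interrupt overhead is paid.
   ASSUMPTION: the overhead is a placeholder supplied by the caller. */
static unsigned cycles_left(unsigned budget, unsigned isr_overhead) {
    return budget > isr_overhead ? budget - isr_overhead : 0u;
}

/* Print the budget for a given CPU clock, pixel clock and assumed overhead. */
static void print_budget(unsigned long cpu_hz, unsigned long pixel_hz,
                         unsigned isr_overhead) {
    unsigned budget = cycles_per_pixel(cpu_hz, pixel_hz);
    printf("budget: %u cycles, left after overhead: %u\n",
           budget, cycles_left(budget, isr_overhead));
}
```

Calling print_budget(96000000UL, 4000000UL, 12) shows how quickly the 24 cycles shrink once any entry/exit cost is subtracted.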

Last edited by friendofmegaman (Apr 25, 2014 5:36 am)

Offline
Michigan

I would say that it is enough, but we are missing something obvious. What if 96MHz is in fact 'too' fast? Could you possibly be reading data that has yet to be populated on D0 and D1? There is always a propagation delay, but it still doesn't account for mostly 4's... My excuse is that I'm tired. Don't blame me for my ramblings. :P

Offline
CA
Jazzmarazz wrote:

I would say that it is enough, but we are missing something obvious. What if 96MHz is in fact 'too' fast? Could you possibly be reading data that has yet to be populated on D0 and D1? There is always a propagation delay, but it still doesn't account for mostly 4's... My excuse is that I'm tired. Don't blame me for my ramblings. :P

Nope, that's not possible. You see, the data signals arrive while the clock pulse is still high. This means that if we're reading data on CLOCK's falling edge (and this is exactly what we're doing), the bits are already there. The clock and data signals kind of overlap if you plot them. Thus it's guaranteed that the data is valid when it is read.

24 cycles is not much really. For example, a single line of C code can compile to 10 instructions, and each instruction may be several bytes long.

I checked with Craig from Flashing LEDs and he confirmed this theory and suggested using external SPI chips. I need to dig in this direction. But so far, to do this really neatly, an FPGA is probably needed. I love the idea of an FPGA, but the prospect of learning the absolutely counterintuitive Verilog or VHDL hurts before I've even started.

Offline
Melbourne, Australia

You could take a chip like this:

http://www.digikey.com.au/product-detai … ND/2354748

which will output analog RGB video if you can get it to swallow the GameBoy's sync / clock / data signals?

Datasheet here: http://read.pudn.com/downloads122/ebook … 152007.pdf

Offline

Just what I need, but way too complex for me :-/

Offline
friendofmegaman wrote:
rvan wrote:

With my simple testing code (as follows) I think it was the ISR being called which took too many clock cycles (I recall reading 10 cycles of overhead somewhere).  Getting away from the Arduino code will hopefully reduce this.

Let us know the results, because in theory I don't see how this is faster. The Arduino sketch is merely a wrapper for what you've written. It puts the contents of setup() into main() and the contents of loop() inside the for(;;) loop in main(). Of course I might be wrong :)

It's not the wrapper part of the Arduino environment which is the concern, but rather the Arduino/TeensyDuino library functions, which are known to be inefficient in some respects.  I suspect that using attachInterrupt may add additional complexity to the way in which interrupts are actually handled, which consumes unnecessary clock cycles.  This is just a guess, though.

friendofmegaman wrote:

There's one more concern though. What if a signal arrives while the Teensy is handling the previous interrupt? This may be the reason why rvan's attempt didn't succeed. But I'd really like to see the code in order to avoid wasting time on re-doing something that has been done already.

Based on the simple sample program I wrote (see above), I think the issue is simply that the ISR is not being called every time it should, as the interrupting signal (Clock) is too fast; i.e., the ISR cannot be called and return within ~24 clock cycles.

friendofmegaman wrote:

Then you need to check the device name of the serial device where the Teensy is. For *nix systems it should be something like /dev/tty.usbXXXX where XXXX is some string. I've no clue how it's addressed in Windows though.

Incidentally, mine is /dev/ttyACM0, not /dev/tty.usbXXXX, in case anyone else is looking for the right device.


friendofmegaman wrote:

Another question: has anybody benchmarked the USB serial speed of the Teensy 3.1? According to this page: http://www.pjrc.com/teensy/usb_serial.html it is not very high, so we need to keep the data we transmit as small as possible, as I'm trying to do by packing it into an unsigned char array.

The page you linked seems to be for the older Teensy boards, not Teensy 3.1.  The older boards are a different architecture, so this information isn't relevant to us.  While it may be worth benchmarking serial at some point, I do not suspect it will be a major issue, and think we should work on getting data into the Teensy, before we go into the particulars of how we get it out of the Teensy.

Jazzmarazz wrote:
rvan wrote:

I had a think about this, and I think the only foolproof way is to store only three pixels per byte and use the remaining two bits to store new line and/or new frame markers.  Another option would be to simply count pixels (assuming we always begin sending at the start of a frame) and hope we don't drop any bytes, but that is not as robust.

I like that, 3 pixels per byte and a 2-bit marker. Serial is sure fast enough to handle that.

I imagine the data scheme looking like this:

[hd][px][px][px]
Where hd is a two-bit header, and px is a two-bit pixel.
Pixels are encoded left to right, most significant to least significant.

Possible hd values are:
00    No marker (pixel data only)
01    Beginning of line
10    Beginning of frame
11    Reserved

Since 160%3=1, we will have one byte at the end of the line which contains only one pixel.  We can probably simply ignore the other two pixel spaces in this byte, without needing a special header here (note that this byte will directly precede the byte with a newline header).
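To make the layout concrete, here is a rough sketch of the packing in plain C, so it can be tried on a PC before going anywhere near the Teensy. All of the names are hypothetical, and HD_NONE (00) is my assumption for "no marker", since the list above only spells out 01, 10 and 11.

```c
#include <stdint.h>

/* Header values from the scheme above. HD_NONE is an assumed name for 00. */
enum { HD_NONE = 0, HD_LINE = 1, HD_FRAME = 2, HD_RESERVED = 3 };

#define PIXELS_PER_LINE 160
#define PIXELS_PER_BYTE 3
/* 160 = 53 * 3 + 1, so a line needs 54 bytes and the last holds one pixel. */
#define BYTES_PER_LINE ((PIXELS_PER_LINE + PIXELS_PER_BYTE - 1) / PIXELS_PER_BYTE)

/* Pack [hd][px][px][px], most significant bits first. */
static uint8_t pack_byte(uint8_t hd, uint8_t p0, uint8_t p1, uint8_t p2) {
    return (uint8_t)(((hd & 3u) << 6) | ((p0 & 3u) << 4) |
                     ((p1 & 3u) << 2) | (p2 & 3u));
}

/* Recover the two-bit header from a packed byte. */
static uint8_t unpack_header(uint8_t byte) {
    return (uint8_t)(byte >> 6);
}

/* Recover pixel `index` from a packed byte (0 = leftmost, 2 = rightmost). */
static uint8_t unpack_pixel(uint8_t byte, unsigned index) {
    return (uint8_t)((byte >> (4u - 2u * index)) & 3u);
}
```

For the final byte of a line, the lone pixel would go in p0 and the other two slots can be passed as 0 and ignored by the receiver.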

friendofmegaman wrote:

Conjecture:

I'm missing bits because the CLOCK handler for the previous interrupt hasn't finished executing.

Yes.  I am quite certain now that this is the case.  I had figured that 24 cycles would be enough to set a variable within an ISR, but it seems like it is not.  My testing code (p. 3), and yours (p. 5) both point to this.

-----------------------------

Here is the original code I wrote using interrupts.  While this does not work on the Teensy (at least at present), I suspect that this strategy would work on a faster board.

/* Author: rvan
   Interrupt-based reading of pixel data from the DMG.

   This code currently DOES NOT WORK; we suspect this is because the ISRs
   (specifically clock_ISR()) are not returning fast enough.  Additionally, the
   code has been modified for clarity since last being tested, it is possible
   that some minor errors have been introduced.
 */

#include <WProgram.h>

#define CLOCK_PIN 23
#define HSYNC_PIN 22
#define VSYNC_PIN 21

unsigned int i;

volatile unsigned char new_pixel = 0;
volatile bool have_pixel = false;

unsigned char three_pixels = 0;
unsigned char num_pixels = 0;

volatile unsigned char place_in_line = 0;

unsigned char line[54];

void clock_ISR() {
    /* Read a pixel */

    //Do something here to wait for the pixel inputs to settle, if necessary.

    new_pixel = PIND;
    have_pixel = true;
}

void vsync_ISR() {
    /* The vsync_ISR() routine is NOT TESTED. */
    /* Note: It may be a bad idea to do serial transfer in the ISR, but we
       leave it here for now to eliminate checking more flags in the work loop.
     */
    /*
       Note also that the new frame and new line markers do not conform to the
       protocol described; in reality the remaining six bits would also contain
       pixel data.
     */
    Serial.write(B10000000); //New frame (hd = 10 in the two MSBs).
}

void hsync_ISR() {
    Serial.write(B01000000); //New line (hd = 01 in the two MSBs).
    place_in_line = 0;
}


int main() {
    DDRD = B00000000; //All pins as input.
    pinMode(CLOCK_PIN, INPUT);
    attachInterrupt(CLOCK_PIN, clock_ISR, FALLING);
    pinMode(HSYNC_PIN, INPUT);
    attachInterrupt(HSYNC_PIN, hsync_ISR, FALLING);
    pinMode(VSYNC_PIN, INPUT);
    attachInterrupt(VSYNC_PIN, vsync_ISR, FALLING);

    Serial.begin(1); //Always runs at Full Speed, the parameter is ignored.
    
    for(;;) {
        if(have_pixel) {
            //Shift the pixels we have two bits to the left and OR in the new
            //pixel.
            //We assume (for now) that the new pixel is in the two LSBs itself
            //and doesn't need shifting.
            three_pixels = (three_pixels << 2) | (new_pixel & 3);
            have_pixel = false;
            num_pixels++;
            //160 % 3 == 1, so the 54th byte of each line holds only one pixel.
            bool last_byte = (place_in_line == 53);
            if(num_pixels == 3 || last_byte) {
                /* We store only 3 pixels per byte. This is so we can use the
                   remaining two bits to show where a line or frame starts.
                 */
                if(last_byte) {
                    three_pixels <<= 4; //Move the lone pixel to the first slot.
                }
                //Add the byte to the line, keeping the two header bits clear.
                line[place_in_line++] = three_pixels & 0x3F;
                if(place_in_line == 54) {
                    //Send the line.
                    Serial.write(line, 54);
                    place_in_line = 0; //We shouldn't need this, as
                    //place_in_line is reset by the hsync ISR.
                }
                num_pixels = 0;
                three_pixels = 0;
            }
        }
    }
}

Despite our lack of success so far, it seems like we are making progress, and at least coming to a better understanding of the hardware (and software) involved (I am, at least).

Offline
Melbourne, Australia

I still think it is worth trying friendofmegaman's code without the Serial.write slowing down the interrupt to see what difference it makes, as I wrote earlier:

http://chipmusic.org/forums/post/206984/#p206984

Also, using low-level AVR port manipulation commands (PORT, PIN, DDR) isn't necessarily going to speed things up on a Teensy 3.x (ARM) because they just end up going through another level of AVR to ARM emulation:

http://github.com/PaulStoffregen/cores/ … mulation.h

http://forum.pjrc.com/threads/17532

You need to actually use ARM low level commands! Grab the MK20DX256 manual here:

http://www.pjrc.com/teensy/datasheets.html

and check out Chapter 11 - Port control and interrupts (PORT) and Chapter 49 - General-Purpose Input/Output (GPIO)
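As a rough illustration of what "low level" means here, this is a sketch of pulling the two data bits straight out of a port input register. It assumes the two data lines are wired to bits 0 and 1 of a single GPIO port, and the function names are made up. On the Teensy 3.1 the pointer would presumably be the address of GPIOD_PDIR (or whichever port the pins actually sit on); on a PC an ordinary variable can stand in for the register so the logic can be checked.

```c
#include <stdint.h>

/* Extract the 2-bit pixel from a raw port-input word.
   ASSUMPTION: the data lines are on bits 0 and 1 of the port. */
static inline uint8_t pixel_from_port(uint32_t port_input) {
    return (uint8_t)(port_input & 0x3u);
}

/* Read a pixel through a pointer to a memory-mapped input register.
   ASSUMPTION: on real hardware the caller passes the port's input
   register; any uint32_t variable works as a stand-in for testing. */
static inline uint8_t read_pixel(const volatile uint32_t *input_reg) {
    return pixel_from_port(*input_reg);
}
```

The point is that this is a single load and a mask, with no per-pin function calls in between.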

Offline
CA
uXe wrote:

I still think it is worth trying friendofmegaman's code without the Serial.write slowing down the interrupt to see what difference it makes, as I wrote earlier:

http://chipmusic.org/forums/post/206984/#p206984

Also, using low-level AVR port manipulation commands (PORT, PIN, DDR) isn't necessarily going to speed things up on a Teensy 3.x (ARM) because they just end up going through another level of AVR to ARM emulation:

http://github.com/PaulStoffregen/cores/ … mulation.h

http://forum.pjrc.com/threads/17532

You need to actually use ARM low level commands! Grab the MK20DX256 manual here:

http://www.pjrc.com/teensy/datasheets.html

and check out Chapter 11 - Port control and interrupts (PORT) and Chapter 49 - General-Purpose Input/Output (GPIO)


Thanks for sharing. It's definitely interesting reading, but it's still 24 clocks, guys; don't be over-optimistic :D

Offline
uXe wrote:

I still think it is worth trying friendofmegaman's code without the Serial.write slowing down the interrupt to see what difference it makes, as I wrote earlier:

The code I posted earlier (p. 3), which only increments a variable in the ISR (serial is done in the work loop), demonstrates the same problem.  I also wrote an even simpler version which sends a pulse on an output pin when the clock ISR is called.  I compared the output signal with the clock signal using a DSO (which may not be entirely reliable for this application) and found that the ISR was not being triggered for most of the clock pulses.  I can post this code if you are interested.

The more I read about the Teensy 3.x, the more I feel that it is a poorly thought out development system, with a poor library which makes it unsuitable for this application, and likely for many of the tasks which the Freescale chip would be capable of.  Thankfully, it seems that most (if not all) of the issues lie with the libraries, rather than the board itself, and so can probably be avoided if we do not use the TeensyDuino libraries.

Knowing what I know now, I would have likely not chosen a Teensy for this project, and would recommend to anyone else wishing to get involved and buy hardware to consider other options (and perhaps suggest them here), rather than jumping in and buying a Teensy.  Nonetheless, I have a Teensy now, and so will persist with it and see how far we can get.  Also, the Teensy retains the advantage of being reasonably cheap.

Here is another thought, though: How feasible is it to forget the Teensy and simply use a logic analyser to capture the data, and process it on the PC using code not unlike friendofmegaman's original version (i.e., poll the signals)?  Craig of Flashing LEDs has more or less done this, but can it be done in realtime?
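To give an idea of what the PC-side processing might look like, here is a small sketch in C that walks over a captured buffer and takes one pixel per falling clock edge. The sample layout (one byte per sample, D0 in bit 0, D1 in bit 1, CLOCK in bit 2) and the function name are assumptions; a real logic analyser will have its own format. It says nothing about whether this can keep up in realtime.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED sample layout: bit 0 = D0, bit 1 = D1, bit 2 = CLOCK. */
#define DATA_MASK  0x03u
#define CLOCK_MASK 0x04u

/* Scan a capture and store one 2-bit pixel per falling clock edge.
   Returns the number of pixels written to `pixels` (at most `max_pixels`). */
static size_t decode_pixels(const uint8_t *samples, size_t n_samples,
                            uint8_t *pixels, size_t max_pixels) {
    size_t count = 0;
    for (size_t i = 1; i < n_samples && count < max_pixels; i++) {
        int was_high = (samples[i - 1] & CLOCK_MASK) != 0;
        int is_high  = (samples[i] & CLOCK_MASK) != 0;
        if (was_high && !is_high) {
            /* Falling edge: take the data bits from the first low sample. */
            pixels[count++] = (uint8_t)(samples[i] & DATA_MASK);
        }
    }
    return count;
}
```

HSYNC and VSYNC could be handled the same way by adding more bits to the assumed layout.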

Last edited by rvan (Apr 26, 2014 7:48 am)

Offline

Just a quick question; If you succeed, will you share it?

Offline
Melbourne, Australia
rvan wrote:

The code I posted earlier (p. 3), which only increments a variable in the ISR (serial is done in the work loop), demonstrates the same problem.  I also wrote an even simpler version which sends a pulse on an output pin when the clock ISR is called.  I compared the output signal with the clock signal using a DSO (which may not be entirely reliable for this application) and found that the ISR was not being triggered for most of the clock pulses.  I can post this code if you are interested.

I would be interested, yes. :)

Again, the code you posted earlier is also doing a Serial.write on every HSYNC, and I can't help but think that is eating up a lot of cycles. I would suggest storing those values in an array or something instead, and then Serial.write-ing them out once you've collected a certain amount, to see if that has any effect on the count.
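One way to do that kind of batching is a small ring buffer: the ISR only stores a byte, and the main loop drains whatever has piled up and sends it in one go. This is only a sketch with made-up names and an arbitrary buffer size, written in plain C so it can be tested off the board.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u /* power of two, so an index wraps with a simple mask */

static volatile uint8_t ring[RING_SIZE];
static volatile uint16_t head = 0; /* written only by the producer (ISR) */
static volatile uint16_t tail = 0; /* written only by the consumer (main loop) */

/* Producer side, meant to be called from the ISR. Returns 0 if full. */
static int ring_put(uint8_t value) {
    uint16_t next = (uint16_t)((head + 1u) & (RING_SIZE - 1u));
    if (next == tail) {
        return 0; /* full: drop the byte rather than block inside an ISR */
    }
    ring[head] = value;
    head = next;
    return 1;
}

/* Consumer side: copy up to `max` pending bytes into `out`, return the count.
   The main loop would then hand `out` to a single Serial.write(out, count). */
static size_t ring_drain(uint8_t *out, size_t max) {
    size_t n = 0;
    while (tail != head && n < max) {
        out[n++] = ring[tail];
        tail = (uint16_t)((tail + 1u) & (RING_SIZE - 1u));
    }
    return n;
}
```

Whether this actually changes the count of handled clocks is exactly what would need testing.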

Offline
CA
Glitch Militia wrote:

Just a quick question; If you succeed, will you share it?


Of course, that's the whole point :)

Offline
Glitch Militia wrote:

Just a quick question; If you succeed, will you share it?

If this was directed at me, the answer is an unequivocal yes.  I am happy to share any of my work on this.  I don't, however, currently own a logic analyser.  I am planning on possibly buying an inexpensive one from China, but this would be a while in arriving.  First, I will do some research into APIs, as we would have to be able to access the data from the logic analyser directly.

uXe wrote:

I would be interested, yes. :)

Here it is:

//Author: rvan
/* A simple program to output pulses when a clock signal is received.  Pin 13
   is the LED pin, but we also attach a DSO probe here.  Currently this program
   DOES NOT produce the intended output, presumably because the ISR returns too
   slowly.
 */

#include <WProgram.h>

#define CLOCK_PIN 23
#define OUT_PIN 13

volatile bool have_clock = false;

unsigned int i;

void clock_ISR() {
    have_clock = true;
}

int main() {
    pinMode(CLOCK_PIN, INPUT);
    pinMode(OUT_PIN, OUTPUT);
    attachInterrupt(CLOCK_PIN, clock_ISR, RISING);

    for(;;) {
        if(have_clock) {
            digitalWrite(OUT_PIN, HIGH);
            i++; i--; i++; i--; //This creates a very short delay.
            digitalWrite(OUT_PIN, LOW);
            have_clock = false;
        }
    }
}
uXe wrote:

Again, the code you posted earlier is also doing a Serial.write on every HSYNC, and I can't help but think that is eating up a lot of cycles and again would suggest to maybe try storing those values in an array or something instead, and then Serial.write them out once you've collected a certain amount to see if that has any effect on the count?

Ah, I see what you mean; I thought you were simply talking about doing a serial write within the ISR or not.  I will try this out tonight.

Last edited by rvan (Apr 26, 2014 8:35 am)

Offline
CA

I thought about deploying a logic analyzer. What they do (if I understand it correctly) is record samples to memory (e.g. flash) and then you read them out to the PC. It would be good to know how to build one and use it for real-time data processing.

Another option is an FPGA. The way it works allows really fast signal processing. In fact it can be clocked by the Game Boy and still be able to send data to the PC. This is what a guy named Snesy did. I tried to contact him through several channels, but without any luck.

Next - Nitro's suggestion to use uC as clock master. I don't understand how (in case of Teensy at least) it solves the issue of only 24 clocks available to do something with the data? 24 is an optimistic estimate.

Finally - DSP. I think logic analyzers are built upon these. However, I've no idea where to even start with that.

Any other options?

Before diving into the next idea it would be great to figure out which of them will (given enough effort) guarantee success. Not some ugly hacked code that fails 50% of the time, but a neat and reliable solution. We know it has been done on an FPGA.

Pros: it will solve this problem; it should be enough to provide even more outputs (VGA and composite); and it could even serve as an adapter for plugging in a controller, so the whole "desktopification" module could be done on one (probably quite big, though) board. That is because an FPGA can have multiple clock domains and can be programmed to do virtually anything (no teleportation, though).

Cons: prototyping boards from Atmega (as an example) cost around $80, they are programmed in a very strange way, and I can't find any good, comprehensive introduction to FPGA programming.

I need to reflect on this... and wank...

Last edited by friendofmegaman (Apr 26, 2014 8:36 am)

Offline
friendofmegaman wrote:

I thought about deploying a logic analyzer. What they do (if I understand it correctly) is record samples to memory (e.g. flash) and then you read them out to the PC. It would be good to know how to build one and use it for real-time data processing.

Does this mean that it is not possible to capture a continuous data stream with a typical logic analyser?

friendofmegaman wrote:

Next - Nitro's suggestion to use uC as clock master. I don't understand how (in case of Teensy at least) it solves the issue of only 24 clocks available to do something with the data? 24 is an optimistic estimate.

This would probably require underclocking the Game Boy (unless using a faster microcontroller).  Obviously, this isn't ideal.  However, if the Game Boy is driven by a clock signal synthesised by the microcontroller, then pixels will arrive at a predictable cycle relative to the work loop, and can be read directly, with no need for polling or interrupts.  Has anyone (nitro, most likely) tried single-stepping the Game Boy's processor?  Being able to do so would aid in designing a solution using this approach.

friendofmegaman wrote:

Any other options?

I think the option of using SPI (probably dual SPI buses), as per Craig, is still feasible.  I also raised the idea of DMA on the Teensy, but I don't know whether this is feasible or not.  Here is a thread on the PJRC fora which might be relevant: http://forum.pjrc.com/threads/999-ARM-assember-code

For those interested, I have updated my post on the first page with a link to a thread on snesy's Gameboy Classic VGA-Adapter, in German.

Last edited by rvan (Apr 26, 2014 8:56 am)

Offline
Melbourne, Australia
rvan wrote:
uXe wrote:

I would be interested, yes. :)

Here it is:

//Author: rvan
/* A simple program to output pulses when a clock signal is received.  Pin 13
   is the LED pin, but we also attach a DSO probe here.  Currently this program
   DOES NOT produce the intended output, presumably because the ISR returns too
   slowly.
 */

#include <WProgram.h>

#define CLOCK_PIN 23
#define OUT_PIN 13

volatile bool have_clock = false;

unsigned int i;

void clock_ISR() {
    have_clock = true;
}

int main() {
    pinMode(CLOCK_PIN, INPUT);
    pinMode(OUT_PIN, OUTPUT);
    attachInterrupt(CLOCK_PIN, clock_ISR, RISING);

    for(;;) {
        if(have_clock) {
            digitalWrite(OUT_PIN, HIGH);
            i++; i--; i++; i--; //This creates a very short delay.
            digitalWrite(OUT_PIN, LOW);
            have_clock = false;
        }
    }
}

...if you try this code, it should toggle the LED with every clock - i.e. the LED should be illuminated on every second clock:

#include <WProgram.h>

#define CLOCK_PIN 23
#define OUT_PIN 13

void clock_ISR() {
    GPIOC_PTOR = B00100000;
}

int main() {
    pinMode(CLOCK_PIN, INPUT);
    pinMode(OUT_PIN, OUTPUT);
    attachInterrupt(CLOCK_PIN, clock_ISR, RISING);

    for(;;) {
    }
}

Last edited by uXe (Apr 26, 2014 10:31 pm)