Offline
Sweeeeeeden
rvan wrote:

Does this mean that it is not possible to capture a continuous data stream with a typical logic analyser?

This is in itself another possibility. You could use, say, a Saleae USB logic analyzer and their API for getting the data. As long as you are able to record and convert the logic stream uninterrupted on the PC, you have recording right there.

rvan wrote:
friendofmegaman wrote:

Next - Nitro's suggestion to use the uC as clock master. I don't understand how (in the case of the Teensy, at least) it solves the issue of only 24 clocks being available to do something with the data? 24 is an optimistic estimate.

This would probably require underclocking the Game Boy (unless using a faster microcontroller).  Obviously, this isn't ideal.  However, if the Game Boy is driven by a clock signal synthesised by the microcontroller, then pixels will arrive at a predictable cycle relative to the work loop, and can be read directly, with no need for polling or interrupts.  Has anyone (nitro, most likely) tried single-stepping the Game Boy's processor?  Being able to do so would aid in designing a solution using this approach.

Nailed it. The advantage is the predictability. Once you know display data is about to come, you can send CPU clock pulses and blindly scoop in the data at the correct intervals until the end of the line. I'm claiming this to be doable not with a 96, but with a 24 or maaaybe even 12 MHz AVR, and I'd be up for the challenge, if I can be bothered.
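In that spirit, the capture loop might look roughly like this - an untested sketch, where the AVR port assignments and the assumption that one pixel's worth of data can be scooped per driven clock during the visible part of a line would all need verifying against real hardware:

// Untested sketch: AVR as clock master for the DMG.
// ASSUMPTIONS (not verified): GB clock input wired to PB0, the two
// pixel data lines on PD0/PD1, data stable after each pulse, and
// one pixel per driven clock during active video.

#include <avr/io.h>

#define LINE_PIXELS 160

static uint8_t line_buf[LINE_PIXELS];

static inline void gb_clock_pulse() {
    PORTB |=  _BV(PB0);   // drive the GB clock high...
    PORTB &= ~_BV(PB0);   // ...and back low
}

// Clock out one LCD line, blindly scooping the two data bits
// after every pulse.
static void capture_line() {
    for (uint8_t x = 0; x < LINE_PIXELS; x++) {
        gb_clock_pulse();
        line_buf[x] = PIND & (_BV(PD0) | _BV(PD1));
    }
}

int main() {
    DDRB |= _BV(PB0);                  // clock pin is an output
    DDRD &= ~(_BV(PD0) | _BV(PD1));    // data pins are inputs
    for (;;) {
        // ...wait for HSYNC here, then:
        capture_line();
    }
}

Since you control every clock edge, there is no race at all - the timing problem turns into plain cycle counting.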

Details: Back-of-the-envelope calculation of the data rate: 160*144*60*2=2764800 bits per second. An FT232 can handle up to 3 Mbaud communication. Accounting for 1 start bit and 1 stop bit, that becomes 3000000*8/10=2400000 bits per second of actual transfer rate. Too slow, darn. So the choices are to underclock the CPU to 87% of its regular speed, or come up with some form of compression. RLE compression should work well enough in most situations, as a typical game screen will contain at least 13% repeated data. (That is, the same tile repeated multiple times vertically.) This leaves one problem, though, which is that the screen data isn't sent at a constant rate from the GB CPU. It's done in bursts, one line at a time, with breaks during the HBlank and VBlank periods, so you would need a serial data ring buffer. But meh, should be easy enough. (Famous last words.)
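To make the RLE idea concrete, the dumbest workable scheme - (value, run length) byte pairs - would be something like this sketch (it assumes the 2-bit pixels have already been packed into bytes; note the worst case *expands* the data 2x, so you'd still want headroom or an escape code):

// Sketch: naive byte-oriented RLE, emitting (value, count) pairs.
// Assumes the pixel data is already packed into bytes; untested.

#include <stdint.h>

// Encodes 'len' bytes from 'in' into 'out'; returns bytes written.
// 'out' must be able to hold 2*len bytes (the worst case).
uint16_t rle_encode(const uint8_t *in, uint16_t len, uint8_t *out) {
    uint16_t o = 0, i = 0;
    while (i < len) {
        uint8_t v = in[i];
        uint8_t run = 1;
        while (i + run < len && in[i + run] == v && run < 255)
            run++;
        out[o++] = v;     // the byte value...
        out[o++] = run;   // ...and how many times it repeats
        i += run;
    }
    return o;
}

The PC side just expands the pairs again, and the ring buffer mentioned above would sit between this encoder and the UART to smooth out the per-line bursts.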

As for the Teensy code: I'm not familiar with how it works, but most likely the CPU will be busy for a short while whenever the USB processing code is running. In that period, you're missing display data. And also, please don't use interrupts for this. Interrupts are great when you need the CPU to do something else and still have the ability to be interrupted, but each interrupt has an overhead, which means they're not suitable for this project. Use a polling loop instead.
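By a polling loop I mean something in this spirit (untested AVR-flavoured sketch; the wiring - pixel clock on PD2, data bits on PD0/PD1 - is assumed, not taken from the actual project):

// Sketch: interrupt-free capture by spinning on the clock line.
// Assumed wiring: pixel clock on PD2, data bits on PD0/PD1.

#include <avr/io.h>

void poll_pixels(uint8_t *buf, uint16_t n) {
    for (uint16_t i = 0; i < n; i++) {
        while (!(PIND & _BV(PD2)))
            ;                      // spin until the clock goes high
        buf[i] = PIND & 0x03;      // sample the two data bits
        while (PIND & _BV(PD2))
            ;                      // wait for the clock to drop again
    }
}

No vector dispatch, no register save/restore - just a few cycles per edge.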

Offline

Awesome. I just can't help you guys with any programming :-/

Offline
Melbourne, Australia
uXe wrote:

...if you try this code, it should toggle the LED with every clock - i.e. the LED should be illuminated on every second clock:

#include <WProgram.h>

#define CLOCK_PIN 23
#define OUT_PIN 13

// ISR: toggle pin 13 (Port C bit 5) on every rising clock edge.
void clock_ISR() {
    GPIOC_PTOR = B00100000;
}

int main() {
    pinMode(CLOCK_PIN, INPUT);
    pinMode(OUT_PIN, OUTPUT);
    attachInterrupt(CLOCK_PIN, clock_ISR, RISING);

    for(;;) {
        // all the work happens in the ISR
    }
}

...or without toggling and interrupts:

#include <WProgram.h>

#define CLOCK_PIN 23
#define OUT_PIN 13

int main() {
    pinMode(CLOCK_PIN, INPUT);
    pinMode(OUT_PIN, OUTPUT);

    for(;;) {
        // Mirror the clock input (Port C bit 2) onto pin 13 (Port C bit 5).
        if (GPIOC_PDIR & B00000100)
            GPIOC_PSOR = B00100000;   // clock high -> output high
        else
            GPIOC_PCOR = B00100000;   // clock low  -> output low
    }
}

Last edited by uXe (Apr 26, 2014 10:33 pm)

Offline
CA
uXe wrote:

...if you try this code, it should toggle the LED with every clock - i.e. the LED should be illuminated on every second clock

And how do you tell how many times the LED blinked? A 10 000 000 FPS time-lapse video? :)

Offline
Melbourne, Australia
friendofmegaman wrote:
uXe wrote:

...if you try this code, it should toggle the LED with every clock - i.e. the LED should be illuminated on every second clock

And how do you tell how many times the LED blinked? A 10 000 000 FPS time-lapse video? :)

Firstly, it is being triggered by the GameBoy clock, so it should only blink 2 million times a second in the version with interrupts, or 4 million in the version without.

Secondly, rvan's plan was to use a digital oscilloscope to compare the pulses on the LED pin with the GameBoy clock pulses, to try and prove one way or the other whether the Teensy is capable of at least reading the clock signal. Apparently it didn't work with his original code, which is why I was trying to help with this code - but yeah, by all means break out the high-speed camera! :P

Offline
Michigan
friendofmegaman wrote:

I thought about deploying a logic analyzer. What they do (if I understand it correctly) is record a sample to memory (e.g. flash), which you then read out to the PC. It would be good to know how to build one and use it for real-time data processing.

Another option is an FPGA. The way it works allows really fast signal processing. In fact it can be clocked by the Game Boy and still be able to send data to the PC. This is what a guy named Snesy did. I tried to contact him through several sources, but without any luck.

Next - Nitro's suggestion to use the uC as clock master. I don't understand how (in the case of the Teensy, at least) it solves the issue of only 24 clocks being available to do something with the data? 24 is an optimistic estimate.

Finally - DSP. I think logic analyzers are built upon these. However, I've no idea where to even start with that.

Any other options?

Before diving into the next idea it would be great to figure out which of them will (given enough effort) guarantee success. Not some ugly hacked code that will fail 50% of the time, but a neat and reliable solution. We know it was done on an FPGA.

Pros: it will solve this problem, it should suffice to drive even more outputs (VGA and composite), and it can even be used as an adapter to plug in a controller - so the whole "desktopification" module can be done with one (probably quite big, though) board. It can have multiple clock domains and be programmed to do virtually anything (no teleportation though).

Cons: Prototyping boards (from Atmega, as an example) cost around $80; not only are they programmed in a very strange way, but I also can't find any good comprehensive introduction to programming FPGAs.

I need to reflect on this... and wank...

Bringing this post back to our attention: I have a small FPGA board that I still need to use. The only reason I haven't yet is that I once wrote a Verilog program and, no matter what CPLD or FPGA I chose in the settings, the compiled program was always too large for it. Since there was nothing I could remove from my code, I gave up on using an FPGA for that project altogether and set it aside. It may very well be time to pull it back out.

I will need you to fill me in on exactly what the HSYNC, VSYNC, clock and data traces do, or at least show me a datasheet on them. I may have to skip the "serial.write" function and move straight to VGA video.

Offline
Michigan

Scratch that. I think I will try out my new Raspberry Pi since it has built-in video outputs. No idea where to start with programming that thing though...

Offline
uXe wrote:

Again, the code you posted earlier is also doing a Serial.write on every HSYNC, and I can't help but think that is eating up a lot of cycles. Again, I would suggest maybe storing those values in an array or something instead, and then Serial.write them out once you've collected a certain amount, to see if that has any effect on the count?

I've tried this out just for completeness; here are the results:

n = 144
mean = 48.92
mode = 49
sd = 5.84

Clearly not right (as I expected); we would expect 160 every time if it were working as it should.
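For reference, the buffered variant amounts to something like this (an illustrative sketch rather than the exact code I tested; the HSYNC pin number is a placeholder):

#include <WProgram.h>

#define CLOCK_PIN 23   // as in the earlier code
#define HSYNC_PIN 22   // placeholder

volatile uint8_t pixels = 0;    // clock pulses seen this line
volatile uint8_t counts[144];   // one count per LCD line
volatile uint8_t line = 0;

void clock_ISR() {
    pixels++;
}

void hsync_ISR() {
    if (line < 144) counts[line++] = pixels;
    pixels = 0;
}

int main() {
    pinMode(CLOCK_PIN, INPUT);
    pinMode(HSYNC_PIN, INPUT);
    attachInterrupt(CLOCK_PIN, clock_ISR, RISING);
    attachInterrupt(HSYNC_PIN, hsync_ISR, RISING);
    Serial.begin(115200);

    for (;;) {
        if (line == 144) {   // a full frame of counts collected
            Serial.write((const uint8_t *)counts, sizeof(counts));
            line = 0;
        }
    }
}

Since moving the Serial.write out of the per-line path made no difference to the counts, the per-interrupt overhead itself looks like the bottleneck.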

Offline
Michigan

I took a few minutes to find the cycles per instruction, but all I can find are the cycles per assembly op-code. Obviously the code is not in assembly. Is there a way to disassemble the *.cpp.hex file (found in the working directory after compiling)? If we could do that, then you could get the actual cycles per instruction and find out if the interrupts are taking too long.

Is that still a possible cause of the problem?

Offline
CA
Jazzmarazz wrote:

I took a few minutes to find the cycles per instruction, but all I can find are the cycles per assembly op-code. Obviously the code is not in assembly. Is there a way to disassemble the *.cpp.hex file (found in the working directory after compiling)? If we could do that, then you could get the actual cycles per instruction and find out if the interrupts are taking too long.

Is that still a possible cause of the problem?


The problem is that both the interrupt- and polling-based approaches failed. We could have predicted this before starting the experiments if we had thought it through a little better. The major problem now is the lack of cycles, that's it. I don't see conclusive proof that it can be done with the current setup.

However, Nitro's idea seems viable, but what I don't want to do is underclock the 'Boy to get video. I have a feeling that to implement it this way we're gonna need to write some assembly code, because it must run the program normally and, when the signal arrives (and we can predict that), read the bits. So it has to be fine-grained to instruction-level precision.

Aside from the FPGA and SNES+SGB, there's one more option that Craig from FlashingLEDs actually suggested - use a webcam, put it close to the screen, and then on the PC side extract the actual frame based on pixel colors. As crazy as it sounds, it is a very interesting idea (a good example of out-of-the-box thinking). We need a cam and the front PCB (we can actually saw off the lower half to make it smaller), plus some light (either a backlight or simple LEDs). But that's theory; to be certain, experiments are needed... In the end it can be hacked into a very tiny device (as big as the GB LCD and as 'fat' as the webcam allows, which in its turn can also be disassembled and 'flattened').

Last edited by friendofmegaman (Apr 30, 2014 12:17 am)

Offline
Melbourne, Australia
friendofmegaman wrote:

Aside from the FPGA and SNES+SGB, there's one more option that Craig from FlashingLEDs actually suggested - use a webcam, put it close to the screen, and then on the PC side extract the actual frame based on pixel colors. As crazy as it sounds, it is a very interesting idea (a good example of out-of-the-box thinking). We need a cam and the front PCB (we can actually saw off the lower half to make it smaller), plus some light (either a backlight or simple LEDs). But that's theory; to be certain, experiments are needed... In the end it can be hacked into a very tiny device (as big as the GB LCD and as 'fat' as the webcam allows, which in its turn can also be disassembled and 'flattened').

That definitely does sound interesting - but if you are going to go to that level of abstraction to record video from a GameBoy, why not just record the output of an emulator instead?

You could have an emulator running the same ROM / saves / whatever that you are running on your GameBoy, hook the Teensy up to the button contacts on the 'Boy and use it as a USB joystick to be able to control both the emulator and the GameBoy in your hands at the same time... you wouldn't even necessarily need a PC for the emulation / recording; a Raspberry Pi would be enough!
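The joystick half could be as simple as this sketch (assuming the Teensy's USB Type is set to include Joystick, that each button contact is wired to a spare pin - the pin numbers here are placeholders - and that the DMG pads read LOW when pressed):

#include <WProgram.h>

// Placeholder wiring: Right, Left, Up, Down, A, B, Select, Start.
const int buttonPins[8] = {0, 1, 2, 3, 4, 5, 6, 7};

int main() {
    for (int i = 0; i < 8; i++)
        pinMode(buttonPins[i], INPUT_PULLUP);

    for (;;) {
        // Mirror each contact to a USB joystick button
        // (assumed active-low pads).
        for (int i = 0; i < 8; i++)
            Joystick.button(i + 1, digitalRead(buttonPins[i]) == LOW);
        delay(2);   // ~500 Hz scan is plenty for buttons
    }
}

The emulator then just needs its joypad mapped to those eight buttons.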

That said, I would still like to see rvan try the code I posted above with an oscilloscope to see the results...

Offline
CA
uXe wrote:

That definitely does sound interesting - but if you are going to go to that level of abstraction to record video from a GameBoy, why not just record the output of an emulator instead?

Well, at the end of the day you can chiptune on the emulator (without crashes and with much better and cleaner sound, or better yet use a MIDI keyboard or a proper synth) as well as play, change the speed, invert, bivert, trivert (and possibly quadvert) and so forth. The point is to spend your time on something pointless! That's what makes one a nerd...

Jokes aside though:
1. Having the recording device, you can have a nice big picture on the PC. With an emulator you'll need to upload your save file back and forth. It's a minor inconvenience, but I'd like to eliminate it.
2. This also allows you to make *authentic* game reviews / let's plays
3. I find it aesthetically more appealing to play on real hardware. Unfortunately the GB LCD is shit and I want to fix that.

And

the webcam approach can be done not as a separate device that you plug a DMG into, as I suggested earlier, but rather as a snap-on thingy that you could use with a DMG, Pocket or Color - this is a huge advantage of the approach IMO. And no modding required. I have enough ideas for chiptune-related stuff, and the space for the LCD breakout socket can be used for something else. Like a breakout with link, power, clock in/out, pre-pot left/right channels to plug into a music station with MIDI, prosounds, pitch and stuff.

Or maybe I have just gone mental and this doesn't make any sense :(

Last edited by friendofmegaman (Apr 30, 2014 1:41 am)

Offline
Michigan

Does anyone know what sort of data is stored in VRAM? Is it simply active sprites, windows, etc.? Or might it be something useful to us?

Offline
CA

No, but let's find out: http://marc.rawer.de/Gameboy/Docs/GBCPUman.pdf

Offline
Michigan

oops...

Last edited by Jazzmarazz (Apr 30, 2014 2:14 am)

Offline

In the interest of completeness and good progress documentation, I have tested more or less all the code we have so far.  There was a chance this would raise some discrepancies in results which would potentially point to a wiring fault, but it seems that my results are generally as predicted and mostly in line with friendofmegaman's.

friendofmegaman's original Teensy code ("v1"):
My results match friendofmegaman's.  Here is an excerpt of a hex dump of the screen data.  This is an 'interesting' portion; most of the data is all zeros:

0015800: 0000 0000 0000 0000 2c00 0000 0000 0000  ........,.......
0015810: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015820: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015830: 0200 0000 0000 0000 0000 0000 0000 0000  ................
0015840: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015850: 0000 0000 0000 0000 0200 0000 0000 0000  ................
0015860: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015870: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0015880: 3000 0000 0000 0000 0000 0000 0000 0000  0...............
0015890: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00158a0: 0000 0000 0000 0000 2300 0000 0000 0000  ........#.......
00158b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00158c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00158d0: 4300 0000 0000 0000 0000 0000 0000 0000  C...............
00158e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00158f0: 0000 0000 0000 0000 0200 0000 0000 0000  ................

friendofmegaman's Python serial dump code:
Works for me (I had to change the port setting), but sometimes hangs after 20 bufferfuls.

Here is a modified version, which prints the data as hex instead of saving it as a binary file.  I have found this useful for testing:

#!/usr/bin/python
#Author: friendofmegaman, rvan
#Print serial data as hexadecimal.

import serial

port = '/dev/ttyACM0'
speed = 115200
buflen = 100
ser = serial.Serial(port, speed)

try:
    j=0
    while True:
        data = ser.read(buflen)
        for i in data:
            print i.encode('hex'), "",
            j+=1
            if j==16:
                print
                j=0
except KeyboardInterrupt as e:
    print 'Shutting down'
    ser.close()

friendofmegaman's Python/Pylab frame display code:
Works, but is slow and CPU-intensive here, probably because PyLab is not designed with all-points-addressable (APA) graphics in mind.  I might work on a PyGame implementation that uses the data format (3 pixels per byte) I defined above.  I have done APA graphics successfully in PyGame before.

my simple interrupt test program (i.e., clock pulse counting) (p. 3):
I get data like the following.  Under correct operation, I would expect constant 0xA0s (i.e., 160s).  The pattern in the data repeats: a rise in values, followed by a stretch of zeros around three times as long.  I do not know why this is.

00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  0a  0a  0b  0a  15  15  15  15  16 
15  15  15  15  12  15  15  15  15  15  15  0a  0a  2b  2c  2d 
2e  2f  2f  2e  2f  2f  2e  2d  2c  2d  2c  2c  2d  26  27  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00 
0d  0d  0d  0e  0e  0e  0e  0e  0e  0e  0e  0e  0e  0e  0e  0e 
00  00  0b  00  00  00  00  00  00  00  00  00  00  00  00  00 
00  00  00  00  00  00  00  0a  0a  0a  0a  15  15  11  15  15 
15  15  16  15  15  15  15  15  15  15  15  0a  2b  2c  2c  2b 
2e  2f  2e  2e  2f  2e  2e  2d  2d  2c  2d  2c  28  26  27  26

Jazzmarazz's cleaned version of friendofmegaman's Teensy code ("v2"):
Unmodified, it does nothing.  If I explicitly declare the pins as inputs, it works like the original (see previous comments).  Nice clean code.  Jazzmarazz, if you're reading this, I'm curious whether there is a reason for using const ints rather than #defines for the pin numbering.  This version in theory introduces some function call overhead.  I don't know how much this would affect things.

Breadboard set up for testing friendofmegaman's code.

friendofmegaman's updated version of Jazzmarazz's version ("v3"):
Results are as for v2, above.  The note about declaring inputs applies here too.

friendofmegaman's clock pulse counting code:
My results match friendofmegaman's.  I do not know, however, what is causing the discrepancies in results between this code and my testing code.  Could someone shed some light on this?  It is quite possibly something trivial.

uXe's LED test v1 (interrupts):
The output is surprisingly consistent, but at a much slower rate.  The LED on pin 13 flickers, but this may be due to the dead time between lines.

DSO output. Channel 1 = GB clock; channel 2 = Teensy output.  Probes set to X10.

uXe's LED test v2 (no interrupts):
The output seems to be at the same rate as the input (as expected), but the pulse width is quite inconsistent, often too wide.

DSO output. Channel 1 = GB clock; channel 2 = Teensy output.  Probes set to X10.

General Notes:
The Teensy must sometimes be restarted (e.g., by removing and re-inserting the USB plug) after programming before it will do anything.  This confused me a couple of times when I forgot to do this and wondered why I wasn't seeing any serial data.

Last edited by rvan (May 2, 2014 4:47 am)