I'll try to answer all the questions in one response.
There's not much going on in the demo beyond proving that it's possible to swap out the NES palette at a measured point on the screen, i.e. during the first black gap under the first row of coloured squares. It takes a lot of manual tweaking of cycle-timed code, as you only have time to swap 3 colours during horizontal blanking (the time the PPU takes between finishing one raster line and starting the next). So, between each row of squares there are 3 raster splits that (off the edge of the screen) change 3 colours each, giving you 9 modifiable colours.
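To put rough numbers on that budget, here is a back-of-the-envelope sketch in Python. It assumes NTSC timing (341 PPU dots per raster line, 256 of them visible, 3 PPU dots per CPU cycle) and treats everything outside the 256 visible dots as horizontal blanking. The variable names are just for illustration, and the real usable window will be a bit tighter than this.

```python
from fractions import Fraction

# NTSC NES timing figures (assumption: the demo targets NTSC).
PPU_DOTS_PER_LINE = 341      # PPU dots in one full raster line
VISIBLE_DOTS = 256           # dots that actually draw pixels
PPU_DOTS_PER_CPU_CYCLE = 3   # the CPU runs at a third of the PPU dot rate

# Time between the end of one visible line and the start of the next.
hblank_dots = PPU_DOTS_PER_LINE - VISIBLE_DOTS
hblank_cpu_cycles = Fraction(hblank_dots, PPU_DOTS_PER_CPU_CYCLE)
line_cpu_cycles = Fraction(PPU_DOTS_PER_LINE, PPU_DOTS_PER_CPU_CYCLE)

# Figures from the post: 3 colours per split, 3 splits per gap.
COLOURS_PER_SPLIT = 3
SPLITS_PER_GAP = 3
colours_per_gap = COLOURS_PER_SPLIT * SPLITS_PER_GAP
cycles_per_colour = hblank_cpu_cycles / COLOURS_PER_SPLIT

print("hblank:", hblank_dots, "dots =", float(hblank_cpu_cycles), "CPU cycles")
print("full line:", float(line_cpu_cycles), "CPU cycles")
print("budget per colour:", float(cycles_per_colour), "CPU cycles")
print("colours per gap:", colours_per_gap)
```

Under those assumptions a full line is not a whole number of CPU cycles, which is part of why the timing needs so much tweaking.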
I still need to code the remaining splits (one for each row of squares). I can't manually code each one or I'd go insane. Instead, I need to figure out an easily repeatable way to achieve the required accurate timing delays, so that's what I'll do next. I don't know much about cycle-timed code, so I need to do a bit of research. The demo so far is a bit of trickery with a lot of trial and error.
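One possible shape for a repeatable delay, purely as a sketch and not what the demo currently does: a small Python helper that prints 6502 assembly burning an exact number of CPU cycles. It assumes standard 6502 timings (nop = 2, zero-page bit = 3, ldx immediate = 2, dex = 2, bne = 3 when taken and 2 when not), that the loop does not cross a page boundary, and that it's fine to clobber X and the flags. The names delay_code, count_cycles and the "wait" label are made up for this example.

```python
def loop_cycles(n):
    """Cycles for 'ldx #n / dex / bne' with 1 <= n <= 255 and no page crossing."""
    return 2 + n * (2 + 3) - 1   # the final bne falls through: 2 cycles, not 3

def delay_code(cycles, label="wait"):
    """Return assembly lines that take exactly `cycles` CPU cycles."""
    if cycles < 2:
        raise ValueError("the shortest 6502 instruction takes 2 cycles")
    n = min(255, (cycles - 1) // 5)
    # A leftover of exactly 1 cycle cannot be padded, so shorten the loop.
    if n >= 1 and cycles - loop_cycles(n) == 1:
        n -= 1
    lines = []
    rest = cycles
    if n >= 1:
        lines += [f"    ldx #{n}", f"{label}:", "    dex", f"    bne {label}"]
        rest -= loop_cycles(n)
    if rest % 2 == 1:
        lines.append("    bit $00")   # 3 cycles; reads zero page, changes flags
        rest -= 3
    lines += ["    nop"] * (rest // 2)  # 2 cycles each
    return lines

def count_cycles(lines):
    """Re-count the cycles of code produced by delay_code, as a sanity check."""
    total, n = 0, 0
    for line in lines:
        op = line.strip()
        if op.startswith("ldx #"):
            n = int(op[5:])
            total += 2
        elif op == "dex":
            total += 2 * n
        elif op.startswith("bne"):
            total += 3 * n - 1
        elif op == "bit $00":
            total += 3
        elif op == "nop":
            total += 2
    return total

print("\n".join(delay_code(28)))
```

Each split could then ask for its delay by cycle count instead of being hand-padded. For long delays the padding turns into a long run of nops, so a nested loop would be the next thing to try.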
The other side of this is actually coding the colour generation/animation. It's not something I've ever done before.
More to the point, I need to find the CPU time to do the required data movement/shifting/manipulation. The problem with filling the screen with timed palette splits is that you're not left with much CPU time to do anything else.
I'll make the source code available once I get the timing loops into a repeatable state and tidy the code up somewhat.