My take on this: hook up a damned microcontroller that acts as a middleman. An Arduino, just for blog cred, perhaps? The microcontroller waits for sync pulses from MIDI or Sync24 or whatever you want. Each time it receives a pulse, it increments a counter. Each time the NES clocks a pulse in, the counter is decremented again. Typically the counter will never get above 1 or 2 or so, as we'll see.
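To make the counter idea concrete, here is a minimal sketch in plain C. It is not tied to any particular microcontroller. The names `SyncBridge`, `bridge_on_sync_pulse` and `bridge_on_nes_clock` are made up for illustration, and the saturating increment is my own assumption about what should happen if the NES falls far behind.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical middleman state: pulses received but not yet
   collected by the NES. */
typedef struct {
    uint8_t pending;
} SyncBridge;

/* Call once per incoming sync pulse (for example a MIDI clock byte
   or a Sync24 edge). Saturates instead of wrapping around; that
   choice is an assumption, not something from the original idea. */
static void bridge_on_sync_pulse(SyncBridge *b)
{
    if (b->pending < UINT8_MAX) {
        b->pending++;
    }
}

/* Call once per clock from the NES. Returns true if a pulse was
   waiting, i.e. the level to present on the data line, and
   decrements the counter in that case. */
static bool bridge_on_nes_clock(SyncBridge *b)
{
    if (b->pending == 0) {
        return false;
    }
    b->pending--;
    return true;
}
```

On real hardware the increment would most likely live in an interrupt handler, so `pending` would need to be `volatile` and updated atomically. That part is left out here.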
On the NES side, the CPU can do one of two things:
1) Deal with audio. (Be busy)
2) Constantly poll for new data. While the CPU is idle, nothing in the whole world stops it from polling as fast as it can, rather than just once per 60 Hz frame. As soon as it receives a pulse, it gets busy, and then goes back to waiting for the next pulse. The $4016/$4017 registers are essentially just two-way bit-banging ports.
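The two states above can be modeled roughly like this. It is a C simulation, not actual 6502 code: `poll_port` stands in for a read of $4016/$4017, `audio_tick` stands in for whatever the music driver does per pulse, and the `Model` struct and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of both sides of the link. */
typedef struct {
    uint8_t  pending; /* pulses queued up by the middleman        */
    unsigned ticks;   /* audio ticks the NES side has run so far  */
} Model;

/* Stand-in for reading the controller port: returns true and
   consumes one pulse if any are waiting. */
static bool poll_port(Model *m)
{
    if (m->pending == 0) {
        return false;
    }
    m->pending--;
    return true;
}

/* Stand-in for the "be busy" state: advance the audio by one tick. */
static void audio_tick(Model *m)
{
    m->ticks++;
}

/* The idle loop: poll as fast as possible, get busy on each pulse,
   then go back to polling. `polls` bounds the loop so the model
   terminates; a real NES would just loop forever. */
static void nes_idle_poll(Model *m, unsigned polls)
{
    for (unsigned i = 0; i < polls; i++) {
        if (poll_port(m)) {
            audio_tick(m);
        }
    }
}
```

With two pulses queued and ten polls, the model runs exactly two audio ticks and spends the other eight polls idle.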
And if you really hate microcontrollers and MIDI, this idea should be realizable with just some 74HC logic, assuming the timing input is Sync24.