Meh, I had a big reply typed up, but lost it :/
When patching D2, the bits right before the patch location are important, the bits after it not so much.
We're patching the region string from 'E' to 'A'.
The BIOS probably reads in the string and does some string manipulation on it that requires the earlier bits to be intact.
So the patcher can drag D2 low for a bit longer, but it must be timed to start exactly right.
The 2 µs delay:
I never checked whether bitSet() and co. (Arduino convenience functions) compile to the same code on all AVRs.
If they don't compile down to direct port writes on some chips, that could explain the delay.
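For reference, here is a rough sketch of what those macros expand to, as far as I remember the definitions in Arduino.h. The registers are hypothetical stand-ins (`fake_ddr`, `fake_port`) so it builds on a PC, and `D2_BIT` is just an assumed bit position, not the real pin mapping. The helper names are made up for illustration too.

```c
#include <stdint.h>

/* Bit helpers as defined in Arduino.h (quoted from memory). */
#define bitSet(value, bit)   ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))

/* Hypothetical stand-ins for the AVR DDRx/PORTx registers. */
static volatile uint8_t fake_ddr;
static volatile uint8_t fake_port;

#define D2_BIT 2  /* assumed bit position, for illustration only */

/* Drive the line low: clear the output latch, then make the pin an output. */
static void d2_drive_low(void) {
    bitClear(fake_port, D2_BIT);
    bitSet(fake_ddr, D2_BIT);
}

/* Release the line: back to input (high impedance). */
static void d2_release(void) {
    bitClear(fake_ddr, D2_BIT);
}
```

These are plain read-modify-write expressions. With a constant bit number and a register in the low I/O range, avr-gcc can usually reduce each one to a single sbi/cbi, but that is an assumption that should be checked in the disassembly for each target chip.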
I don't think the issue is with the oscillator, as we've just waited for the A18 signal and only 45 µs have passed since then.
Having a 2 µs divergence so soon would be a bit much, I think.
There are 16 clock cycles available within 2 µs on the 8 MHz chips. Maybe something is possible with regards to sampling D2?
The ports would have to be able to switch at CPU speed and each instruction would have to take 1 cycle, including switching from input to output. That surely needs hand-coded ASM. Not sure.
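The cycle budget above is just clock frequency times window length. A small sketch of the arithmetic (the function name is made up for illustration):

```c
#include <stdint.h>

/* Whole CPU clock cycles that fit in a window of window_us microseconds. */
static uint32_t cycles_in_window(uint32_t f_cpu_hz, uint32_t window_us) {
    /* 64-bit intermediate so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)f_cpu_hz * window_us) / 1000000ULL);
}
```

`cycles_in_window(8000000UL, 2)` gives 16, and a 16 MHz part would get 32. Note this counts clock cycles, not instructions, so anything that takes 2 cycles eats into the budget twice as fast.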
But the fuse stuff is great! That will allow for some nice things.