But the "global" databases aren't set up like that. They're master-slave, with about 5 slaves doing various things. If the global master fails, we're screwed.
That's why the global master is on really nice hardware... we don't want it to fail.
Now, we're moving toward putting the entire global database on MySQL Cluster, so it's spread across a bunch of machines and entirely in memory, but we're not there yet.
Last night at about 2:25 am, the megaraid2 driver in Linux 2.4.28 bit it, spewing errors all over. It was a bitch and a half to recover from, but I think we finally finished up about 8 am this morning. (lisa did most of the work.) Luckily, once the global master came back up, we could run on that without any slaves for a while, since it was low-traffic time. Getting the slaves back up was tedious, but easy.
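For anyone who wants the shape of that in code, here's a rough sketch of the idea: writes always go to the one global master, reads prefer a slave, and if no slaves are up, reads fall back to the master. This is not our actual code. The class name, method names, and hostnames are all made up for illustration, and it assumes something else is doing the health checking.

```python
import random


class GlobalDBRouter:
    """Hypothetical router: writes go to the master, reads prefer slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.down = set()  # hosts currently considered dead

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def for_write(self):
        # Single point of failure: there is nowhere else to send writes.
        if self.master in self.down:
            raise RuntimeError("global master is down: no writes possible")
        return self.master

    def for_read(self):
        live = [s for s in self.slaves if s not in self.down]
        if live:
            return random.choice(live)
        # No slaves available: run everything off the master
        # (only sane while traffic is low).
        return self.for_write()


# Made-up hostnames, purely for illustration.
router = GlobalDBRouter("gm", ["gs1", "gs2", "gs3", "gs4", "gs5"])
for s in ["gs1", "gs2", "gs3", "gs4", "gs5"]:
    router.mark_down(s)
host = router.for_read()  # falls back to the master
```

Note that losing every slave is survivable here, but losing the master is not: for_write() has nothing to fall back to.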
This, folks, is a perfect example of why I'm still not happy with our architecture. Our global master needs to be on MySQL Cluster. We could even do shared disks and two identical global masters, but the failover between them, and the possibility of either or both corrupting the filesystem and tablespace, aren't comforting...
In the meantime, I'm going to be studying the changes in the megaraid2 driver between Linux 2.4 and Linux 2.6 and finding out who else has seen this sort of problem.
Fun fun fun....