I work in IT. They pay me a lot of money to analyze crap and try to understand why sh*t is borked. (Not video games, but real estate systems. Different audience, same problems.)
Seriously. Even the best of us can't foresee every issue.
With 10,000 users on a server, how many packets per second (PPS) is that server sending and receiving? How many PPS of that size can the NIC(s) in the server handle? The NIC drivers and OS kernel? The distribution switch? The network core? And that's just one consideration among potentially hundreds of aspects of your infrastructure and systems that can have a drastic impact on performance and stability when their limits are poked.
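To put rough numbers on that first question, here's a back-of-the-envelope sketch. The per-user packet rate and payload size are pure guesses on my part (nobody outside the company knows the real figures), and the 54-byte overhead assumes plain Ethernet + IPv4 + TCP headers with no options. The function name and parameters are made up for illustration.

```python
def estimate_load(users, pkts_per_user_per_sec, avg_payload_bytes, header_bytes=54):
    """Rough aggregate packet rate and bandwidth for one server, one direction.

    header_bytes=54 assumes Ethernet (14) + IPv4 (20) + TCP (20), no options.
    Every input here is an assumption, not a measured value.
    """
    pps = users * pkts_per_user_per_sec
    bits_per_sec = pps * (avg_payload_bytes + header_bytes) * 8
    return pps, bits_per_sec


# Hypothetical inputs: 10,000 users, 20 packets/sec each, 100-byte payloads.
pps, bps = estimate_load(10_000, 20, 100)
print(f"{pps:,} PPS, {bps / 1e6:.1f} Mbit/s")
```

Notice the bandwidth figure comes out modest while the packet count is large. With small packets, the per-packet cost in the NIC, driver, and kernel tends to become the limit well before the link fills up.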
I'd totally love to hear about the network, hardware, and software configurations behind the game. They revealed in one of their posts today that instance entries/exits are handled by a separate system, which seemingly serves all users on the servers in a given data center. That's pretty interesting. I wonder what they use for data storage. I'd really like to see more big MongoDB/Riak/etc. success stories to help hammer home the nails in Oracle's coffin. ;)
I'll probably take some packet dumps later and start cataloging some stuff to see what we can figure out.
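For anyone who wants to do the same, here's a minimal sketch of the kind of cataloging I have in mind: packet count, average PPS, and a rough size histogram from a capture. It assumes the classic libpcap file format with microsecond timestamps (what `tcpdump -w` writes by default), not pcapng, and `summarize_pcap` is just a name I picked. It only uses the Python standard library.

```python
import struct
from collections import Counter


def summarize_pcap(stream):
    """Summarize a classic-format pcap read from a binary file-like object."""
    header = stream.read(24)  # global header is 24 bytes
    if len(header) < 24:
        raise ValueError("not a pcap file: header too short")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written on a big-endian machine
    else:
        raise ValueError("unsupported capture format (pcapng or nanosecond pcap?)")

    sizes = Counter()
    count = 0
    first_ts = last_ts = None
    while True:
        rec = stream.read(16)  # per-packet record header
        if len(rec) < 16:
            break
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        stream.read(incl_len)  # skip the captured bytes; only sizes and times matter here
        ts = ts_sec + ts_usec / 1e6
        if first_ts is None:
            first_ts = ts
        last_ts = ts
        count += 1
        sizes[(orig_len // 100) * 100] += 1  # bucket by 100 bytes of on-wire size

    duration = (last_ts - first_ts) if count > 1 else 0.0
    avg_pps = count / duration if duration > 0 else None
    return {
        "packets": count,
        "duration_s": duration,
        "avg_pps": avg_pps,
        "size_buckets": dict(sizes),
    }
```

Usage would be something like `summarize_pcap(open("capture.pcap", "rb"))`, where `capture.pcap` is whatever file you wrote with tcpdump. The size buckets are keyed by the lower bound of each 100-byte range.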