Updates to MPI software and compilers don't always overcome the limitations of older hardware, it seems. The model that tends to fail is the ocean component, pop2. It has the most sophisticated MPI usage, which pushes the MPI implementation much harder than other parts of CCSM4 do. So if your users are focused on pop specifically, that's important.
Having said that, I have had good luck with both the Intel and PGI compilers on Linux clusters. In addition, OpenMPI seems to be the current standard for off-the-shelf MPI, and it works pretty well, although other MPI software has also worked occasionally.
Unfortunately, I can't provide any particular information about which hardware, interconnect, etc. to use at this point. I just don't have enough data to be sure what works and what doesn't.
For some porting notes that apply to most components but are tailored to CLM, see: Porting CLM4