CAM3.0.21 MPI task load distribution in physical space, and performance for 8 processes on four 8-PE nodes of bluesky. "dyn_equi_by_col" is Pat Worley's new scheme for allocating gridpoints to tasks. It has no effect in full-grid mode, so the full-grid timing differences between true and false represent noise due to various machine factors.

Timings are in seconds per 10 days on 32 CPUs.

opt   dyn_equi_by_col   Full grid   Reduced 1-digit grid
0     true              261.502     196.125
0     false             260.416     238.136
2     true              259.154     207.473
2     false             254.732     211.340
3     true              248.887     202.463
3     false             246.038     238.366
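
As a rough illustration of how to read the table above, the short Python sketch below computes the percent change in run time when dyn_equi_by_col goes from false to true, for each "opt" value and each grid. The numbers are copied from the table; the dictionary layout and the helper name percent_change are illustrative choices, not part of CAM.

```python
# Timings in seconds per 10 days on 32 CPUs, copied from the table above.
# Keys are (opt, dyn_equi_by_col); values are (full_grid, reduced_grid).
cam3021_timings = {
    (0, True): (261.502, 196.125),
    (0, False): (260.416, 238.136),
    (2, True): (259.154, 207.473),
    (2, False): (254.732, 211.340),
    (3, True): (248.887, 202.463),
    (3, False): (246.038, 238.366),
}

FULL, REDUCED = 0, 1  # indices into the value tuples


def percent_change(opt, grid_index):
    """Percent change in run time going from false to true (negative = faster)."""
    t_true = cam3021_timings[(opt, True)][grid_index]
    t_false = cam3021_timings[(opt, False)][grid_index]
    return 100.0 * (t_true - t_false) / t_false


for opt in (0, 2, 3):
    full = percent_change(opt, FULL)
    reduced = percent_change(opt, REDUCED)
    print(f"opt={opt}: full grid {full:+.1f}%, reduced grid {reduced:+.1f}%")
```

With these figures the full-grid changes all stay under 2 percent, in line with the note that they reflect machine factors, while the reduced-grid changes are larger for opt 0 and opt 3 than for opt 2.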


CAM3.0.20 MPI task load distribution in physical space, and performance for 8 processes.

Timings are in seconds per 10 days on 32 CPUs.

phys_loadbalance   Full grid   Reduced 1-digit grid
-1                 311.168     265.640
0                  255.279     228.726
1                  254.133     230.209
2                  251.622     200.346
3                  246.757     236.741
4                  254.954     221.742
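
A minimal sketch, using only the numbers in the table above, of how one might pick the fastest phys_loadbalance setting per grid and express it as a speedup over the -1 setting. The names fastest_option and speedup_over are hypothetical helpers introduced for illustration.

```python
# phys_loadbalance -> (full_grid, reduced_grid), seconds per 10 days on 32 CPUs,
# copied from the table above.
cam3020_timings = {
    -1: (311.168, 265.640),
    0: (255.279, 228.726),
    1: (254.133, 230.209),
    2: (251.622, 200.346),
    3: (246.757, 236.741),
    4: (254.954, 221.742),
}

FULL_GRID, REDUCED_GRID = 0, 1  # indices into the value tuples


def fastest_option(grid_index):
    """Return the phys_loadbalance value with the smallest run time."""
    return min(cam3020_timings, key=lambda opt: cam3020_timings[opt][grid_index])


def speedup_over(baseline, option, grid_index):
    """Ratio of baseline run time to option run time (greater than 1 = faster)."""
    return cam3020_timings[baseline][grid_index] / cam3020_timings[option][grid_index]


for name, idx in (("full grid", FULL_GRID), ("reduced grid", REDUCED_GRID)):
    best = fastest_option(idx)
    ratio = speedup_over(-1, best, idx)
    print(f"{name}: fastest phys_loadbalance={best}, {ratio:.2f}x vs -1")
```

On these measurements the fastest setting differs by grid: option 3 for the full grid and option 2 for the reduced grid.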