CAM3.0.21 MPI task load distribution in physical space, and performance for 8
processes on four 8-PE nodes of bluesky. "dyn_equi_by_col" is Pat Worley's new
scheme for allocating gridpoints to tasks. It has no effect in full-grid
mode, so the full-grid timing differences between true and false are noise
due to various machine factors.
| opt | dyn_equi_by_col | Full grid (seconds per 10 days on 32 CPUs) | Reduced 1-digit grid (seconds per 10 days on 32 CPUs) |
|-----|-----------------|--------------------------------------------|-------------------------------------------------------|
| 0   | true            | 261.502                                    | 196.125                                               |
| 0   | false           | 260.416                                    | 238.136                                               |
| 2   | true            | 259.154                                    | 207.473                                               |
| 2   | false           | 254.732                                    | 211.340                                               |
| 3   | true            | 248.887                                    | 202.463                                               |
| 3   | false           | 246.038                                    | 238.366                                               |

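One way to check the note that the full-grid true vs. false differences are only noise is to compute the percent change in run time for each "opt" value. The following is a minimal sketch in Python, using the timings from the CAM3.0.21 table above; the names `timings` and `percent_change` are illustrative and not part of CAM.

```python
# Timings (seconds per 10 days on 32 CPUs) copied from the CAM3.0.21 table.
# Keys are (opt, dyn_equi_by_col); values are (full_grid, reduced_grid).
timings = {
    (0, True): (261.502, 196.125),
    (0, False): (260.416, 238.136),
    (2, True): (259.154, 207.473),
    (2, False): (254.732, 211.340),
    (3, True): (248.887, 202.463),
    (3, False): (246.038, 238.366),
}


def percent_change(opt, grid_index):
    """Percent change in run time when dyn_equi_by_col goes from false to true.

    grid_index 0 selects the full grid, 1 the reduced grid.
    Negative values mean the run got faster with dyn_equi_by_col = true.
    """
    on = timings[(opt, True)][grid_index]
    off = timings[(opt, False)][grid_index]
    return 100.0 * (on - off) / off


for opt in (0, 2, 3):
    full = percent_change(opt, 0)
    reduced = percent_change(opt, 1)
    print(f"opt={opt}: full grid {full:+.2f}%, reduced grid {reduced:+.2f}%")
```

On these numbers the full-grid change stays below 2 percent for every "opt" value, while the reduced grid runs roughly 15 to 18 percent faster with dyn_equi_by_col set to true for opt 0 and 3, and about 2 percent faster for opt 2.
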
CAM3.0.20 MPI task load distribution in physical space, and performance for 8 processes.
| phys_loadbalance option setting | Full grid (seconds per 10 days on 32 CPUs) | Reduced 1-digit grid (seconds per 10 days on 32 CPUs) |
|---------------------------------|--------------------------------------------|-------------------------------------------------------|
| -1                              | 311.168                                    | 265.640                                               |
| 0                               | 255.279                                    | 228.726                                               |
| 1                               | 254.133                                    | 230.209                                               |
| 2                               | 251.622                                    | 200.346                                               |
| 3                               | 246.757                                    | 236.741                                               |
| 4                               | 254.954                                    | 221.742                                               |