In CLM3, the highest-level loops in the driver routine (Section A.53) run over the clumps assigned to each MPI process and provide OpenMP and Cray Streaming parallelism. Science subroutines called within these loops are passed the local clump bounds for gridcells, landunits, columns, and PFTs as needed. Relevant filters, in the form of counts and vectors of array indices, are also passed to the science subroutines as needed.
Shown below is a portion of a high-level loop from the driver routine. First, the number of clumps assigned to the process is obtained and stored in nclumps. The subsequent loop over all clumps is wrapped in OpenMP and Cray Streaming directives to support shared-memory parallelism. Within the loop, the bounds for gridcells, landunits, columns, and PFTs are obtained for the clump being processed by calling get_clump_bounds (Section A.50.10). Then a science subroutine, Hydrology1 (Section A.22.1), is called and passed the column and PFT bounds as well as the non-lake filters for columns and PFTs for the clump being processed. Additional science subroutines are subsequently called within the same loop. The driver routine consists primarily of two such high-level loops, which call most of the science subroutines used by the model.
    nclumps = get_proc_clumps()

    !$OMP PARALLEL DO PRIVATE (nc,begg,endg,begl,endl,begc,endc,begp,endp)
    !CSD$ PARALLEL DO PRIVATE (nc,begg,endg,begl,endl,begc,endc,begp,endp)
    do nc = 1,nclumps
       call get_clump_bounds(nc, begg, endg, begl, endl, begc, endc, begp, endp)
       .
       .
       call Hydrology1(begc, endc, begp, endp, &
                       filter(nc)%num_nolakec, filter(nc)%nolakec, &
                       filter(nc)%num_nolakep, filter(nc)%nolakep)
       .
       .
    end do
    !CSD$ END PARALLEL DO
    !$OMP END PARALLEL DO
Within science subroutines, vector loops run over grid or subgrid units. Vector loops may run over an entire clump of grid or subgrid units, or they may use filters for indirect addressing of a specific list of subgrid units to process. Other loops, which tend to be very short, run over snow and soil levels within a column or PFT. In most cases, the vector (grid/subgrid) loops are placed inside the short (level) loops, so that the long loop is innermost and can be vectorized. When writing code in this manner, it is often necessary to split lengthy loops into multiple loops and use temporary local arrays, called vector temporaries, to pass data from one loop to the next. Since arrays in data structures in CLM are implemented as pointers, compilers usually cannot determine whether vector dependencies exist. As a result, compiler directives are required in order to obtain loop vectorization.
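The loop ordering and the use of vector temporaries described above can be sketched as follows. This is a minimal illustration, not code from CLM: the module vector_loop_sketch, the routine column_mean, and all of its arguments are hypothetical names, and r8 is defined locally so that the sketch is self-contained. The directives are the Cray X1 and NEC forms that appear elsewhere in this section; other compilers treat them as ordinary comments.

```fortran
module vector_loop_sketch
  implicit none
  ! Local stand-in for the model's double precision kind (assumption).
  integer, parameter :: r8 = selected_real_kind(12)
contains

  ! Hypothetical routine: thickness-weighted mean of q over all levels,
  ! computed for every column listed in the filter.
  subroutine column_mean(lbc, ubc, nlev, num_f, filter_f, q, dz, qmean)
    integer,  intent(in)  :: lbc, ubc           ! column bounds of the clump
    integer,  intent(in)  :: nlev               ! number of levels
    integer,  intent(in)  :: num_f              ! number of columns in filter
    integer,  intent(in)  :: filter_f(:)        ! column indices in filter
    real(r8), intent(in)  :: q(lbc:ubc, nlev)   ! quantity on levels
    real(r8), intent(in)  :: dz(lbc:ubc, nlev)  ! level thickness
    real(r8), intent(out) :: qmean(lbc:ubc)     ! weighted column mean

    integer  :: f, c, j
    real(r8) :: qsum(lbc:ubc)   ! vector temporary: weighted sum
    real(r8) :: zsum(lbc:ubc)   ! vector temporary: total thickness

    ! Columns outside the filter are left at zero.
    qmean(:) = 0._r8

    ! Loop 1: initialize the vector temporaries for filtered columns.
!dir$ concurrent
!cdir nodep
    do f = 1, num_f
       c = filter_f(f)
       qsum(c) = 0._r8
       zsum(c) = 0._r8
    end do

    ! Loop 2: short level loop outside, long filter loop inside,
    ! so that the innermost loop is the one that vectorizes.
    do j = 1, nlev
!dir$ concurrent
!cdir nodep
       do f = 1, num_f
          c = filter_f(f)
          qsum(c) = qsum(c) + q(c,j)*dz(c,j)
          zsum(c) = zsum(c) + dz(c,j)
       end do
    end do

    ! Loop 3: consume the vector temporaries filled by loop 2.
!dir$ concurrent
!cdir nodep
    do f = 1, num_f
       c = filter_f(f)
       qmean(c) = qsum(c)/zsum(c)
    end do
  end subroutine column_mean

end module vector_loop_sketch
```

Written as a single loop over columns with an inner loop over levels, the same computation would have a very short innermost loop. Splitting it into three loops and carrying qsum and zsum between them keeps every innermost loop a long loop over the filter.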
Shown below is an example of a filter loop within a science subroutine. First, local pointers are created to shorten the notation used in equations. The subsequent loop over all non-lake columns is preceded by Cray X1 and NEC/Earth Simulator compiler directives. The first directive tells the Cray X1 compiler that the loop is concurrent, meaning it may be streamed and vectorized. The second directive tells the NEC/Earth Simulator compiler that no vector dependencies exist in the loop. Within the loop, the column index is obtained from the non-lake column filter vector, the appropriate landunit index is obtained from the column's landunit vector, and the appropriate gridcell index is obtained from the column's gridcell vector. Next, the landunit type and the ground temperature of the column are checked. If the landunit is a wetland (istwet) and the ground temperature is above freezing, three snow variables are set to zero. Other computations are usually performed within such loops.
    ! Assign local pointers to derived type
    ! members (landunit-level)
    clandunit => clm3%g%l%c%landunit
    itype     => clm3%g%l%itype

    ! Assign local pointers to derived type
    ! members (column-level)
    cgridcell => clm3%g%l%c%gridcell
    t_grnd    => clm3%g%l%c%ces%t_grnd
    h2osno    => clm3%g%l%c%cws%h2osno
    snowdp    => clm3%g%l%c%cps%snowdp
    snowage   => clm3%g%l%c%cps%snowage

    !dir$ concurrent
    !cdir nodep
    do f = 1, num_nolakec
       c = filter_nolakec(f)
       l = clandunit(c)
       g = cgridcell(c)
       .
       .
       .
       if (itype(l) == istwet .and. t_grnd(c) > tfrz) then
          h2osno(c)  = 0._r8
          snowdp(c)  = 0._r8
          snowage(c) = 0._r8
       end if
       .
       .
       .
    end do
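A filter such as the one consumed by the loop above is a count together with a vector of indices. The sketch below shows one way such a filter might be built from a logical mask over the columns of a clump. It is an illustration only: the module filter_sketch, the routine build_filter, and the mask argument are hypothetical names, and the routines that CLM actually uses to construct its filters may differ.

```fortran
module filter_sketch
  implicit none
contains

  ! Hypothetical routine: collect the indices of all columns in
  ! lbc..ubc for which mask is true.
  subroutine build_filter(lbc, ubc, mask, num_f, filter_f)
    integer, intent(in)  :: lbc, ubc               ! column bounds of the clump
    logical, intent(in)  :: mask(lbc:ubc)          ! true for columns to keep
    integer, intent(out) :: num_f                  ! number of columns kept
    integer, intent(out) :: filter_f(ubc-lbc+1)    ! kept column indices

    integer :: c

    num_f = 0
    filter_f(:) = 0   ! unused trailing entries are left at zero

    do c = lbc, ubc
       if (mask(c)) then
          num_f = num_f + 1
          filter_f(num_f) = c
       end if
    end do
  end subroutine build_filter

end module filter_sketch
```

Only the first num_f entries of filter_f are meaningful, which is why a loop over the filter runs from 1 to the count and reads the column index from the vector rather than testing a condition on every column.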