
CGD Computing FAQs

CGD_Cluster

Last updated: Mon Aug 9 09:56:28 MDT 2004

1.0 General
    1.1 Why a CGD cluster?
    1.2 How are the clusters being used?
    1.3 Where is it located?

2.0 Hardware
    2.1 What's the cluster configuration?
    2.2 What's the node configuration?

3.0 Software
    3.1 What compilers are installed?
    3.2 What debuggers are installed?
    3.3 What message passing library is available?
    3.4 What about NetCDF?
    3.5 What monitoring software is installed?

4.0 Money Questions
    4.1 How much did the cluster cost?
    4.2 Where did the money come from?

5.0 Usage
    5.1 How do I get an account?
    5.2 I can't log directly into the cluster nodes!!
    5.3 OK, how do I submit a job using PBS?
    5.4 I still need to get an interactive login on the compute nodes to debug.
    5.5 The software package -my-favorite-software- isn't installed.
    5.6 My environment doesn't work right.
    5.7 What queues are set up?
    5.8 What data partitions are mounted?
    5.9 How do I logon to Anchorage?
    5.10 What are the scratch areas?


1.0 General

1.1 Why a CGD cluster?

Lots of reasons:

  • The CGD Clusters fill an architecture gap by providing an x86 platform using a Linux operating system.
  • The cluster also fills a computing gap between the Division desktop systems and the big iron located in the basement.
  • The clusters are dedicated CGD resources and have less contention for use.
  • Flexibility. New tools (compilers, queueing systems, etc.) can be easily installed and removed, providing a better experimental compute platform.
  • Linux clusters are the favored tool of our largest customer base: universities.

1.2 How are the clusters being used?

The small cluster, Anchorage, has proven effective for testing, debugging, and small model runs for scientific research.

The new cluster, Bangkok, currently (25Nov03) has network limitations that prevent performance from scaling past 16 processors (8 nodes). Until upgrades can be fitted, it is expected to fill a role similar to Anchorage's, with the added benefit of more job capacity.

Calgary (26Jan04) is similar to Bangkok in hardware, with the addition of Infiniband as the network fabric. Currently only 8 nodes are attached. As budget becomes available, nodes will be moved from Bangkok and attached to Calgary.

1.3 Where is it located?

Anchorage, Bangkok, and Calgary are located in the refurbished CGD machine room, ML-315.

[top]


2.0 Hardware

2.1 What's the cluster configuration?

Anchorage:
The cluster has 17 nodes: 1 master node for logins, analysis, and compilation, plus 16 compute nodes.

Bangkok:
The cluster has 25 nodes: 1 master node for logins, analysis, and compilation, plus 24 compute nodes.

Calgary:
The cluster has 9 nodes: 1 master node for logins, analysis, and compilation, plus 8 compute nodes.

2.2 What's the node configuration?

Anchorage: Dell Optiplex G260
    CPU: 1 x 2.0 GHz Intel P4 w/ 512k cache
    Memory: 1 GByte
    Disk: 20 GB, w/ 14 GByte available for scratch per node.
          110 GB is available as shared disk space for all nodes.
    Cluster Network: Gigabit ethernet.
    Uplink to non-cluster NFS mounts (i.e., /data, /fs): 1000 Mb/s.

Bangkok: Dell Precision 450
    CPU: 2 x 3.06 GHz Intel Xeon w/ 512k cache
    Memory: 2 GByte
    Disk: 40 GB, w/ 18 GByte available for scratch per node.
          240 GB is available as shared disk space for all nodes.
    Cluster Network: Gigabit ethernet.
    Uplink to non-cluster NFS mounts (i.e., /data, /fs): 1000 Mb/s.
    NOTE: Bangkok has 2 CPUs per node.
          Use -l nodes=<#nodes>:ppn=2 to pack jobs onto nodes.

Calgary: Dell Precision 450
    CPU: 2 x 3.06 GHz Intel Xeon w/ 512k cache
    Memory: 2 GByte
    Disk: 40 GB, w/ 18 GByte available for scratch per node.
          240 GB is available as shared disk space for all nodes.
    Cluster Network: Gigabit ethernet and Infiniband.
    Uplink to non-cluster NFS mounts (i.e., /data, /fs): 1000 Mb/s.
    NOTE: Calgary has 2 CPUs per node with hyper-threading turned on.
          Use -l nodes=<#nodes>:ppn=4 to pack jobs onto nodes.
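The ppn settings above determine how many MPI ranks a request yields; a minimal sketch of the arithmetic (the node count here is an arbitrary example):

```shell
# Example: requesting 4 nodes on each cluster.
# Bangkok packs 2 processors per node; Calgary packs 4 (hyper-threaded).
NODES=4

BANGKOK_RANKS=$((NODES * 2))   # from -l nodes=4:ppn=2
CALGARY_RANKS=$((NODES * 4))   # from -l nodes=4:ppn=4

echo "Bangkok: qsub -l nodes=${NODES}:ppn=2 -> ${BANGKOK_RANKS} ranks"
echo "Calgary: qsub -l nodes=${NODES}:ppn=4 -> ${CALGARY_RANKS} ranks"
```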

[top]


3.0 Software

3.1 What compilers are installed?

Portland Group High Performance FORTRAN
Portland Group CC
Lahey lf95

3.2 What debuggers are installed?

Portland Group debugger.
Lahey debugger.

3.3 What message passing library is available?

MPI is installed and configured. Use the following symlinks for linking code:
/usr/local/mpich-pgi-hpf-cc
/usr/local/mpich-gcc-g++-lf95
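A sketch of how the symlinks would be used when compiling, assuming the usual MPICH bin/ wrapper layout beneath them (hello.f90 and hello.c are placeholder source files):

```shell
# Hypothetical compile lines; the bin/ wrappers under the symlink
# are assumed, and the source files are made up for illustration.
MPICH=/usr/local/mpich-pgi-hpf-cc

FC="$MPICH/bin/mpif90"
CC="$MPICH/bin/mpicc"

echo "$FC -o hello hello.f90"
echo "$CC -o hello_c hello.c"
```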

3.4 What about NetCDF?

NetCDF is compiled against each of the available compilers. Use the following symlinks for linking code:
/usr/local/netcdf-pgi-hpf-cc
/usr/local/netcdf-gcc-g++-lf95
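A hypothetical link line against the PGI NetCDF tree, assuming the conventional include/ and lib/ layout under the symlink (model.f90 is a placeholder source file):

```shell
# include/ and lib/ under the symlink are assumed; the program and
# source names are made up for illustration.
NETCDF=/usr/local/netcdf-pgi-hpf-cc
FFLAGS="-I$NETCDF/include"
LDFLAGS="-L$NETCDF/lib -lnetcdf"

echo "pgf90 -o model model.f90 $FFLAGS $LDFLAGS"
```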

3.5 What monitoring software is installed?

For CPU usage, SARGE is available.
For PBS monitoring, /usr/local/xpbsmon/bin/xpbsmon will monitor the cluster nodes for user and job allocation.
For overall job throughput, see: pbsacctgr.

[top]


4.0 Money Questions

4.1 How much did the cluster cost?

Anchorage:
    Node:            $786.00   x 17 = $13,362.00
    Memory:          $187.28   x 17 = $3,183.76
    Add'l Disk:      $249.99   x 1  = $249.99
    Gigabit Switch:  $2,001.99 x 1  = $2,001.99
    PGI Compilers:   $3,614.00 x 1  = $3,614.00
    Total:                            $21,981.50

Bangkok:
    Master Node:     $2,252.00 x 1  = $2,252.00
    Compute Node:    $2,139.00 x 25 = $53,475.00 (24 + 1 spare)
    Memory:          $899.00   x 26 = $23,374.00
    Add'l Disk:      $289.00   x 2  = $578.00
    Gigabit Switch:  $1,666.00 x 1  = $1,666.00
    Gigabit Switch:  $2,550.00 x 1  = $2,550.00
    PGI Compilers:   $0             = $0.00
    Total:                            $83,895.00

Calgary:
    Node:            $2,139.00 x 9  = $19,251.00
    Memory:          $899.00   x 9  = $8,091.00
    Add'l Disk:      $289.00   x 2  = $578.00
    Infiniband Swch: $?.00     x 1  = $?.00
    Infiniband NICs: $?.00     x 9  = $?.00
    PGI Compilers:   $0             = $0.00
    Total:                            $?.00

4.2 Where did the money come from?

Anchorage came from the CGD/ISG budget. The Systems Group went through personnel changes in 2002. There were several months when the group was running light on staff. The money saved from salary was diverted to a cluster for the Division.

Bangkok was funded by a cooperative effort from CGD section heads and PIs. Funds were contributed from numerous contracts, budgets, and grants for a community resource.

Calgary was initially part of the Bangkok procurement. Time, money, and a bit of horse trading allowed 8 nodes to be broken off of Bangkok for use as Calgary. CGD/ISG spent money for the Infiniband fabric. Additional nodes will be moved as money becomes available.

[top]


5.0 Usage

5.1 How do I get an account?

Submit a request using the wreq system. Accounts are available only to users with existing accounts in CGD. Translation: users need to be affiliated with CGD in some manner. This is not a publicly available compute resource.

5.2 I can't log directly into the cluster nodes!!

That's right. Logins to cluster nodes are restricted to prevent running jobs from being interrupted. Log in to Anchorage for compilation, then submit the jobs to the PBS queue manager.

5.3 OK, how do I submit a job using PBS?

Use the following PBS script:

#!/bin/sh
#
### Job name
#
#PBS -N MostExcellentJob
### Declare job non-rerunable
#
#PBS -r n
### Output files - sort to top of directory.
#
#PBS -e AAA_MostExcellentJob.err
#PBS -o AAA_MostExcellentJob.log
# Mail to user
#
#PBS -m ae
### Queue name (small, medium, long, verylong)
#
#PBS -q medium
#
# Number of nodes, number of processors
#   nodes = physical host
#   ppn   = processors per node (i.e., number of cores)
#
#PBS -l nodes=1:ppn=48

# This job's working directory
#
echo `date`
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

# May be necessary for some OMP jobs.
#
#export KPM_STACKSIZE=50m

echo "Environment:"
echo "--------------"
echo ""

# Print out some job information for debugging.
#
echo Running $PROGNAME on host `hostname`
echo Time is `date`
echo Directory is `pwd`

# Configure the run environment.
#
module load compiler/intel/default

# Convert the host file to use IB
#
/cluster/bin/make_ib_hosts.sh

# Get the number of procs by counting the nodes file,
# which was generated from the #PBS -l line above.
#
NPROCS=`wc -l < $PBS_NODEFILE`
echo "Node File:"
echo "----------"
cat "$PBS_NODEFILE"
echo ""

# Run the parallel MPI executable
#
echo "`date` mpiexec - Start"
mpiexec -v -np $NPROCS ./MostExcellentJob
echo ""
echo "`date` MPIRUN - END"
exit 0

Then submit the job using: /usr/local/torque/bin/qsub <script>
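The NPROCS line in the script simply counts entries in the node file PBS generates from the #PBS -l request; a self-contained illustration (node names are made up):

```shell
# Simulate the node file PBS writes for -l nodes=2:ppn=2:
# one line per allocated processor slot.
PBS_NODEFILE=$(mktemp)
cat > "$PBS_NODEFILE" <<EOF
node01
node01
node02
node02
EOF

# Same idiom as the job script: one MPI rank per listed slot.
NPROCS=$(wc -l < "$PBS_NODEFILE")
echo "would run: mpiexec -np $NPROCS ./MostExcellentJob"
rm -f "$PBS_NODEFILE"
```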

[top]

5.4 I still need to get an interactive login on the compute nodes to debug.

Use the command: /usr/local/torque/bin/qsub -I -l nodes=<number of nodes>

5.5 The software package -my-favorite-software- isn't installed.

Submit a request using the wreq system. Installs on the master node Anchorage should be easy. Installs on compute nodes will need a good justification.

5.6 My environment doesn't work right.

Systems needs to re-write the "atomic" dot files (.login, .cshrc, etc.) that are currently in use Division-wide. Until then, cluster users need to set the following environment variables.

NOTE: This section dates from November, 2002. Check /usr/local for updated compilers.

csh/tcsh:

# PGI (v4.x)
setenv PGI /usr/local/pgi
set path = ( $PGI/linux86/bin /usr/local/torque/bin $path )
setenv MANPATH "$MANPATH":$PGI/man:/usr/local/torque/man

# PGI (v5.x)
setenv PGI /usr/local/pgi
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${PGI}/linux86/lib:${PGI}/linux86/liblf
set path = ( $path $PGI/linux86/5.0/bin )
setenv MANPATH "$MANPATH":$PGI/man:/usr/local/torque/man

# Lahey
setenv LAHEY /usr/local/lf9561
setenv MANPATH /usr/local/lf9561/manuals/man/ssl2:$MANPATH
setenv MANPATH /usr/local/lf9561/manuals/man/lf95:$MANPATH
setenv PFDIR /usr/local/lf9561/bin
set path = ( $path ${LAHEY}/bin )

ksh/bash:

# PGI (v4.x)
export PGI=/usr/local/pgi
export PATH=$PGI/linux86/bin:/usr/local/torque/bin:$PATH
export MANPATH=$MANPATH:$PGI/cdk/man:/usr/local/torque/man

# Lahey
export LAHEY=/usr/local/lf9561
export MANPATH=/usr/local/lf9561/manuals/man/ssl2:$MANPATH
export MANPATH=/usr/local/lf9561/manuals/man/lf95:$MANPATH
export PFDIR=/usr/local/lf9561/bin
export PATH=$PATH:${LAHEY}/bin

[top]

5.7 What queues are set up?

[anchorage ~]$ /usr/local/torque/bin/qstat -q

server: anchorage

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
verylong           --   72:00:00   --      --     0   0 10   E R
long               --   12:00:00   --      --     0   0 10   E R
medium             --   02:00:00   --      --     0   0 10   E R
small              --   00:20:00   --      --     0   0 10   E R
default            --     --       --      --     0   0 10   E R
                                                --- ---

[root@bangkok root]# qstat -q

server: bangkok.cgd.ucar.edu

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
workq              --     --       --      --     0   0 --   E R
verylong           --   72:00:00   --       8     5   0 10   E R
long               --   12:00:00   --       8     2   0 10   E R
medium             --   02:00:00   --       8     0   0 10   E R
small              --   00:20:00   --       8     0   0 10   E R
default            --     --       --      --     0   0 10   E R
monster            --   72:00:00   --      16     0   1 10   E R
                                                --- ---
                                                  7   1

[root@calgary sbin]# qstat -q

server: calgary.cgd.ucar.edu

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
monster            --   72:00:00   --       8     0   0 10   E R
verylong           --   72:00:00   --       4     0   0 10   E R
long               --   12:00:00   --       4     0   0 10   E R
medium             --   02:00:00   --       4     0   0 10   E R
small              --   00:20:00   --       8     0   0 10   E R
default            --     --       --      --     0   0 10   E R
                                                --- ---
                                                  0   0
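The CPU-time limits shown above suggest a simple rule for choosing a queue; a sketch (the helper name is made up, and the limits are those from the qstat listing):

```shell
# Pick the smallest queue whose CPU-time limit covers a job,
# given the job's expected CPU time in minutes.
pick_queue() {
    mins=$1
    if   [ "$mins" -le 20 ];  then echo small      # 00:20:00
    elif [ "$mins" -le 120 ]; then echo medium     # 02:00:00
    elif [ "$mins" -le 720 ]; then echo long       # 12:00:00
    else                           echo verylong   # 72:00:00
    fi
}

pick_queue 15
pick_queue 300
```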

[top]

5.8 What data partitions are mounted?

fileserver-n8:/fs/cgd    as /fs/cgd
fileserver-n8:/fs/tools  as /fs/tools
klondike:/project        as /project

5.9 How do I logon to Anchorage?

Telnet and Rlogin have not been enabled on Anchorage. Use slogin anchorage.

5.10 What are the scratch areas?

The shared cluster scratch areas are:
    Anchorage: /scratch/cluster (110 GB)
    Anchorage: /scratch/cluster2 (115 GB)
    Bangkok:   /scratch/cluster (240 GB)
    Calgary:   /scratch/cluster (240 GB)

Scratch local to each node is:
    Anchorage: /scratch/local (14 GB)
    Bangkok:   /scratch/local (18 GB)
    Calgary:   /scratch/local (18 GB)

There were no scrubbers running when this FAQ was first written, so clean up after yourself. UPDATE: Scrubbers were implemented in August, 2004, to clean up after users. Files that have not been accessed in more than 60 days are removed.
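The 60-day access-time policy amounts to something like the following sketch (the real scrubber's implementation is not documented here; the directory and file names are made up, and GNU touch/find are assumed):

```shell
# Demonstrate the policy against a throwaway directory.
SCRATCH=$(mktemp -d)
touch "$SCRATCH/fresh.nc"
touch -a -d "100 days ago" "$SCRATCH/stale.nc"   # backdate access time (GNU touch)

# Files whose last access is more than 60 days old are candidates
# for removal.
STALE=$(find "$SCRATCH" -type f -atime +60)
echo "$STALE"
rm -rf "$SCRATCH"
```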

[top]