The compilation and execution of parallel applications, especially under a queueing system or another job submission environment, is not straightforward. In particular, the list of nodes on which a parallel application will run is not known in advance, or even at job submission time. It is determined only when the job starts executing. Most systems provide this information through environment variables and through files listing the nodes allocated to the job. When multiprocessor nodes are present, this list can contain multiple lines with the same name. Such repeated entries are used, for example when running an MPI application, to start multiple instances of the parallel application on a single node.

In the case of the PCJ library, execution is simple. The most efficient mechanism is to start a single Java Virtual Machine (JVM) on each node and to run multiple PCJ threads within that JVM. When running on multiple nodes, the appropriate number of JVMs is started using the ssh or mpiexec command. Remember that PCJ threads running within a single JVM use the Java Concurrency Library to synchronize and communicate, while communication between PCJ threads running in different JVMs is performed using Java Sockets. To run a PCJ application in such a setting we use two files:
nodes.unique - a file containing the list of nodes used to run JVMs. In principle this list contains unique names (no duplicates). This file is used by mpiexec or another launcher command to start the parallel application.
nodes.txt - a file containing the list of nodes used to start PCJ threads. This list may contain duplicate entries, meaning that multiple PCJ threads will be started on that node (within a single JVM). The number of PCJ threads used to run the application (PCJ.threadsCount()) is equal to the number of lines (entries) in this file.
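The relationship between the two files can be illustrated with a small stand-alone program. This is a minimal sketch, not part of PCJ: the class NodeListSummary and its methods are hypothetical, and it assumes that blank lines in nodes.txt do not count as entries. It counts one PCJ thread per line and one JVM per distinct node name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of PCJ): summarizes a PCJ node list. */
public class NodeListSummary {

    /** Reads the node list, trimming whitespace; blank lines are skipped (an assumption). */
    public static List<String> readNodes(Path file) throws IOException {
        return Files.readAllLines(file).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    /** Counts entries per node name, keeping the order of first appearance. */
    public static Map<String, Integer> threadsPerNode(List<String> nodes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String node : nodes) {
            counts.merge(node, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "nodes.txt");
        List<String> nodes = readNodes(file);
        Map<String, Integer> perNode = threadsPerNode(nodes);
        // One line per PCJ thread; one JVM per distinct node name.
        System.out.println("PCJ threads: " + nodes.size());
        System.out.println("JVMs (distinct nodes): " + perNode.size());
        perNode.forEach((node, n) -> System.out.println("  " + node + ": " + n + " thread(s)"));
    }
}
```

Compiled with javac and run as java NodeListSummary nodes.txt, it reports how many PCJ threads the file implies and how many of them share each JVM.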
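To run a PCJ application on a single computer, you need the compiled application classes (e.g. HelloWorld.class) in the current directory and:
PCJ.jar - the PCJ library (replace the name with that of the version you use),
nodes.txt - a file containing multiple lines with the name of the computer (e.g. localhost), one line per PCJ thread.
Then execute the application as an ordinary Java application: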
java -cp .:PCJ.jar HelloWorld
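A nodes.txt for a single-machine run is simply the host name repeated once per PCJ thread. The sketch below writes such a file; the class LocalNodesFile, its entries method, and the default of 4 threads are hypothetical choices for illustration, not part of PCJ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of PCJ): writes nodes.txt for a single-machine run. */
public class LocalNodesFile {

    /** Returns n copies of the host name, one per PCJ thread to start. */
    public static List<String> entries(String host, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one PCJ thread");
        }
        return Collections.nCopies(n, host);
    }

    public static void main(String[] args) throws IOException {
        // Number of PCJ threads; 4 is an arbitrary example default.
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        // All entries name the same machine, so a single JVM hosts every PCJ thread.
        Files.write(Paths.get("nodes.txt"), entries("localhost", threads));
        System.out.println("Wrote nodes.txt with " + threads + " localhost entries");
    }
}
```

After running java LocalNodesFile 4, the java -cp .:PCJ.jar HelloWorld command above would start four PCJ threads inside one JVM.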
On a Linux cluster, the following example command can be run from a script or an interactive shell:
mpiexec -hostfile nodes.unique bash -c 'java -cp .:PCJ.jar HelloWorld'
On a Linux cluster with the PBS queueing system, the execution is similar to the plain Linux cluster case. However, a suitable script has to be prepared and submitted to the queue. This file contains the definitions of the parameters passed to the queueing system. The parameters include the number of nodes required (nodes=128) and indicate that one process per node will be executed (ppn=1). Before the Java application is started, the script gathers the list of nodes allocated to the job by the queueing system. The unique list of nodes is then stored in nodes.unique. Please remember that nodes.txt can be different from nodes.unique: with ppn greater than 1, $PBS_NODEFILE lists each node once per requested processor, so nodes.txt contains repeated entries and several PCJ threads are started on each node.
#!/bin/csh
#PBS -N go
#PBS -l nodes=128:ppn=1
#PBS -l mem=512mb
#PBS -l walltime=0:10:00
#PBS
module load openmpi   # if necessary
cat $PBS_NODEFILE > nodes.txt
uniq $PBS_NODEFILE > nodes.unique
mpiexec -hostfile nodes.unique bash -c 'java -d64 -Xnoclassgc -Xrs -cp .:PCJ.jar HelloWorld'
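The script relies on uniq, which removes a line only when it repeats the line directly before it. That is enough when the node file groups all entries of a node together, as such files typically do; if entries were interleaved, uniq would leave duplicates and more than one JVM would be started on a node. The hypothetical class UniqSketch below (not part of PCJ or PBS) contrasts uniq's behaviour with removing every repeat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration (not part of PCJ or PBS) of how nodes.unique is derived. */
public class UniqSketch {

    /** Mimics `uniq`: drops a line only if it equals the line directly before it. */
    public static List<String> uniqAdjacent(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(line)) {
                out.add(line);
            }
        }
        return out;
    }

    /** Removes every repeat wherever it appears, keeping first-appearance order. */
    public static List<String> distinct(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    public static void main(String[] args) {
        // Grouped entries: both approaches agree.
        List<String> grouped = Arrays.asList("n1", "n1", "n2", "n2");
        // Interleaved entries: uniq keeps n1 twice, i.e. two JVMs would start on n1.
        List<String> interleaved = Arrays.asList("n1", "n2", "n1", "n2");
        System.out.println(uniqAdjacent(grouped) + " " + distinct(grouped));
        System.out.println(uniqAdjacent(interleaved) + " " + distinct(interleaved));
    }
}
```

Running it prints [n1, n2] [n1, n2] for the grouped list and [n1, n2, n1, n2] [n1, n2] for the interleaved one. In the shell, sort -u behaves like distinct here, although it also reorders the nodes.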
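On the Cray XC40 with the SLURM queueing system, a suitable script has to be prepared and submitted to the queue. This file (go_xc40.sh) contains the definitions of the parameters passed to the queueing system. The first srun runs hostname once per task in the allocation (132 nodes × 48 tasks per node = 6336 tasks), so nodes.txt lists each node 48 times and 6336 PCJ threads are started. The second srun starts one JVM per node (-N 132 -n 132), giving each JVM 48 CPUs (-c 48).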
#!/bin/bash -l
#SBATCH -N 132                 # Number of nodes
#SBATCH --ntasks-per-node 48   # Number of tasks per node
#SBATCH --mem 5000             # Required RAM (MB per node)
#SBATCH --time=00:10:00        # Required time
#SBATCH -A GES-00              # Account
srun hostname > nodes.txt
srun -N 132 -n 132 -c 48 java -cp .:PCJ.jar HelloWorld
To optimize execution on multinode systems such as IBM Power 7 (here with the LoadLeveler queueing system, whose directives are the #@ lines), the PCJ application should have exclusive use of the compute nodes, even though only one application process, the Java VM, runs on each node. The poe command is used to start the Java VMs on the nodes reserved for the execution.
#@ job_type = parallel
#@ node = 2
#@ tasks_per_node = 1
#@ queue
cat $LOADL_HOSTFILE > nodes.txt
uniq $LOADL_HOSTFILE > nodes.unique
poe "java -Xnoclassgc -Xmx6g -cp .:PCJ.jar HelloWorld" -hfile nodes.unique -statistic print -bindproc yes -task_affinity cor