Remote Access to Computing Servers and Clusters (GOS)
This section is intended to walk you through the COMPSs execution environment that allows users to execute a COMPSs application using several remote machines and computing clusters from a local machine. Access to remote resources is done through the SSH (Secure Shell) and SCP (Secure Copy) protocols, which are the most widely used protocols for establishing a secure, encrypted connection between a client computer and a remote server within a cluster.
Although this feature has been designed to work with resources that have a job submission queue, it can also be used with any other type of machine that can be accessed through an SSH connection.
Requirements
In order to use COMPSs with remote clusters some requirements must be fulfilled:
Generate a public-private key pair and authorize it in every cluster that will be used (more details in section Configure SSH passwordless).
Add these remote resources to the known hosts file located in ~/.ssh/known_hosts.
COMPSs must be installed both in the master and in all the remote clusters.
Important
Both the client and the remote computing resources should have the same or a compatible version of COMPSs, which must be 3.2 or higher.
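Both prerequisites can be checked from the local machine before a run. The following Python sketch is not part of COMPSs: the helper names (version_at_least, host_in_known_hosts) are hypothetical, it assumes purely numeric dotted version strings, and it only matches plain (non-hashed) known_hosts entries. For hashed files, OpenSSH's own `ssh-keygen -F <host>` performs the lookup.

```python
from pathlib import Path


def version_at_least(version: str, minimum: str = "3.2") -> bool:
    """Compare dotted numeric versions, e.g. "3.4" against the documented minimum "3.2"."""
    def parse(v: str) -> tuple:
        return tuple(int(part) for part in v.split("."))
    return parse(version) >= parse(minimum)


def host_in_known_hosts(host: str, known_hosts_text: str, port: int = 22) -> bool:
    """True if `host` appears in a plain known_hosts entry.

    Hashed entries ("|1|...") and marker lines ("@cert-authority", "@revoked") are skipped.
    OpenSSH stores hosts on non-default ports as "[host]:port".
    """
    target = host if port == 22 else f"[{host}]:{port}"
    for line in known_hosts_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", "@", "|")):
            continue
        # The first field is a comma-separated list of host names / addresses
        if target in fields[0].split(","):
            return True
    return False


if __name__ == "__main__":
    path = Path.home() / ".ssh" / "known_hosts"
    text = path.read_text() if path.exists() else ""
    print("remoteMachine listed:", host_in_known_hosts("remoteMachine", text))
    print("COMPSs 3.4 supported:", version_at_least("3.4"))
```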
Execution
The execution of an application using this method consists of 3 steps:
Step 1: Deployment
The very first step is to copy the application and its necessary files to the remote machines. If the application is written in Java or C, the compiled files must also be transferred to the remote machines, or the application must be compiled there.
This can be easily accomplished with the scp command as follows:
$ scp -r /local/path/application/ myUser@remoteMachine:/remote/path/.
This must be done for every new application, and then you can run it as many times as needed. If the application is updated, this step must be repeated in order to keep the local and remote copies of the application identical.
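Step 1 can be scripted so that redeploying after an update is a single call. This is a minimal sketch assuming the same paths and user as the scp command above; scp_deploy_command and deploy are hypothetical helper names, and the command is only executed when dry_run is False (which requires the passwordless SSH setup from the Requirements).

```python
import subprocess


def scp_deploy_command(local_dir: str, user: str, host: str, remote_dir: str) -> list:
    """Return the argument list for `scp -r <local_dir> <user>@<host>:<remote_dir>`."""
    return ["scp", "-r", local_dir, f"{user}@{host}:{remote_dir}"]


def deploy(local_dir: str, user: str, host: str, remote_dir: str, dry_run: bool = True) -> list:
    """Print the deployment command and run it only when dry_run is False."""
    cmd = scp_deploy_command(local_dir, user, host, remote_dir)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if scp fails
    return cmd


if __name__ == "__main__":
    deploy("/local/path/application/", "myUser", "remoteMachine", "/remote/path/.")
```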
Step 2: Configuration
In order to run the application, COMPSs needs the descriptions of the remote machines (e.g. clusters) used for the execution. This information must be provided in two XML files: the resources and project XML files (more details in Resources file and Project file). The resources file has to include the description of the available clusters and the Submission Modes, and the project file has to provide the access information (user, keys) and the location where COMPSs and the application are installed in every cluster.
The following code shows the basic structure of the resources.xml file
using interactive submission mode (a working example of the resources.xml
file using batch submission mode for MN5 is shown in the
Execution example).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ResourcesList>
<ComputingCluster Name="COMPSsWorker01">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Interactive/>
</SubmissionSystem>
<BrokerAdaptor>sshtrilead</BrokerAdaptor>
</Adaptor>
</Adaptors>
<ClusterNode Name="compute_node_type1">
<MaxNumNodes>10</MaxNumNodes>
<Processor Name="P1">
<ComputingUnits>8</ComputingUnits>
<Type>CPU</Type>
</Processor>
...
</ClusterNode>
</ComputingCluster>
</ResourcesList>
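If several clusters must be described, the skeleton above can be generated instead of written by hand. A minimal sketch with Python's standard xml.etree.ElementTree, assuming a single CPU processor named P1 per node type as in the example; resources_xml is a hypothetical helper, and ET.indent requires Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET

GOS_ADAPTOR = "es.bsc.compss.gos.master.GOSAdaptor"


def resources_xml(cluster: str, node_type: str, max_nodes: int, cpus_per_node: int) -> str:
    """Return an interactive-mode resources.xml document mirroring the structure above."""
    root = ET.Element("ResourcesList")
    cc = ET.SubElement(root, "ComputingCluster", Name=cluster)
    adaptor = ET.SubElement(ET.SubElement(cc, "Adaptors"), "Adaptor", Name=GOS_ADAPTOR)
    ET.SubElement(ET.SubElement(adaptor, "SubmissionSystem"), "Interactive")
    ET.SubElement(adaptor, "BrokerAdaptor").text = "sshtrilead"
    node = ET.SubElement(cc, "ClusterNode", Name=node_type)
    ET.SubElement(node, "MaxNumNodes").text = str(max_nodes)
    proc = ET.SubElement(node, "Processor", Name="P1")
    ET.SubElement(proc, "ComputingUnits").text = str(cpus_per_node)
    ET.SubElement(proc, "Type").text = "CPU"
    ET.indent(root)  # pretty-print (Python 3.9+)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n' + body


if __name__ == "__main__":
    print(resources_xml("COMPSsWorker01", "compute_node_type1", 10, 8))
```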
The following code shows the structure of the project.xml file using
interactive submission mode (a working example of the project.xml file using
batch submission mode for MN5 is shown in the
Execution example).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Project>
<MasterNode/>
<ComputingCluster Name="COMPSsWorker01">
<LimitOfTasks>10</LimitOfTasks>
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Interactive/>
</SubmissionSystem>
<BrokerAdaptor>sshtrilead</BrokerAdaptor>
</Adaptor>
</Adaptors>
<InstallDir>/opt/COMPSs/</InstallDir>
<WorkingDir>/tmp/COMPSsWorker01/</WorkingDir>
<User>myUser</User>
<ClusterNode Name="compute_node1">
<NumberOfNodes>2</NumberOfNodes>
</ClusterNode>
</ComputingCluster>
</Project>
The Name given to the ComputingCluster must be the host name of the remote
cluster, and the User tag is the user for that host. For example, if we want
to access the remote machine as myUser@remoteMachine, the XML should be
written as follows:
<ComputingCluster Name="remoteMachine">
[... ExtraInformation ...]
<User>myUser</User>
</ComputingCluster>
Caution
If a user is not provided, the current user of the local node will be used for the remote nodes.
As shown before, the InstallDir tag is necessary and must be the absolute
path to the folder where COMPSs is installed in the remote cluster.
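The mapping from a user@host login to the project fields can be expressed as a small helper. This hypothetical sketch (project_cluster is not a COMPSs function) only builds the tags discussed above (Name, InstallDir, WorkingDir, User), omits the Adaptors and ClusterNode sections, rejects a relative InstallDir, and leaves out the User tag when no user is given so that the documented fallback to the local user applies.

```python
import posixpath
import xml.etree.ElementTree as ET


def project_cluster(login: str, install_dir: str, working_dir: str) -> ET.Element:
    """Build a partial project.xml <ComputingCluster> from "user@host" or "host"."""
    user, sep, host = login.rpartition("@")  # "host" alone -> ("", "", "host")
    if not posixpath.isabs(install_dir):
        raise ValueError(f"InstallDir must be an absolute path, got {install_dir!r}")
    cluster = ET.Element("ComputingCluster", Name=host)
    ET.SubElement(cluster, "InstallDir").text = install_dir
    ET.SubElement(cluster, "WorkingDir").text = working_dir
    if sep:
        ET.SubElement(cluster, "User").text = user
    # Without a <User> tag, COMPSs uses the current local user on the remote node.
    return cluster


if __name__ == "__main__":
    elem = project_cluster("myUser@remoteMachine", "/opt/COMPSs/", "/tmp/COMPSsWorker01/")
    print(ET.tostring(elem, encoding="unicode"))
```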
Submission Modes
The SubmissionSystem tag of the resources.xml and project.xml is used to define how to submit the tasks to the remote resources.
This adaptor supports two different forms for submitting the tasks generated by COMPSs:
Important
If both submission systems are defined, the application will run in interactive mode.
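The precedence rule can be expressed as follows. This is an illustrative sketch that mirrors the documented behaviour on a parsed XML fragment; it is not the code the GOS adaptor uses, and submission_mode is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def submission_mode(xml_text: str) -> str:
    """Return "interactive" or "batch" for an <Adaptor> or <SubmissionSystem> fragment."""
    root = ET.fromstring(xml_text)
    sub = root if root.tag == "SubmissionSystem" else root.find(".//SubmissionSystem")
    if sub is None:
        raise ValueError("no SubmissionSystem tag found")
    if sub.find("Interactive") is not None:
        return "interactive"  # documented rule: Interactive wins when both are defined
    if sub.find("Batch") is not None:
        return "batch"
    raise ValueError("SubmissionSystem defines neither Interactive nor Batch")


if __name__ == "__main__":
    both = ("<Adaptor><SubmissionSystem><Batch><Queue>slurm</Queue></Batch>"
            "<Interactive/></SubmissionSystem></Adaptor>")
    print(submission_mode(both))
```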
Interactive Mode
This mode directly launches the execution of tasks on the remote machines, and should be used if we have direct access to the computing hardware (NO queuing system in the remote machine).
Example of setting the interactive mode. This code MUST be in
resources.xml and can OPTIONALLY be in project.xml:
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Interactive/>
</SubmissionSystem>
</Adaptor>
</Adaptors>
Batch Mode
Computing clusters are usually shared by different users, and to enable a proper sharing of resources the computations are spawned using a job submission system (e.g. SLURM). The Batch Mode option handles that aspect and manages the execution of the application tasks as jobs in the cluster. Consequently, the user can provide the following information in the project and resources XML files.
- Port
The port used for SSH Communication.
Optional ; Default: 22
- MaxExecTime
Expected execution time of the application (in minutes).
Optional ; Default: 10
- Queue
Specifies which type of queue system the remote resource has. This queue system must have a corresponding cfg file in the
<installation_dir>/Runtime/scripts/queues/queue_systems folder. For more information, please read this section (Configuration Files).
Optional ; Default: computing cluster’s user default queue
- FileCFG
To further customize the execution, the supercomputer cfg files contain a set of variables that indicate the queue system used by a supercomputer, the paths where the shared disk is mounted, the default values that COMPSs sets in the project and resources files when they are not set by the user, and flags indicating whether a functionality is available in a supercomputer. The value must be either the name of a cfg file in the
<installation_dir>/Runtime/scripts/queues/supercomputers/ folder or an absolute path to a file. For more information, please read this section (Configuration Files).
Optional
Important
Inside this file, you can also specify which queue system is going to be used, instead of using the previous parameter (Queue).
Caution
The .cfg files for queues and supercomputers must be present on the remote machine.
- Reservation
Some queue systems can reserve resources for jobs executed by selected user accounts. A resource reservation identifies the resources in that reservation and the time period during which the reservation is available. This parameter sets the reservation to use when submitting the job.
Optional ; Default: disabled
- QOS
One can specify a Quality of Service (QOS) for each job submitted to the corresponding queue. The quality of service associated with a job might affect the job scheduling priority.
Optional ; Default: computing cluster’s user default qos
- ProjectName
It is possible to define the project name required by the queue system of the computing cluster.
Optional ; Default: computing cluster’s user default project name
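The properties above can be assembled programmatically. A minimal sketch assuming the element names and nesting of the MN5 snippet that follows (Queue directly under Batch, the rest under BatchProperties); it writes the documented defaults for Port and MaxExecTime explicitly and omits any other property that is not set, so the cluster defaults apply. batch_block is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DEFAULTS = {"Port": 22, "MaxExecTime": 10}  # documented defaults (MaxExecTime in minutes)
ORDER = ["Port", "MaxExecTime", "Reservation", "QOS", "FileCFG", "ProjectName"]


def batch_block(queue=None, **props) -> ET.Element:
    """Return a <Batch> element; unset optional properties are left out."""
    unknown = set(props) - set(ORDER)
    if unknown:
        raise ValueError(f"unsupported BatchProperties: {sorted(unknown)}")
    batch = ET.Element("Batch")
    if queue is not None:  # omitted -> computing cluster's user default
        ET.SubElement(batch, "Queue").text = queue
    bp = ET.SubElement(batch, "BatchProperties")
    values = {**DEFAULTS, **props}
    for key in ORDER:
        if values.get(key) is not None:
            ET.SubElement(bp, key).text = str(values[key])
    return batch


if __name__ == "__main__":
    elem = batch_block("slurm", MaxExecTime=30, QOS="debug", FileCFG="mn5.cfg")
    print(ET.tostring(elem, encoding="unicode"))
```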
The following code snippet shows an example for the batch submission system of MN5 cluster:
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
<BatchProperties>
<Port>200</Port>
<MaxExecTime>30</MaxExecTime>
<Reservation>myReservation</Reservation>
<QOS>debug</QOS>
<FileCFG>mn5.cfg</FileCFG>
<ProjectName>bsc</ProjectName>
</BatchProperties>
</Batch>
</SubmissionSystem>
<BrokerAdaptor>sshtrilead</BrokerAdaptor>
</Adaptor>
</Adaptors>
Important
If batch mode is selected, an environment script is probably necessary.
This script will be executed in every computing node that the execution
requests from the job submission queue.
In these nodes, user-defined variables can NOT be used.
Sourcing your own .bashrc might help with some of these problems.
However, you might have to redefine these variables in the script.
source /path/to/userDirectory/.bashrc
[... Rest of the environment script ]
Step 3: Run the application
For further details of the runcompss command check its dedicated Section
(Runcompss command).
$ runcompss --project=/local/path/application/project.xml \
--resources=/local/path/application/resources.xml \
--comm="es.bsc.compss.gos.master.GOSAdaptor" \
[options] \
application_name [application_arguments]
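When launching from a script, the same invocation can be assembled as an argument list. A minimal sketch; runcompss_command is a hypothetical helper, and any extra runcompss options are passed through unchanged in the options argument, placed before the application name as in the command above.

```python
import shlex

GOS_ADAPTOR = "es.bsc.compss.gos.master.GOSAdaptor"


def runcompss_command(project, resources, app, app_args=(), options=()):
    """Return the runcompss argument list for the GOS adaptor."""
    return [
        "runcompss",
        f"--project={project}",
        f"--resources={resources}",
        f"--comm={GOS_ADAPTOR}",
        *options,
        app,
        *app_args,
    ]


if __name__ == "__main__":
    cmd = runcompss_command(
        "/local/path/application/project.xml",
        "/local/path/application/resources.xml",
        "application_name",
    )
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```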
Execution results
The execution result follows the same pattern as other execution environments (see further details in its section, Results).
When debug is enabled, the out and err logs of each task are stored in the
corresponding log directory within the local node when the task ends.
Caution
If an error prevents retrieving the execution logs (for example,
a loss of connection with the remote resources),
the logs will be located in <WorkingDir>/BatchOutput/task_ID in the
remote machine.
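In that case, the logs can be copied back manually over the same SSH access. A sketch assuming passwordless scp to the remote machine; the task directory name is passed as given rather than constructed, because its exact naming is not specified here, and fetch_task_logs_command is a hypothetical helper.

```python
import posixpath


def fetch_task_logs_command(user, host, working_dir, task_dir, local_dest="."):
    """Build `scp -r user@host:<WorkingDir>/BatchOutput/<task_dir> <local_dest>`."""
    remote = posixpath.join(working_dir, "BatchOutput", task_dir)
    return ["scp", "-r", f"{user}@{host}:{remote}", local_dest]


if __name__ == "__main__":
    print(" ".join(fetch_task_logs_command("myUser", "remoteMachine",
                                           "/tmp/COMPSsWorker01/", "task_ID")))
```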
Execution example
Application
In this section, we show how to execute the KMeans Python COMPSs application in batch mode using MareNostrum 5 supercomputer.
In this scenario, our local machine holds the KMeans application in
/home/user/kmeans, and the kmeans directory only contains the
file kmeans.py. In the remote machine (glogin1.bsc.es),
we have the user bsc12345, so we can access this machine with
ssh bsc12345@glogin1.bsc.es.
In the first step, we have to make sure that COMPSs and all the application
files are available in MN5 (glogin1.bsc.es). For this example, we assume
that the application will be deployed in the user home directory
(/home/bsc/bsc12345/kmeans) and that COMPSs is installed in
/apps/GPP/COMPSs/3.4. The following commands are used to deploy the
application and check the COMPSs installation:
# In the local machine, copy the application data into MN5
$ scp -r /home/user/kmeans bsc12345@glogin1.bsc.es:/home/bsc/bsc12345/.
$ ssh bsc12345@glogin1.bsc.es
# Inside the remote machine, check where COMPSs is installed
$ module load COMPSs/3.4
$ echo $(builtin cd $(dirname $(which runcompss))/../../..; pwd)
/apps/GPP/COMPSs/3.4
$ exit
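The shell one-liner above resolves the installation directory as three directory levels above the folder that contains runcompss. A Python equivalent, assuming that layout (e.g. <install_dir>/Runtime/scripts/user/runcompss, which is consistent with the output shown); compss_install_dir is a hypothetical name, and like the shell version it works on the path as given rather than resolving symbolic links.

```python
import posixpath
import shutil


def compss_install_dir(runcompss_path: str) -> str:
    """Same as `cd $(dirname <runcompss>)/../../..; pwd` for a POSIX path."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(runcompss_path), "..", "..", ".."))


if __name__ == "__main__":
    found = shutil.which("runcompss")  # None if COMPSs is not on PATH (e.g. module not loaded)
    print(compss_install_dir(found) if found else "runcompss not found on PATH")
```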
In the second step, we create the required XML files, which will be
stored in /home/user/kmeans. The following lines show the two XML files for this example, project.xml followed by resources.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Project>
<MasterNode/>
<ComputingCluster Name="glogin1.bsc.es">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
<BatchProperties>
<Port>22</Port>
<MaxExecTime>2</MaxExecTime>
<Reservation>disabled</Reservation>
<QOS>gp_debug</QOS>
<FileCFG>mn5.cfg</FileCFG>
<ProjectName>bsc19</ProjectName>
</BatchProperties>
</Batch>
</SubmissionSystem>
</Adaptor>
</Adaptors>
<InstallDir>/apps/GPP/COMPSs/3.4/</InstallDir>
<WorkingDir>/home/bsc/bsc12345/kmeans/tmp/</WorkingDir>
<User>bsc12345</User>
<LimitOfTasks>1000</LimitOfTasks>
<Application>
<Classpath>/home/bsc/bsc12345/kmeans</Classpath>
<EnvironmentScript>/home/bsc/bsc12345/kmeans/env_mn.sh</EnvironmentScript>
</Application>
<ClusterNode Name="compute_node_type">
<NumberOfNodes>2</NumberOfNodes>
</ClusterNode>
</ComputingCluster>
</Project>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ResourcesList>
<ComputingCluster Name="glogin1.bsc.es">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
</Batch>
</SubmissionSystem>
</Adaptor>
</Adaptors>
<ClusterNode Name="compute_node_type">
<MaxNumNodes>4</MaxNumNodes>
<Processor Name="P1">
<ComputingUnits>8</ComputingUnits>
<Type>CPU</Type>
</Processor>
</ClusterNode>
</ComputingCluster>
</ResourcesList>
And the environment script for MN5 (/home/bsc/bsc12345/kmeans/env_mn.sh):
export COMPSS_PYTHON_VERSION=3.12.1
module load COMPSs/3.4
Finally, we launch the application in the third step. It must be done using the following command within the local machine:
$ runcompss --project=/home/user/kmeans/project.xml \
--resources=/home/user/kmeans/resources.xml \
--comm=es.bsc.compss.gos.master.GOSAdaptor \
kmeans.py -n 10240000 -f 8 -d 3 -c 8 -i 10
Tip
The same command can be used to run Java or C applications using the GOS
adaptor (but take into account that the --classpath flag will
be needed for Java and --library_path will be needed for C).
Jupyter notebook
In this section, we show how to execute a Jupyter notebook in batch mode.
The first step is to make sure that COMPSs is available in the remote
machine (e.g. glogin1.bsc.es). For this example, we assume that COMPSs is
installed in /apps/GPP/COMPSs/3.4.
Important
When using a Jupyter notebook, it is not necessary to transfer the application to the remote machine, since COMPSs deals with the code automatically.
In the second step, we create the required project and resources XML files,
which will be stored in /home/user/notebook. They are the same as the
project.xml and resources.xml files defined above.
Finally, in the third step we can define in our local machine the notebook
/home/user/notebook/simple.ipynb. Note that the ipycompss.start call
includes the project and resources parameters, as well as the GOS
communication adaptor.
import pycompss.interactive as ipycompss
ipycompss.start(comm="GOS",
project_xml="/home/user/notebook/project.xml",
resources_xml="/home/user/notebook/resources.xml")
# Now define your tasks and code within the following cells
Hybrid execution example
Sample Application
In this section, we show how to execute a very simple Python COMPSs application in batch mode using two clusters. In particular, this example uses the two MareNostrum 5 supercomputer partitions (one with powerful CPUs, GPP, and another with GPUs, ACC) from the local machine.
In this scenario, our local machine holds the application in
/home/user/simple, and the simple directory only contains the
file simple.py. In the remote machines (glogin1.bsc.es for GPP and
alogin1.bsc.es for ACC), we have the user bsc12345, so we can access
these machines with ssh bsc12345@glogin1.bsc.es and
ssh bsc12345@alogin1.bsc.es.
The application that we are going to use is:
from pycompss.api.task import task
from pycompss.api.constraint import constraint
from pycompss.api.api import compss_wait_on
@constraint(processors=[{'ProcessorType':'CPU', 'ComputingUnits':'100'}])
@task(returns=1)
def increment(value):
    # Code that uses 100 CPU cores
    return value + 1
@constraint(processors=[{'ProcessorType':'CPU', 'ComputingUnits':'20'},
                        {'ProcessorType':'GPU', 'ComputingUnits':'1'}])
@task(returns=1)
def multiply(value):
    # Code that uses 20 CPU cores and 1 GPU
    return value * value
def main():
    value = 2
    results = []
    for i in range(2):
        partial = increment(value)
        complete = multiply(partial)
        results.append(complete)
    results = compss_wait_on(results)
    print(results)
if __name__ == "__main__":
    main()
This application has two tasks defined (increment and multiply) with
different requirements. Since one of the MN5 partitions has GPUs, this example
illustrates how COMPSs is able to deal with two different clusters, executing
the tasks while respecting their constraints. The increment task represents a
function with high internal parallelism, requiring 100 CPU cores, and the
multiply task represents a function with less internal parallelism that
requires one GPU. Consequently, the increment tasks can only be executed
in the GPP partition (the ACC partition nodes have only 80 CPU cores), while
the multiply tasks can only be executed in the ACC partition (the GPP
partition, although it has enough CPU cores, does not have GPUs).
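The placement reasoning can be checked with a few lines of plain Python. This is a simplified feasibility test, not the COMPSs scheduler: it only checks whether a single node of each type can satisfy a task's constraints, using the per-node figures from the resources.xml file used in this example (112 CPU cores in GPP, 80 CPU cores and 4 GPUs in ACC). NodeType, PARTITIONS, TASKS and eligible are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cpu_cores: int
    gpus: int


PARTITIONS = [NodeType("GPP", cpu_cores=112, gpus=0), NodeType("ACC", cpu_cores=80, gpus=4)]

# Per-task requirements taken from the @constraint decorators above.
TASKS = {"increment": {"CPU": 100, "GPU": 0}, "multiply": {"CPU": 20, "GPU": 1}}


def eligible(task: str, nodes=PARTITIONS) -> list:
    """Names of the node types whose single node satisfies the task's CPU and GPU needs."""
    need = TASKS[task]
    return [n.name for n in nodes if n.cpu_cores >= need["CPU"] and n.gpus >= need["GPU"]]


if __name__ == "__main__":
    for name in TASKS:
        print(name, "->", eligible(name))
    # increment -> ['GPP']
    # multiply -> ['ACC']
```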
The main function runs two iterations, invoking the increment and multiply
tasks once in each. Notice that there is a data dependency between the tasks:
each multiply task consumes the result of the preceding increment task.
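Because each multiply only depends on the increment of the same iteration, the expected output is easy to derive. Stripping the PyCOMPSs decorators gives a sequential sketch with the same control flow (plain functions, no constraints or remote execution), which returns the values the COMPSs version should print.

```python
def increment(value):
    return value + 1


def multiply(value):
    return value * value


def main():
    value = 2
    results = []
    for _ in range(2):
        partial = increment(value)    # 2 + 1 = 3 (value is not updated between iterations)
        complete = multiply(partial)  # depends on partial: 3 * 3 = 9
        results.append(complete)
    return results


if __name__ == "__main__":
    print(main())  # [9, 9]
```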
In the first step, we have to make sure that COMPSs and the application
are available in MN5. For this example, we assume that the application will be
deployed in the user home directory (/home/bsc/bsc12345/simple), which is
shared among partitions, and that COMPSs is installed in /apps/GPP/COMPSs/3.4
in GPP and in /apps/ACC/COMPSs/3.4 in ACC. The following commands are used
to deploy the application and check the COMPSs installation:
# In the local machine, copy the application data into MN5
$ scp -r /home/user/simple bsc12345@glogin1.bsc.es:/home/bsc/bsc12345/.
$ ssh bsc12345@glogin1.bsc.es
# Inside the remote machine within GPP, check where COMPSs is installed
$ module load COMPSs/3.4
$ echo $(builtin cd $(dirname $(which runcompss))/../../..; pwd)
/apps/GPP/COMPSs/3.4
$ exit
$ ssh bsc12345@alogin1.bsc.es
# Inside the remote machine within ACC, check where COMPSs is installed
$ module load COMPSs/3.4
$ echo $(builtin cd $(dirname $(which runcompss))/../../..; pwd)
/apps/ACC/COMPSs/3.4
$ exit
In the second step, we create the required XML files, which will be
stored in /home/user/simple. The following lines show the two XML files for this example, project.xml followed by resources.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Project>
<MasterNode/>
<ComputingCluster Name="glogin1.bsc.es">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
<BatchProperties>
<Port>22</Port>
<MaxExecTime>2</MaxExecTime>
<Reservation>disabled</Reservation>
<QOS>gp_debug</QOS>
<FileCFG>mn5.cfg</FileCFG>
<ProjectName>bsc00</ProjectName>
</BatchProperties>
</Batch>
</SubmissionSystem>
</Adaptor>
</Adaptors>
<InstallDir>/apps/GPP/COMPSs/3.4/</InstallDir>
<WorkingDir>/home/bsc/bsc12345/simple/gpp/</WorkingDir>
<User>bsc12345</User>
<LimitOfTasks>1000</LimitOfTasks>
<Application>
<Classpath>/home/bsc/bsc12345/simple/</Classpath>
<EnvironmentScript>/home/bsc/bsc12345/simple/env_gpp.sh</EnvironmentScript>
</Application>
<ClusterNode Name="compute_node_type">
<NumberOfNodes>2</NumberOfNodes>
</ClusterNode>
</ComputingCluster>
<ComputingCluster Name="alogin1.bsc.es">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
<BatchProperties>
<Port>22</Port>
<MaxExecTime>2</MaxExecTime>
<Reservation>disabled</Reservation>
<QOS>acc_debug</QOS>
<FileCFG>mn5_acc.cfg</FileCFG>
<ProjectName>bsc00</ProjectName>
</BatchProperties>
</Batch>
</SubmissionSystem>
</Adaptor>
</Adaptors>
<InstallDir>/apps/ACC/COMPSs/3.4/</InstallDir>
<WorkingDir>/home/bsc/bsc12345/simple/acc/</WorkingDir>
<User>bsc12345</User>
<LimitOfTasks>1000</LimitOfTasks>
<Application>
<Classpath>/home/bsc/bsc12345/simple/</Classpath>
<EnvironmentScript>/home/bsc/bsc12345/simple/env_acc.sh</EnvironmentScript>
</Application>
<ClusterNode Name="compute_node_type">
<NumberOfNodes>2</NumberOfNodes>
</ClusterNode>
</ComputingCluster>
</Project>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ResourcesList>
<SharedDisk Name="Disk1">
<Storage>
<Size>100.0</Size>
</Storage>
</SharedDisk>
<ComputingCluster Name="glogin1.bsc.es">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
</Batch>
</SubmissionSystem>
</Adaptor>
</Adaptors>
<SharedDisks>
<AttachedDisk Name="Disk1">
<MountPoint>/</MountPoint>
</AttachedDisk>
</SharedDisks>
<ClusterNode Name="compute_node_type">
<MaxNumNodes>4</MaxNumNodes>
<Processor Name="CPU_MN5_GPP">
<Architecture>worker_gpp</Architecture>
<ComputingUnits>112</ComputingUnits>
<Type>CPU</Type>
</Processor>
</ClusterNode>
</ComputingCluster>
<ComputingCluster Name="alogin1.bsc.es">
<Adaptors>
<Adaptor Name="es.bsc.compss.gos.master.GOSAdaptor">
<SubmissionSystem>
<Batch>
<Queue>slurm</Queue>
</Batch>
</SubmissionSystem>
</Adaptor>
</Adaptors>
<SharedDisks>
<AttachedDisk Name="Disk1">
<MountPoint>/</MountPoint>
</AttachedDisk>
</SharedDisks>
<ClusterNode Name="compute_node_type">
<MaxNumNodes>4</MaxNumNodes>
<Processor Name="GPU_MN5_ACC">
<Architecture>worker_acc</Architecture>
<ComputingUnits>4</ComputingUnits>
<Type>GPU</Type>
</Processor>
<Processor Name="CPU_MN5_ACC">
<Architecture>worker_acc</Architecture>
<ComputingUnits>80</ComputingUnits>
<Type>CPU</Type>
</Processor>
</ClusterNode>
</ComputingCluster>
</ResourcesList>
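With two clusters, a ComputingCluster name that appears in project.xml but not in resources.xml is an easy mistake to make. A small consistency check, assuming only that both files use ComputingCluster elements with a Name attribute as above; cluster_names and missing_in_resources are hypothetical helpers, not part of COMPSs.

```python
import xml.etree.ElementTree as ET


def cluster_names(xml_text: str) -> set:
    """Collect the Name attribute of every ComputingCluster element."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # bytes accept the UTF-8 declaration
    return {cc.get("Name") for cc in root.iter("ComputingCluster")}


def missing_in_resources(project_xml: str, resources_xml: str) -> set:
    """Clusters referenced in project.xml that resources.xml does not describe."""
    return cluster_names(project_xml) - cluster_names(resources_xml)
```

For the two files of this example, reading /home/user/simple/project.xml and /home/user/simple/resources.xml and passing their contents to missing_in_resources should return an empty set, since both glogin1.bsc.es and alogin1.bsc.es appear in each file.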
And the environment scripts for MN5 are /home/bsc/bsc12345/simple/env_gpp.sh
and /home/bsc/bsc12345/simple/env_acc.sh. Both have the same content:
export COMPSS_PYTHON_VERSION=3.12.1
module load COMPSs/3.4
Finally, we launch the application in the third step. It must be done using the following command within the local machine:
$ runcompss --project=/home/user/simple/project.xml \
--resources=/home/user/simple/resources.xml \
--comm=es.bsc.compss.gos.master.GOSAdaptor \
simple.py
Tip
The same command can be used to run Java or C applications using the GOS
adaptor (but take into account that the --classpath flag will
be needed for Java and --library_path will be needed for C).
Notebook
In this section, we show how to execute a Jupyter notebook in batch mode using multiple computing clusters.
The first step is to make sure that COMPSs is available in the remote
machines (e.g. glogin1.bsc.es and alogin1.bsc.es). For this example,
we assume that COMPSs is installed in /apps/GPP/COMPSs/3.4 within
glogin1.bsc.es, and in /apps/ACC/COMPSs/3.4 within alogin1.bsc.es.
Important
When using a Jupyter notebook, it is not necessary to transfer the application to the remote machine, since COMPSs deals with the code automatically.
In the second step, we create the required project and resources XML files,
which will be stored in /home/user/notebook. They are the same as the
project and resources files of the Hybrid execution example, stored here as
hybrid_project.xml and hybrid_resources.xml.
Finally, in the third step we can define in our local machine the notebook
/home/user/notebook/simple.ipynb. Note that the ipycompss.start call
includes the project and resources parameters, as well as the GOS
communication adaptor.
import pycompss.interactive as ipycompss
ipycompss.start(comm="GOS",
project_xml="/home/user/notebook/hybrid_project.xml",
resources_xml="/home/user/notebook/hybrid_resources.xml")
# Now define your tasks and code within the following cells