Computational Resources

At ClusterDEM, we rely on high-performance computing to support our research in numerical modeling and simulation. Our mini HPC system, OCTOPUS500, is a 768-core cluster with 2 TB of RAM, running CentOS and built on ROCKS Cluster. It enables parallel computing for large-scale simulations, including computational fluid dynamics (CFD) and multiphysics applications.

Running Your Case on OCTOPUS500

To efficiently use our cluster, follow the guidelines below.

Available Parallel Environments

OCTOPUS500 provides parallel environments named mpiX, where X is the number of CPUs used per node (from 1 to 10). The number of requested slots must be a multiple of X.

✅ Example (Valid Submission):

-pe mpi5 25

This launches a run with 5 compute nodes, each using 5 CPUs.

❌ Invalid Cases:

-pe mpi5 22 # Incorrect, not a multiple of 5
-pe mpi2 5 # Incorrect, not a multiple of 2
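
The multiple-of-X rule can be checked before a job is submitted. The following is a minimal sketch under the rules stated above; the function name check_pe_request is hypothetical and is not part of SGE or of the cluster's tooling.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): check a "-pe mpiX N" request
# against the rules described above before calling qsub.
check_pe_request() {
    local x="$1"      # CPUs per node, the X in mpiX
    local slots="$2"  # total number of requested slots

    # X must be between 1 and 10 on OCTOPUS500.
    if [ "$x" -lt 1 ] || [ "$x" -gt 10 ]; then
        echo "invalid: mpi$x is not defined (X must be 1 to 10)"
        return 1
    fi
    # The slot count must be a positive multiple of X.
    if [ "$slots" -lt 1 ] || [ $(( slots % x )) -ne 0 ]; then
        echo "invalid: $slots is not a multiple of $x"
        return 1
    fi
    echo "valid: $(( slots / x )) node(s) x $x CPUs"
}
```

For example, check_pe_request 5 25 accepts the request, while check_pe_request 5 22 rejects it and returns a non-zero status that a wrapper script can test.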

Cluster Node Selection

To run a job on a specific compute node, add the following option to your submission:

-l h=compute-0-x

where x is the specific node number you want to use.
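
As an illustration, the node option can be combined with qsub in a small wrapper. This is only a sketch: the function name submit_on_node and the DRY_RUN variable are hypothetical, introduced here so the command can be inspected before it is actually sent to the scheduler.

```shell
#!/bin/bash
# Hypothetical wrapper: submit a script to one specific compute node.
# With DRY_RUN=1 the qsub command is printed instead of executed.
submit_on_node() {
    local node="$1"    # node number, the x in compute-0-x
    local script="$2"  # job script to submit

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qsub -l h=compute-0-${node} ${script}"
    else
        qsub -l "h=compute-0-${node}" "$script"
    fi
}
```

Calling DRY_RUN=1 submit_on_node 3 RUN.sh prints the command that would be run for node compute-0-3; without DRY_RUN the job is submitted.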

💡 Rule: To optimize resource usage, you should always check the cluster status using Ganglia before submitting jobs:
🔗 Ganglia Monitoring System (Accessible within the UBI network)


Job Submission and Management

Submitting a Job

Use the following command to submit jobs:

qsub RUN.sh

After submitting a job with qsub, SGE will respond with something like:

Your job 624556 ("myjob.cmd") has been submitted

where 624556 is the job number assigned by SGE to your job.
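
The job number is needed later for qstat and qdel, so it can be convenient to capture it at submission time. The sketch below assumes the confirmation message has the form shown above; the function name job_id_from_message is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the job number from the qsub confirmation
# message so that it can be passed to qstat or qdel later.
job_id_from_message() {
    # The job number is the third whitespace-separated field of
    # "Your job <number> (...) has been submitted".
    echo "$1" | awk '$1 == "Your" && $2 == "job" { print $3 }'
}

# Typical use on the cluster (not executed here):
#   msg=$(qsub RUN.sh)
#   job_id=$(job_id_from_message "$msg")
#   qstat -j "$job_id"
```

If the message does not match the expected form, the function prints nothing, which a calling script can treat as a failed submission.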

Example OpenFOAM Submission Script

Below is a sample job submission script for OpenFOAM:

#!/bin/bash
#$ -S /bin/bash
#$ -pe mpi5 10
#$ -cwd
#$ -N MySimulation
#$ -o log
#$ -e error.err
#$ -p 0
#$ -M youremail@ubi.pt
#$ -m bea

. $HOME/OpenFOAM/OpenFOAM-2.0.x/etc/bashrc # Ensure correct path
ARGS="--mca btl ^openib --mca btl_tcp_if_include eth0"


# Run the solver
SOLVER=clusterMHDFoam
mpirun -np $NSLOTS $ARGS $SOLVER -parallel

📌 Note: Submit this script from inside the case folder. The -cwd option makes the job run in the directory from which it was submitted.
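
A quick pre-flight check can confirm that the current directory looks like an OpenFOAM case before the script is submitted. The sketch below is an assumption-laden example, not part of the cluster's tooling: the function names is_case_dir and count_processor_dirs are hypothetical, and the check relies only on the standard case layout (system/controlDict, constant, and the processor* directories written by decomposePar).

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that a directory looks like an
# OpenFOAM case before submitting RUN.sh from it.
is_case_dir() {
    local dir="${1:-.}"
    # A case folder contains system/controlDict and a constant directory.
    [ -f "$dir/system/controlDict" ] && [ -d "$dir/constant" ]
}

# Count the processor* directories created by decomposePar. For a
# parallel run this should match the slot count requested with -pe.
count_processor_dirs() {
    local dir="${1:-.}"
    local n=0
    local d
    for d in "$dir"/processor*; do
        if [ -d "$d" ]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# Typical use (not executed here):
#   is_case_dir . && [ "$(count_processor_dirs .)" -eq 10 ] && qsub RUN.sh
```

With the sample script above, which requests 10 slots, the case would be expected to contain 10 processor directories.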

💡 Best Practices:


Commonly Used HPC Commands

Below are some frequently used Sun Grid Engine (SGE) commands:

Command   Description
qsub      Submit a job
qstat     Check job status
qhost     Display node information
qdel      Cancel a job
qhold     Hold a job in the queue
qrls      Release a held job

🔍 To see detailed descriptions, use:

man qsub

Examples

Submitting a Job

qsub -cwd -o output.log -e error.log -M youremail@ubi.pt -m bea myscript.sh

Checking Job Status

qstat

Displaying Node Information

qhost

Canceling a Job

qdel <job_number>

If a job is still running after using qdel, you can force cancellation:

qdel -f <job_number>


SGE Commands Reference Guide

1. Submitting a Job (qsub)
Used to submit jobs to the cluster. The basic syntax is:

qsub -cwd -o output.log -e error.log -M your_email@ubi.pt -m bea -l h=compute-0-x script.sh
  • -cwd → Run job in the current directory
  • -o path → Redirect standard output
  • -e path → Redirect standard error
  • -M email → Email notifications
  • -m bea → Send email at beginning, end, and abort
  • -l h=compute-0-x → Choose a specific node

Alternatively, you can embed these options inside your script:

#$ -cwd
#$ -o log
#$ -e error.err
#$ -M your_email@ubi.pt
#$ -m bea
#$ -l h=compute-0-x

Then submit the script with:

qsub script.sh

2. Monitoring Jobs (qstat)
Check the status of running and pending jobs:

qstat

To see jobs submitted by a specific user:

qstat -u username
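
When many jobs are queued, a short summary of job states can be easier to read than the full listing. The sketch below assumes the default qstat layout, with two header lines and the job state (for example r for running, qw for queued and waiting) in the fifth column; the function name count_job_states is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize qstat output read from standard input
# as one "state count" line per job state.
count_job_states() {
    # Skip the two header lines, then tally the state column (field 5).
    awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on the cluster (not executed here):
#   qstat -u username | count_job_states
```

If the local qstat output uses a different column order, the field number in the awk expression needs to be adjusted accordingly.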

3. Canceling a Job (qdel)
To cancel a job, use the job number assigned by qsub:

qdel <job_id>

If the job does not stop, force termination:

qdel -f <job_id>

4. Checking Cluster Nodes (qhost)
To check available nodes and their status:

qhost

To view jobs running on each node:

qhost -j

To see available queues:

qhost -q