Computational Resources
At ClusterDEM, we leverage advanced computational power to support our research in numerical modeling and simulation. Our mini HPC system, OCTOPUS500, is a 768-core cluster with 2 TB of RAM, running CentOS and built upon ROCKS Cluster. This system enables parallel computing for large-scale simulations, including computational fluid dynamics (CFD) and multiphysics applications.
Running Your Case on OCTOPUS500
To efficiently use our cluster, follow the guidelines below.
Available Parallel Environments
OCTOPUS500 uses a parallel environment defined as mpiX, where X represents the number of CPUs per node (from 1 to 10). Ensure that the number of requested slots is a multiple of X.
✅ Example (Valid Submission):
This launches a run with 5 compute nodes, each using 5 CPUs.
❌ Invalid Cases:
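The slot rule can be checked before a job is submitted. The sketch below is a minimal shell helper under stated assumptions: `pe_request` is a hypothetical name, not an SGE command, and it only builds the `-pe` option string that qsub expects.

```shell
# pe_request: hypothetical helper (not part of SGE).
# Usage: pe_request <cpus_per_node X> <total_slots>
# Prints the matching "-pe mpiX SLOTS" option, or fails with a message on stderr.
pe_request() {
    local x="$1"
    local slots="$2"

    # X must be between 1 and 10 on OCTOPUS500 (mpi1 ... mpi10)
    if [ "$x" -lt 1 ] || [ "$x" -gt 10 ]; then
        echo "invalid: mpi$x does not exist (X must be 1 to 10)" >&2
        return 1
    fi

    # The requested slot count must be a positive multiple of X
    if [ "$slots" -lt "$x" ] || [ $((slots % x)) -ne 0 ]; then
        echo "invalid: $slots slots is not a multiple of $x" >&2
        return 1
    fi

    echo "-pe mpi$x $slots"
}

# Valid: 5 nodes x 5 CPUs
pe_request 5 25

# Invalid requests are rejected
pe_request 5 12 2>/dev/null || echo "rejected: 12 is not a multiple of 5"
pe_request 12 24 2>/dev/null || echo "rejected: mpi12 is outside mpi1..mpi10"
```

With a helper like this, a submission could be written as `qsub $(pe_request 5 25) myjob.cmd`, where `myjob.cmd` stands for your own job script.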
💡 Rule: To optimize resource usage, you should always check the cluster status using Ganglia before submitting jobs:
🔗 Ganglia Monitoring System (Accessible within the UBI network)
Job Submission and Management
Submitting a Job
Submit jobs with `qsub`, naming the parallel environment and the total number of slots, for example `qsub -pe mpi5 25 myjob.cmd`.
After submitting a job with qsub, SGE will respond with something like:
Your job 624556 ("myjob.cmd") has been submitted
where 624556 is the job number assigned by SGE to your job.
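When a job is submitted from another script, the job number is often needed later, for example to cancel the job. A minimal sketch, assuming the response has exactly the form shown above; `extract_job_id` is a hypothetical helper name, and the response string is hard-coded here so the example runs without access to the cluster.

```shell
# extract_job_id: hypothetical helper (not part of SGE).
# The job number is the third whitespace-separated field of the response.
extract_job_id() {
    echo "$1" | awk '{print $3}'
}

# On the cluster this would be: response=$(qsub myjob.cmd)
response='Your job 624556 ("myjob.cmd") has been submitted'

job_id=$(extract_job_id "$response")
echo "$job_id"
```

Many SGE versions also accept `qsub -terse`, which prints only the job number; whether the version installed on OCTOPUS500 supports it is not confirmed here.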
Example OpenFOAM Submission Script
Below is a sample job submission script for OpenFOAM:
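A job script along the following lines could serve as a starting point. It is a sketch under assumptions, not the cluster's official script: the file name `run_openfoam.sh`, the job name `myCase`, the solver `simpleFoam`, and the `mpi5` request with 25 slots are placeholders, and it assumes the OpenFOAM environment is already loaded in the job's shell. The block writes the script to a file and checks its syntax without submitting anything.

```shell
# Write a sketch of an SGE job script into the current (case) folder.
cat > run_openfoam.sh <<'EOF'
#!/bin/bash
# Interpret the job with bash
#$ -S /bin/bash
# Run in the directory the job was submitted from (the case folder)
#$ -cwd
# Job name (placeholder)
#$ -N myCase
# Merge standard error into standard output
#$ -j y
# 5 nodes x 5 CPUs = 25 slots
#$ -pe mpi5 25

# Assumption: OpenFOAM is already set up in this environment.
# If it is not, source its bashrc here; the path depends on the installation.

# Split the case; numberOfSubdomains in decomposeParDict must match the slot count
decomposePar

# NSLOTS is set by SGE to the number of slots granted to the job
mpirun -np "$NSLOTS" simpleFoam -parallel

# Merge the processor directories back into a single case
reconstructPar
EOF

# Syntax check only; nothing is submitted or executed here.
bash -n run_openfoam.sh && echo "run_openfoam.sh: syntax ok"
```

Once the placeholders are adapted, the script would be submitted from the case folder with `qsub run_openfoam.sh`.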
📌 Note: This script should be run inside the case folder.
💡 Best Practices:
- Monitor cluster load via Ganglia (http://192.168.92.140/ganglia).
Commonly Used HPC Commands
Below are some frequently used Sun Grid Engine (SGE) commands:
Command | Description |
---|---|
qsub | Submit a job |
qstat | Check job status |
qhost | Display node information |
qdel | Cancel a job |
qhold | Hold a job in queue |
qrls | Release a held job |
🔍 To see detailed descriptions, use the manual page of each command, for example `man qsub`.
SGE Commands Reference Guide
1. Submitting a Job (qsub)
Used to submit jobs to the cluster. The basic syntax is `qsub [options] script`. Common options:
- `-cwd` → Run job in the current directory
- `-o path` → Redirect standard output
- `-e path` → Redirect standard error
- `-M email` → Email notifications
- `-m bea` → Send email at beginning, end, and abort
- `-l h=compute-0-x` → Choose a specific node
Alternatively, you can embed these options inside your script:
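As a minimal sketch of that approach, the script below carries the options listed above on lines beginning with `#$`, which qsub reads as options and the shell ignores as comments. The output file names and the e-mail address are placeholders, and the final command only stands in for the real workload.

```shell
#!/bin/bash
# Run the job in the directory it was submitted from
#$ -cwd
# Redirect standard output and standard error (placeholder file names)
#$ -o job.out
#$ -e job.err
# Send e-mail at the beginning, end, and abort of the job (placeholder address)
#$ -M user@example.com
#$ -m bea

# Stand-in for the actual workload
echo "Job started on $(uname -n) in $PWD"
```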
Then submit the script with `qsub` followed by the script name, for example `qsub myjob.cmd`.
2. Monitoring Jobs (qstat)
Check the status of running and pending jobs with `qstat`.
To see jobs submitted by a specific user, use `qstat -u username`.
3. Canceling a Job (qdel)
To cancel a job, pass the job number assigned by qsub to qdel, for example `qdel 624556`.
If the job does not stop, force termination with `qdel -f 624556`.
4. Checking Cluster Nodes (qhost)
To check available nodes and their status, run `qhost`.
To view jobs running on each node, run `qhost -j`.
To see available queues on each node, run `qhost -q`.