Troubleshooting FAQs

I try to run a program and I get “Illegal instruction (core dumped)”

You are probably trying to run it on the login node, which is not designed for running scientific software. Please request an interactive session instead.

My job has been queuing for a long time, why?

A job can fail to start for several reasons, and diagnosing the cause is sometimes difficult. Common causes are:

  • You requested too many cores for the particular job_type: use qfreeslots to check the current availability of each job_type;
  • You requested the wrong number of cores for an MPI calculation: MPI jobs should be requested in multiples of 16 cores;
  • You requested too much memory with -l h_vmem=<memory per core in GB>: do not request more than 2 GB/core for MPI jobs, 8 GB/core for SMP jobs, and 128 GB for a serial job;
  • You requested an execution time that is too long with -l h_rt=<hours:minutes:seconds>: the maximum allowed execution time is 72 hours.
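
The limits listed above can be checked before a job is submitted. The following is a minimal shell sketch, not a site-provided tool: the function name check_request and its argument order are hypothetical, and the limits are simply the ones quoted in this FAQ.

```shell
# Hypothetical pre-submission check (not part of the cluster software).
# Usage: check_request <mpi|smp|serial> <cores> <mem_gb> <hours>
# mem_gb is the memory request per core in GB; hours is the run time in whole hours.
check_request() {
    kind=$1 cores=$2 mem_gb=$3 hours=$4

    # Memory ceilings quoted in this FAQ.
    case $kind in
        mpi)    max_mem=2 ;;
        smp)    max_mem=8 ;;
        serial) max_mem=128 ;;
        *)      echo "unknown job kind: $kind"; return 2 ;;
    esac

    # MPI jobs should be requested in multiples of 16 cores.
    if [ "$kind" = mpi ] && [ $((cores % 16)) -ne 0 ]; then
        echo "mpi jobs need a multiple of 16 cores (got $cores)"
        return 1
    fi
    if [ "$mem_gb" -gt "$max_mem" ]; then
        echo "memory request ${mem_gb}GB exceeds ${max_mem}GB for $kind jobs"
        return 1
    fi
    # Maximum allowed execution time is 72 hours.
    if [ "$hours" -gt 72 ]; then
        echo "run time ${hours}h exceeds the 72 hour maximum"
        return 1
    fi
    echo "ok"
}
```

For example, check_request mpi 32 2 24 prints ok, while check_request mpi 20 2 24 reports the core count problem. A request that passes these checks can still queue if the cluster is busy, so qfreeslots remains the first thing to look at.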

I have trouble starting interactive sessions with qrsh on Rosalind. When the login fails, the following message appears:

    [ ~]$ qrsh -verbose -pe smp* 16 -l h_rt=02:30:00
    local configuration login1.prv.ada.cluster not defined - using global configuration
    Your job 18309 (“QRLOGIN”) has been submitted
    waiting for interactive job to be scheduled …
    timeout (5 s) expired while waiting on socket fd 7

This means that the queue is full and no nodes are available for interactive sessions. Please submit a job script instead, or wait until the cluster is less busy.

I get the following error, but otherwise the calculations ran fine:

    [smp02.prv.ada.cluster:70123] mca: base: component_find: unable to open /opt/apps/openmpi-1.8.2-intel15-threads/lib/openmpi/mca_btl_openib: cannot open shared object file: No such file or directory (ignored)
    [smp02.prv.ada.cluster:70123] mca: base: component_find: unable to open /opt/apps/openmpi-1.8.2-intel15-threads/lib/openmpi/mca_mtl_psm: cannot open shared object file: No such file or directory (ignored)

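This is a harmless warning that you can ignore. As the “(ignored)” at the end of each line indicates, Open MPI skips the components it cannot open and carries on.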

My jobs always seem to finish after a certain length of time. I sent them to a queue with a 7-day time limit, but they ended after three days.

By default, jobs on Rosalind have a time limit of 72 hours. If you require more time, you must set the limit with h_rt=<hours:minutes:seconds>, as in the example below. Jobs with shorter run time limits are given a higher dispatch priority than longer jobs, and an explicit limit allows the scheduler to backfill. During busy times it therefore pays to set the limit even if your job needs less time than the default.

#$ -l h_rt=02:00:00
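
To show where this directive sits, here is a minimal job script sketch. It is an illustration under stated assumptions rather than a site template: the -cwd option, the h_vmem value, and the echo payload are assumptions added for the example, and only the h_rt line comes from this FAQ.

```shell
#!/bin/bash
# Minimal job script sketch. Lines starting with "#$" are read by the
# scheduler as submission options; the shell treats them as comments.

# Assumption: run the job from the directory it was submitted from.
#$ -cwd

# Run time limit of 2 hours, in hours:minutes:seconds (from this FAQ).
#$ -l h_rt=02:00:00

# Assumption: memory per core, value chosen only for illustration.
#$ -l h_vmem=2G

# Placeholder payload; replace with the real program.
echo "Job running on $(hostname)"
```

Asking only for the run time the job actually needs gives the scheduler more room to fit it into gaps between larger jobs.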

support/faqs.txt · Last modified: 2016/11/03 21:11 by admin