Getting started on CBSUBrito
Disclaimer: This blog post will only be relevant to members of the Brito Lab.
Software
To see which software is currently available on the server, go to CBSU’s website. If the program you need is not available, email cbsu@cornell to submit a request.
Using a Queue
All CBSUBrito machines are equipped with a Sun Grid Engine (SGE) queue manager. It is important to use the queueing system since we are all trying to access shared resources. When you submit a submission script (“a qsub”), you must specify which queue you are submitting to.
There are long queues for jobs that will take more than 4 hours to complete.
long.q@cbsubrito
long.q@cbsubrito2
long.q@cbsubrito3
There are short queues for shorter jobs (<4 hours). Jobs in these queues are terminated after 4 hours. The short queues are always available: a certain number of cores are reserved for short.q at all times, and short.q jobs supersede long.q jobs when they are submitted.
short.q@cbsubrito
short.q@cbsubrito2
short.q@cbsubrito3
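As a rough illustration of the 4-hour rule, here is a small sketch that maps an estimated runtime to a queue name. `pick_queue` is a hypothetical helper, not part of SGE or of the lab's scripts. It assumes whole-number hours and sends a job estimated at exactly 4 hours to long.q, since short.q jobs are terminated at the 4-hour mark. It only prints the qsub command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: choose a queue name from an estimated runtime.
# Assumption: runtime is given in whole hours; 4 hours or more goes to long.q.
pick_queue() {
    local hours=$1   # estimated runtime in whole hours
    local host=$2    # cbsubrito, cbsubrito2, or cbsubrito3
    if [ "$hours" -lt 4 ]; then
        echo "short.q@${host}"
    else
        echo "long.q@${host}"
    fi
}

# Print the qsub command instead of running it, so this is safe to try anywhere.
echo "qsub -q $(pick_queue 2 cbsubrito2) run_myjob.qsub"
```

The same queue can instead be set inside the submission script with a `#$ -q` line, as in the example script further down.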
Basic commands for SGE queuing system
To use the SGE queue manager, you need to make a shell script. I always name my submission scripts “run_
$ qsub run_myjob.qsub
You can check the status of that job with:
$ qstat # Lists jobs in the queue; use 'qstat -j JOB_ID' for details on a single job
You can add options to the ‘qstat’ command such as ‘qstat -u fnn3’ if you want to see only jobs owned by user fnn3. You can delete a job using:
qdel JOB_NAME
qdel JOB_ID # Find the job ID number by checking 'qstat'
qdel -u USERNAME # Delete all of your own jobs (an unquoted * would be expanded by the shell into file names)
For more information: This is a good guide for commonly used QSUB options
A quickstart guide for using SGE
Example submission scripts
You can download a template qsub here.
There are example scripts for running QC, alignments, and more on all three machines: /workdir/scripts.
#$ -S /bin/bash #Set the environment to bash
#$ -N job123 # Name the job
#$ -o /workdir/users/username/job123.out # Set the standard out
#$ -e /workdir/users/username/job123.err # Set the standard error log
#$ -l h_vmem=5G # Request 5GB of memory for this task
#$ -t 1-4 # Tells the scheduler which array tasks to run, e.g. 1-4, only 4, 50-100, etc.
#$ -q short.q@cbsubrito # Specifies which queue to use on which machine
WRK=/workdir/users/username/ # Set path to the working directory
REF=/workdir/users/username/data/references/some_genome # Set path to the reference database
OUTDIR=/workdir/users/username/out # Set the output directory
LIST=$WRK/design_files/zoo_sample_names.txt #Set the list of file names
DESIGN=$(sed -n "${SGE_TASK_ID}p" "$LIST") #Pull line number SGE_TASK_ID from the list, i.e. the file for this task
NAME=$(basename "$DESIGN") #Pull out the name of the file you want to run
READ1=$WRK/data/${NAME}.derep_1.trim3.fastq #Set READ1 for task_id 1, 2, 3, ...
READ2=$WRK/data/${NAME}.derep_2.trim3.fastq #Set READ2 for task_id 1, 2, 3, ...
cd $OUTDIR # Changes directory to the output directory to run the program
export PATH=/programs/program_that_i_want_to_run:$PATH # Add the program you want to run (e.g. BWA, BLAST) to the PATH without dropping the existing entries
bwa mem -a $REF $READ1 $READ2 > ${NAME}.sam # Run the program according to its specifications
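To see how the array job in the script above picks a different sample for each task, here is a minimal dry run that can be executed outside the queue. The sample names, the temporary list file, and the `sample_for_task` helper are made up for illustration. Only the `sed -n "${SGE_TASK_ID}p"` lookup and the read file suffixes come from the script above. When a real array job runs, SGE sets `SGE_TASK_ID` for each task; here the loop sets it by hand.

```shell
#!/bin/bash
# Dry run of the array-job lookup, outside the queue.
# The sample names and the temp file are placeholders for illustration.
LIST=$(mktemp)
trap 'rm -f "$LIST"' EXIT   # remove the temp file when the shell exits
printf '%s\n' sampleA sampleB sampleC sampleD > "$LIST"

# Same lookup as in the submission script: task N reads line N of the list.
sample_for_task() {
    sed -n "${1}p" "$LIST"
}

# Mimic '#$ -t 1-4' by setting SGE_TASK_ID by hand.
for SGE_TASK_ID in 1 2 3 4; do
    NAME=$(basename "$(sample_for_task "$SGE_TASK_ID")")
    echo "task ${SGE_TASK_ID}: ${NAME}.derep_1.trim3.fastq ${NAME}.derep_2.trim3.fastq"
done
```

Running a check like this against your real list file before submitting is a cheap way to confirm that the range given to `-t` matches the number of lines in the list.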