Supported Executors

Jacamar supports several different job execution models. You can think of these much as you would the upstream GitLab Runner executors, as they operate in approximately the same way.

Selecting the correct target executor for your deployment is crucial to ensuring that all runner-generated scripts are run in the expected manner.

Executors

Declaring the target executor via Jacamar’s configuration is required.

System              executor = ???
------------------  -----------------
Cobalt (qsub)       cobalt or qsub
Flux (flux alloc)   flux
LSF (bsub)          lsf or bsub
PBS (qsub)          pbs
Shell (bash)        shell
Slurm (sbatch)      slurm or sbatch

[general]
  executor = "cobalt"

[batch]
  arguments_variable = ["SITE_PARAMETERS"]

Additional configuration options exist specifically to manage batch executors; see the [batch] table documentation.

Cobalt (qsub)

Jobs are submitted using qsub, with both the output and error logs being monitored, as sketched below.

[Figure: batch_cobalt.svg (Cobalt executor workflow)]
  1. The runner-generated build script is submitted to the scheduler using qsub. Both stdout/stderr are managed via the --output and --error arguments, respectively. Finally, all SCHEDULER_PARAMETERS are integrated into the request.

  2. Job state is monitored using qstat, polling on a set interval to identify whether the job is still running.

  3. Throughout the duration of the job, the runner obtains the stdout/stderr by tailing both files.

  4. Upon completion of the job (no longer found in the queue), the final exit status is queried using the generated <jobid>.cobaltlog to determine if the CI job should pass or fail.
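For illustration only, the flow approximates the following shell commands; the file names here are hypothetical, and the exact arguments Jacamar generates may differ:

qsub --output ci-job.out --error ci-job.err $SCHEDULER_PARAMETERS build-script.sh
qstat <jobid>                  # polled on a set interval while the job is queued or running
tail -f ci-job.out ci-job.err  # streamed back to the CI job log
cat <jobid>.cobaltlog          # inspected for the final exit status once the job leaves the queue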

Flux (flux)

The Flux integration leverages flux alloc to submit an interactive job, as sketched below.

[Figure: batch_flux.svg (Flux executor workflow)]
  1. The runner-generated script is submitted to the scheduler for execution using flux alloc. All user-defined SCHEDULER_PARAMETERS are integrated into the allocation request.

  2. The interactive session’s stdout/stderr is monitored by the runner and streamed back to the server.

  3. Because the session is interactive, the exit status of the flux alloc command is used to determine if a job passed or failed.
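A minimal sketch of that submission, assuming a hypothetical runner-generated script named build-script.sh:

flux alloc $SCHEDULER_PARAMETERS ./build-script.sh
# the exit status of flux alloc determines whether the CI job passed or failed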

LSF (bsub)

LSF leverages bsub to submit an interactive job, as sketched below.

[Figure: batch_lsf.svg (LSF executor workflow)]
  1. The runner-generated script is submitted to the scheduler for execution using bsub -I. All user-defined SCHEDULER_PARAMETERS are integrated into the request.

  2. The interactive session’s stdout/stderr is monitored by the runner and reported back to the server.

  3. Because the session is interactive, the exit status of the bsub command is used to determine if a job passed or failed.
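A minimal sketch, again assuming a hypothetical build-script.sh:

bsub -I $SCHEDULER_PARAMETERS ./build-script.sh
# the exit status of bsub -I determines whether the CI job passed or failed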

PBS (qsub)

Note

This executor is still under development; we are actively taking feedback to facilitate improvements.

With PBS, job submission is supported via qsub, as sketched below.

[Figure: batch_pbs.svg (PBS executor workflow)]
  1. The runner-generated script is submitted to the scheduler using qsub. The runner controls the scheduler's -o (output), -j eo, -Wblock=true, and -N (job name) arguments, while all user-defined SCHEDULER_PARAMETERS are also integrated.

  2. Throughout the duration of the job, the runner obtains the stdout/stderr by tailing the file (pbs-ci-<jobID>.out). All output to this file is reported back to the CI job log.

  3. Once a job has completed, the final state is determined by the exit status of qsub -Wblock=true ...
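An illustrative approximation, with a hypothetical job name (the <jobID> placeholder is filled in by the runner):

qsub -N ci-job -o pbs-ci-<jobID>.out -j eo -Wblock=true $SCHEDULER_PARAMETERS build-script.sh
# while the blocking qsub runs, the runner concurrently tails the log:
tail -f pbs-ci-<jobID>.out
# qsub's exit status, returned once the job completes, decides pass or fail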

Shell (bash)

Jacamar’s shell executor in many aspects simply mirrors the GitLab version, with the key exception that great strides have been taken to dramatically improve the security of running jobs, even in multi-tenant environments.

All job scripts are ultimately executed in a shell spawned locally by the running Jacamar instance:

cat generated-script | env -i /bin/bash --login

Though this may add complexity for users with complicated Bash profiles, it ensures that they will always get an understandable and, most importantly, functional shell environment.
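For example, variables exported in the surrounding environment do not survive into the job's shell, since env -i strips everything before the login profiles are sourced (MY_VAR here is purely illustrative):

export MY_VAR=example
echo 'echo "MY_VAR=${MY_VAR:-unset}"' | env -i /bin/bash --login
# prints "MY_VAR=unset"; only what a clean login shell (and the
# runner itself) establishes is available to the job script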

Slurm (sbatch)

The Slurm integration revolves around submitting the job script using sbatch, then tailing the subsequently generated --output log file, as sketched below.

[Figure: batch_slurm.svg (Slurm executor workflow)]
  1. The runner-generated script is submitted to the scheduler using sbatch. The runner controls the scheduler's --output, --wait, and --job-name arguments, while all user-defined SCHEDULER_PARAMETERS are also integrated into the request.

  2. Throughout the duration of the job, the runner obtains the stdout/stderr by tailing the file (slurm-%j.out). All output to this file is reported back to the CI job log.

  3. Once a job has completed, the final state is determined by the exit status of sbatch --wait ...

It is important to note that the entire build script is submitted via sbatch. As such, it will run entirely on the target compute resources.
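Putting it together, a rough illustration (the job name is an example; %j expands to the scheduler-assigned job ID):

sbatch --wait --output="slurm-%j.out" --job-name=ci-job $SCHEDULER_PARAMETERS build-script.sh
# while the blocking sbatch runs, the runner concurrently tails the generated log:
tail -f slurm-<jobid>.out
# the exit status of sbatch --wait decides whether the CI job passed or failed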

CI Job Build Stages

When closely examining a GitLab CI job, you may notice a number of distinct shells being generated and scripts launched over the course of said job. This behavior falls in line with the upstream GitLab Runner design of breaking down a single CI job into several stages (e.g. obtaining Git sources, executing the build script, etc.), each accomplishing a specific target within the job. In a more traditional shell executor, every stage is launched in a similar shell spawned on the host environment. In the case of executors that seek to interface with an underlying scheduler:

  1. To begin the job, necessary preparations are made, sources are obtained (git), and artifacts/caches are made available. Each of these stages within the CI job occurs on Jacamar's host environment.

  2. If all previous stages complete successfully, the step script (a combination of the before_script and script) is submitted to the scheduler.

  3. Finally, all remaining stages, including the after_script, again occur on the node where Jacamar is located.

Simply put, only the user's before_script and script are ever submitted as a job script to the underlying scheduler. This provides a number of benefits to the user, chiefly that compute cycles are never wasted on potentially minimal data management actions (e.g. relocating the runner cache). However, you will note that the user-defined after_script section is also run on the host system. This is by design and allows users to execute actions that may otherwise be impossible in a traditional compute environment.
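As a rough conceptual sketch (the stage names below are illustrative labels, not actual runner commands), a Slurm-backed job flows along these lines:

# on Jacamar's host environment:
prepare_environment; get_sources; restore_cache; download_artifacts
# submitted to the scheduler (before_script + script combined into the step script):
sbatch --wait step_script.sh
# back on the host once the scheduler job completes:
after_script; archive_cache; upload_artifacts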