Configurations
All aspects of the Jacamar CI configuration are documented here, along with key elements of the GitLab Runner that relate either to the custom executor or to general concerns. In both cases the configurations are managed via the TOML format.
Jacamar CI Config
Due to the highly configurable nature of Jacamar CI, it requires its own configuration file. If you are new to this application, we recommend that you first review the admin tutorial.
Unless explicitly noted, there are no default string/integer values and unset booleans will be false.
[general]
| Key | Description |
|---|---|
| executor | A required setting that specifies which of the supported executors CI jobs will utilize. |
| data_dir | A required setting where all files/directories for a job are stored. Strict ownership ( |
| retain_logs | Keep all files generated by the executor and/or scheduling mechanisms (default: |
| custom_build_dir | Observe the directory specified by a user’s |
| name | Administrator defined string, (currently) only appears in Jacamar’s system logging capabilities to help distinguish from other instances. |
| kill_timeout | Maximum timeout (duration string default: |
| | Shell path to be used when constructing Bash shell for job script execution, when not set will resolve based upon |
| job_message | Custom message that will be conveyed at the start of every |
| gitlab_server | Trusted URL for GitLab server used in all web interactions, takes priority over any values identified in the job response. |
| tls-ca-file | File containing the required certificates for HTTPS actions. |
| unrestricted_cmd_line | Allow for unfettered usages of tokens via the command line by all runner generated job scripts. |
| static_builds_dir | Create a static folder ( |
| static_min_days | Minimum number of days any static directory can remain (default: 7 with 0 indicating no cleanup required). This will only be enforced during job cleanup and may lead to longer than average job durations. |
| group_permissions | Set base permissions on Jacamar generated data directories to allow read and execute access for groups (i.e., |
| jwt_env_variable | Environment variable to be checked for an |
| | Overrides behavior found in RHEL 8 that reverts the stack size to 8m when capabilities are used. This will cause the user’s environment to set the ulimit to match the available hard limit, normally configured through the runner’s systemd file. |
The [general] configuration is applicable to a range of features and can affect both the jacamar-auth and jacamar applications.
[general]
executor = "shell"
data_dir = "/ecp"
retain_logs = true
custom_build_dir = false
name = "My Jacamar Driver"
kill_timeout = "120s"
job_message = """
****************************************************************************
NOTICE TO USERS
This is an example message ....
****************************************************************************
"""
gitlab_server = "https://gitlab.example.com"
tls-ca-file = "/example/file.crt"
unrestricted_cmd_line = false
static_builds_dir = false
static_min_days = 7
group_permissions = false
jwt_env_variable = "SITE_ID_TOKEN"
data_dir
- Required
[general]
data_dir = "/ecp"
The data_dir is used as the base directory where all required build, cache, and script related contents are stored. Unlike a traditional runner builds/cache directory, the data_dir will seek to enforce user ownership over the files by establishing a top level 700 permission directory by default, although this can also be configured to allow for a 750 permission directory if group access is desired. Additionally, if set to data_dir = "$HOME", jobs will be stored in the user specific home directory.
It should be noted that upon changing the group_permissions
field in the
configuration, as well as the accompanying feature flag, the data_dir
will
be deleted and recreated as a completely new directory with the
appropriate permissions.
The easiest way to understand the data_dir setting is to examine its effect. In our example /ecp is the base directory, and immediately beneath it is the folder of the local user responsible for triggering the job:
$ namei -l $(pwd)
f: /ecp/user/builds/runnerShort/000/group/project
drwxr-xr-x root root /
drwx-----x root root ecp
drwx------ user user user
drwx------ user user builds
drwxr-xr-x user user runnerShort
drwxr-xr-x user user 000
drwxr-xr-x user user group
drwxr-xr-x user user project
Jacamar is responsible for generating the /user/{builds/cache/script} directories. It is important to note that this is the responsibility of jacamar, not jacamar-auth. The authorization process will identify all required directory paths, but it becomes the responsibility of the user owned process to realize their creation. Strict rules regarding ownership and permissions are enforced for this directory creation process.
Important
It is not required to allow the user to generate their own base directory (/ecp/user in our example structure); in fact, we understand it is desirable to have an administrative process create these automatically. Just ensure that the folder has proper ownership (user:user) and permissions (0700) or else the job will fail.
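An administrative process that pre-creates base directories may want to verify them before jobs run. The following is a minimal sketch in Python and is not part of Jacamar CI: the function name check_base_dir and its arguments are hypothetical, and it only checks the ownership and 0700 mode described above.

```python
import os
import stat


def check_base_dir(path, uid, gid, expected_mode=0o700):
    """Return a list of problems found with a user's base directory."""
    problems = []
    info = os.stat(path)
    if not stat.S_ISDIR(info.st_mode):
        problems.append(f"{path} is not a directory")
    # Ownership must match the target user (user:user in the example).
    if (info.st_uid, info.st_gid) != (uid, gid):
        problems.append(
            f"owner is {info.st_uid}:{info.st_gid}, expected {uid}:{gid}"
        )
    # Compare only the permission bits, ignoring the file type bits.
    mode = stat.S_IMODE(info.st_mode)
    if mode != expected_mode:
        problems.append(f"mode is {mode:04o}, expected {expected_mode:04o}")
    return problems
```

Calling check_base_dir("/ecp/user", uid, gid) with the user's numeric IDs returns an empty list when the directory is acceptable, and a list of human readable problems otherwise.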
Note
When choosing a data_dir for an HPC scheduler executor type, verify that the volume is mounted in the same way on the runner host system as it is on any of the available compute resources (source issue#155).
limit_build_dir
An issue with the traditional data_dir can be observed in the structure of the builds_dir, specifically the inclusion of the runner short token in the generated path. When managing a small pool of runners that share the same data_dir this doesn’t present a major issue. However, the multiplying effect these folders have can quickly become apparent as you scale up the number of runners across machines/clusters.
To help address this, limit_build_dir has been introduced to offer a solution that avoids runner specific folders while still meeting the requirement that each CI job executes in its own unique directory. It is best to highlight this in action:
[general]
data_dir = "/store/$USER/ci"
limit_build_dir = true
Once enabled our process will utilize the data_dir
and
observe all rules surrounding permissions but instead of constructing
a standardized path it will claim the next available
concurrent directory using fcntl.
The resulting build directory will be created using the project name coupled with the ID and within that structure concurrent folders are managed:
$ pwd
/store/username/ci/builds/project-name_uniqueID
$ ls -a
.000.lock .001.lock .002.lock 000 001 002
The Jacamar application will utilize and observe these lock files during initial configuration in order to claim a concurrent ID regardless of the runner.
$ cat .004.lock | jq
{
"job_id": "2424067",
"expiration": "1714066690",
"hostname": "example"
}
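The general idea of claiming a concurrent directory through file locking can be sketched in Python. This is an illustration only and not Jacamar's implementation: the function names, the ttl_seconds and max_ids parameters, and the decision to release the lock once the metadata is written are all assumptions, as is the reading of expiration as a Unix timestamp in seconds.

```python
import fcntl
import json
import os
import socket
import time


def lock_expired(lock_text, now=None):
    """Report whether lock-file metadata has passed its expiration."""
    data = json.loads(lock_text)
    current = time.time() if now is None else now
    return current >= int(data["expiration"])


def claim_concurrent_dir(project_dir, job_id, ttl_seconds, max_ids=1000):
    """Claim the first concurrent ID that is neither locked nor unexpired."""
    for number in range(max_ids):
        ident = f"{number:03d}"
        lock_path = os.path.join(project_dir, f".{ident}.lock")
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # Non-blocking exclusive fcntl lock; fails if another process holds it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            continue
        with os.fdopen(fd, "r+") as handle:
            existing = handle.read()
            if existing.strip() and not lock_expired(existing):
                continue  # leaving the block closes the file and drops the lock
            handle.seek(0)
            handle.truncate()
            json.dump(
                {
                    "job_id": str(job_id),
                    "expiration": str(int(time.time()) + ttl_seconds),
                    "hostname": socket.gethostname(),
                },
                handle,
            )
        os.makedirs(os.path.join(project_dir, ident), mode=0o700, exist_ok=True)
        return ident
    raise RuntimeError("no concurrent directory available")
```

Note that fcntl locks are held per process, so contention on the lock itself is only observable between separate processes; within one process the expiration metadata is what prevents a second claim of the same ID.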
To the average user this won’t present any changes to their workflows; however,
it does greatly alter the structure of the <data_dir>/builds
directory. We
strongly advise that if you utilize this feature you remove all existing
build directories and start from scratch.
| Key | Description |
|---|---|
| limit_build_dir | Enforces a limited structure on the builds_dir by creating a user driven process to automatically claim concurrent directories through file locking. |
| max_build_dir | Indicates how many concurrent build directories can be left on the system (default: 0, only limited by cumulative runner concurrency). |
| uncap_build_dir_cleanup | By default cleanup is limited to a single builds_dir in every job. This is to limit a CI job becoming “stuck” during clean_exec, during which we lack the ability to directly notify the user of any cleanup actions. |
| file_lock_debug | Create a log file that outlines all actions of the |
| user_enabled_limit | Only a user (via the |
[general]
limit_build_dir = true
max_build_dir = 0
uncap_build_dir_cleanup = false
file_lock_debug = false
user_enabled_limit = false
When deploying and testing this feature for the first time it may prove beneficial to enable the file_lock_debug option. This results in a folder (lock_debug/) appearing alongside the concurrent directories and lock files. Within it, each job will have a file with details regarding the file locking process.
$ cat 2424067.json
{"level":"info","msg":"unable to lock file 0: fcntl syscall error: resource temporarily unavailable","time":"2024-04-25T16:07:49Z"}
{"level":"info","msg":"lock file /ci/username/builds/project-name_uniqueID/.001.lock has not expired","time":"2024-04-25T16:07:50Z"}
{"level":"info","msg":"unable to lock file 2: fcntl syscall error: resource temporarily unavailable","time":"2024-04-25T16:07:50Z"}
{"level":"info","msg":"lock file /ci/username/builds/project-name_uniqueID/.003.lock has not expired","time":"2024-04-25T16:07:57Z"}
{"level":"info","msg":"file claimed with 1714066690 expiration on ci-test-2 host","time":"2024-04-25T16:08:10Z"}
{"level":"info","msg":"identified concurrent target: 004","time":"2024-04-25T16:08:15Z"}
Be aware this should only be used for testing/debugging purposes. It can also be coupled with the user_enabled_limit option, which restricts the use of the feature to projects that explicitly opt in. This provides a way to experiment at your scale without having to manage an additional set of deployments.
unrestricted_cmd_line
Important
We strongly advise only enabling this option when you know /proc has been mounted with hidepid, or else you will increase the risk of runner generated scripts exposing CI_JOB_TOKEN via the command line.
[general]
unrestricted_cmd_line = true
By default jacamar-auth takes steps to avoid cases where a job token could end up in runner defined scripts (e.g., when using Git or managing artifacts). This includes augmenting the runner generated scripts and leveraging GIT_ASKPASS. Coupled with the Git credentials script we also have to restrict the use of the credential store to avoid breakage from incorrectly storing the CI_JOB_TOKEN.
All this is only required if there exists a chance that a script the user cannot control could expose their job token in /proc.
By default we do not plan to modify this behavior; however, those who have decided to hide PID listings can enable this setting.
One final note: this does not protect user generated scripts/actions, which should always follow the best practices for your machine when working on multi-tenant resources.
Signal Management
The GitLab custom executor allows for configurable durations on timeouts; most importantly, the graceful_kill_timeout defaults to 10 minutes. This means that once a job is canceled jacamar-auth will have this time to gracefully terminate whatever processes it is currently running. However, due to the range of potential configurations relating to downscoping, jacamar-auth enforces its own separate timeout on the jacamar sub-process:
[general]
kill_timeout = "120s"
The kill_timeout will start once jacamar-auth has intercepted a terminating signal the runner has generated and has in turn passed it on to jacamar. Only once this timeout has elapsed will SIGKILL be sent.
Important
Never set your runner’s graceful_kill_timeout configuration below that of Jacamar CI’s kill_timeout. In cases where the runner user downscopes permissions, the jacamar-auth application takes special steps to ensure that the appropriate signal reaches the sub-process, as permissions will likely prohibit a simple kill(2).
As a backup, any Jacamar CI application will also establish a self-imposed timeout of the job’s maximum duration plus 10 minutes.
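The relationship between the runner's graceful_kill_timeout and Jacamar CI's kill_timeout can be checked before deploying a configuration. The sketch below is illustrative only: the helper names are hypothetical, it assumes graceful_kill_timeout is expressed in seconds (consistent with the 600 used for the 10 minute default), and it parses only simple duration strings built from whole h, m, and s components rather than every form a duration string may take.

```python
import re

# Seconds per supported unit; other units are intentionally rejected.
_UNITS = {"h": 3600, "m": 60, "s": 1}


def duration_seconds(text):
    """Convert a simple duration string such as "120s" or "1h30m" to seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything the pattern did not fully account for (e.g. "1.5h", "100ms").
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unsupported duration string: {text!r}")
    return sum(int(number) * _UNITS[unit] for number, unit in parts)


def timeouts_consistent(graceful_kill_timeout, kill_timeout):
    """True when the runner's timeout is not below Jacamar CI's kill_timeout."""
    return graceful_kill_timeout >= duration_seconds(kill_timeout)
```

With the values used in this document, timeouts_consistent(600, "120s") holds, while a runner configured with 60 seconds against the same kill_timeout would not.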
[auth]
| Key | Description |
|---|---|
| downscope | Target downscoping mechanisms for execution of all CI scripts and generated commands through the auth mechanisms. When using |
| jacamar_path | The full path to the Jacamar application, used in constructing the command for job execution. This can be used if it has been installed outside the user’s |
| max_env_chars | The maximum number of characters that can be defined per environment variable (default: |
| lists_pre_validation | Boolean indicates if the allow/block list rules should be observed prior to the execution of the RunAs validate script. |
| root_dir_creation | Indicate via boolean if the privileged Jacamar-Auth user should create the target CI user’s base |
| user_allowlist | An authoritative list of users who can execute CI jobs. |
| user_blocklist | A list of usernames that are not allowed to run CI jobs. More authoritative than group lists, but can be overridden by UserAllowlist. |
| groups_allowlist | A list of groups that are allowed to run CI jobs. Least authoritative. |
| groups_blocklist | A list of groups that are not allowed to run CI jobs. |
| shell_allowlist | If defined, an authoritative list of acceptable shells for CI users, as they are found in the user database. |
| pipeline_source_allowlist | If defined, an authoritative list of acceptable CI_PIPELINE_SOURCES that can result in local jobs. Value obtained through verified GitLab JWT. |
| jwt_exp_delay | Configurable duration string delay allowed in a JWT’s expiration in select cases to allow for automated cleanup actions (default |
| | Required audience (aud) when validating a JWT. |
| allow_bot_accounts | GitLab managed project bot accounts (i.e., project_{number}_bot) are disallowed by default. |
| no_new_privs | Enforces PR_SET_NO_NEW_PRIVS, to limit the sub-process from gaining additional privileges. Please note that this setting is redundant if seccomp is being used. |
| | List of Run stages that are allowed, all others are skipped with a warning to the user. |
| enforce_nologin | Indicates that jobs should be blocked during configuration if a pam_nologin (https://man7.org/linux/man-pages/man8/pam_nologin.8.html) file ( |
[auth] represents the authorization process configuration for approving any GitLab and local accounts. It is observed by, and made available to, the jacamar-auth application only. For more details see the Authorization via Jacamar-Auth documentation.
[auth]
downscope = "setuid"
jacamar_path = "/custom/bin"
max_env_chars = 10000
lists_pre_validation = false
root_dir_creation = true
allow_bot_accounts = false
jwt_exp_delay = "5m"
no_new_privs = false
enforce_nologin = true
user_allowlist = ["usr1"]
user_blocklist = ["usr2", "usr3"]
groups_allowlist = ["grp1", "grp2"]
groups_blocklist = ["grp3"]
shell_allowlist = ["/bin/bash"]
pipeline_source_allowlist = ["push", "web"]
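The precedence between the allow and block lists described above can be sketched as a small function. This illustrates the documented ordering and is not the jacamar-auth implementation: the function name user_permitted is hypothetical, and the final fall-through rule (deny when an allowlist is defined but nothing matched, allow when no allowlist is defined) is an assumption that the descriptions above do not state.

```python
def user_permitted(user, groups, auth):
    """Apply the documented list precedence to a single user."""
    # The user allowlist is authoritative and overrides every other list.
    if user in auth.get("user_allowlist", []):
        return True
    # The user blocklist outranks both group lists.
    if user in auth.get("user_blocklist", []):
        return False
    # The group allowlist is least authoritative, so group blocks win.
    if set(groups) & set(auth.get("groups_blocklist", [])):
        return False
    if set(groups) & set(auth.get("groups_allowlist", [])):
        return True
    # ASSUMPTION: with no match, permit only when no allowlist is defined.
    return not (auth.get("user_allowlist") or auth.get("groups_allowlist"))


# Values taken from the example [auth] table.
example_auth = {
    "user_allowlist": ["usr1"],
    "user_blocklist": ["usr2", "usr3"],
    "groups_allowlist": ["grp1", "grp2"],
    "groups_blocklist": ["grp3"],
}
```

Under these rules usr1 is permitted even as a member of grp3, while usr2 is blocked even as a member of grp1.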
[auth.runas]
| Key | Description |
|---|---|
| validation_script | Specify the path to a script where the local user and target service account can be validated. When using RunAs a script is required. |
| user_variable | Indicates the name of the CI variable a user can define to indicate their target service account. |
| sha256 | Checksum of script, if provided will be verified shortly before execution. |
| validation_env | Manages a list of “key=value” strings that dictate additional context to the validation script. These will take lowest priority so avoid using the key for any existing RunAs or system environment variables. |
Configuration of the RunAs portion of the authorization flow can offer administrative control over a transition between the CI user and a local account not known by GitLab. For additional details and workflow considerations see the RunAs authorization.
[auth.runas]
validation_script = "/custom/run-validate.py"
user_variable = "TARGET_SERVICE_USER"
sha256 = "e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317"
validation_env = ["DEBUG_ENV=1"]
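A checksum comparison of the kind the sha256 key describes can be sketched as follows. This is not the jacamar-auth code: the function name script_matches is hypothetical, and the example only shows how a SHA-256 digest of a file can be compared against a configured value.

```python
import hashlib


def script_matches(path, expected_sha256):
    """Compare a file's SHA-256 digest with the configured hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large scripts are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

The configured value can be produced with a tool such as sha256sum and needs to be regenerated whenever the validation script changes.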
[auth.logging]
| Key | Description |
|---|---|
| enabled | If the system logging for |
| location | Identifies where logs will be saved, this can be a distinct file or |
| level | Denotes the logging level ( |
| | Used for dialing remote log daemon connections only (e.g., |
| | Used for dialing remote log daemon connections only (e.g., |
Logging represents configuration of how the jacamar-auth application (ONLY) will log relevant job level information. This occurs in addition to any logging performed by the GitLab runner and assumes that the user account responsible for launching jacamar-auth is provided with the necessary access to the local system log daemon or target file.
[auth.logging]
enabled = true
location = "syslog"
level = "debug"
Note
Incorrectly configured logging will result in CI job failures during the initial configuration stage. Please be sure to test/verify any related configuration changes prior to deployment.
[auth.seccomp]
| Key | Description |
|---|---|
| disabled | Signal if system call filtering via libseccomp should be disabled, this includes all system defined defaults as well as administrative configurations. We advise only disabling if troubleshooting or under specific circumstances where security requirements are not as high. |
| block_calls | A list of blocked system calls that the |
| block_all | Globally blocks all system calls from being used, this requires reliance on a manually defined list of |
| allow_calls | List of system calls that will be allowed, this takes precedence over any manually ( |
| log_allowed_actions | Sets the default action for allowed system calls to log (audit) while still allowing their execution. This option creates a substantial number of logs and is only suited for dev/test environments. |
| disable_no_new_privs | Disables or prevents the application of PR_SET_NO_NEW_PRIVS based upon the usage of seccomp filters. This only applies when seccomp is enabled. |
| | Modifies the desired block actions and will return an error code rather than terminating the associated thread. |
| | Path to a Go plugin where the filter can be modified. Setting this value implies that plugin support should be enabled |
The jacamar-auth
application by default supports system call filtering
through the libseccomp API. This added functionality can be found in versions
0.5.0+ of Jacamar CI. There are two distinct mechanisms by which specific
syscalls are identified for filtering; administratively defined
configurations and Default Filters established based upon supported
downscoping mechanisms.
Note
Due to the nature of Jacamar CI’s architecture, not all potential issues that are present in interactive applications are found here. However, if you have concerns or recommendations, we encourage you to create a security issue for the Jacamar CI project.
[auth.seccomp]
disabled = false
block_calls = ["sethostname", "sendfile"]
log_allowed_actions = false
disable_no_new_privs = false
Block All By Default
Note
We do not currently have documented support for the known list of syscalls that must be allowed to support basic application functionality. Please use this for testing purposes only at this time.
The block_all
option establishes a default filtering mechanism that
blocks all syscalls regardless of potential conditions. This
optional configuration will necessitate an administrator
providing a list of allowable calls, otherwise every job will fail.
[auth.seccomp]
block_all = true
allow_calls = ["read", "write", "..."]
Default Filters
The application will attempt to define a meaningful yet limited set of default filters for select syscalls.
Note
Modifications planned for v0.12.0+ removed the remaining default filter, thus disabling seccomp by default for many deployments (see MR 351). For deployments utilizing a standard workflow from systemd/service this will be the equivalent of disabling seccomp with your current deployed version.
Optional Filters
Optionally enabled filters that can be configured in the [auth.seccomp] table.
| Configuration | Filter Description |
|---|---|
| limit_setuid | Block any setuid or setgid call to the non-authorized UID/GID. |
| tty_rules | Block ioctl in conjunction with |
[auth.seccomp]
limit_setuid = true
tty_rules = true
[batch]
| Key | Description |
|---|---|
| arguments_variable | An array of potential CI variables for user provided arguments in the job submission that are checked in order (default |
| command_delay | Meter interactions with schedulers via a duration string (default: |
| nfs_timeout | Largest possible delay to expect from NFS servers as a duration string (default: |
| scheduler_bin | Path to be observed as a prefix for all scheduler commands generated. Useful when default scheduler application on a user’s |
| env_vars | Array of key=value strings that are used when building job submission command (e.g., |
| allow_illegal_args | Do not cause job failures when a conflicting parameters |
| | Identify that the job status found in the CobaltLog should be skipped in favor of an echo in the output file (Ideally for test/debug purposes only). |
| lsf_job_cancellation | Enables the use of |
| default_args | List of arguments that will be injected into the job submission commands. |
Configurations relating exclusively to the supported batch scheduling systems (Cobalt, Flux, LSF, PBS, and Slurm). These will only be observed when a related executor is configured.
[batch]
arguments_variable = [
"NEW_SITE_PARAMETERS", "OLD_SITE_PARAMETERS"
]
command_delay = "30s"
nfs_timeout = "1m"
scheduler_bin = "/usr/scheduler/bin"
env_vars = [
"GPU_ENABLED=true", "EXAMPLE_MODE=debug"
]
allow_illegal_args = false
lsf_job_cancellation = false
default_args = ["--clusters=example"]
Feature Flags
Important
These optional configurations are primarily meant for testing/feedback and are subject to modification.
Table |
Key |
Description |
---|---|---|
|
|
Allow users to specify their own |
|
|
Enable secondary check using |
|
|
Improve shell quoting and reliability when generating job submission commands. |
GitLab Runner Config
GitLab has organized documentation covering a number of topics relating to configuring a GitLab runner that we highly recommend you review. Details provided here are focused on aspects that relate directly to Jacamar or other HPC focused administration concerns.
# global
concurrent = 5
# runner specific
[[runners]]
...
pre_clone_script = '''
ml use /example/modules/Core && ml git
'''
output_limit = 10000
executor = "custom"
[runners.custom]
config_exec = "/opt/jacamar/bin/jacamar-auth"
config_args = ["config", "--configuration", "/jacamar.toml"]
...
graceful_kill_timeout = 600
Beyond correctly configuring the custom executor there are other aspects of the runner’s config that are worth closer examination.
concurrent
concurrent = 5
A single config.toml can define multiple runners that can be registered with GitLab. Each appears under a separate [[runners]] table. Regardless of the number of registered runners there is an upper limit, defined by concurrent, on the total number of jobs that can be running at any given time.
In our above example the limit is set to 5; however, as an admin you can alter this to best fit the limitations of your CI environment.
Note
At this time there is no recommendation for a concurrent number established for HPC workloads. It may require experimentation but keep in mind that the runner is not a scheduler. As such it will not take into account the availability of local resources when running jobs.
output_limit
output_limit = 10000
The maximum build log size (in kilobytes) is defined on the runner level. Though this functionality exists in the upstream GitLab runner, teams with larger build/test processes are likely to experience issues with the default settings.
If a user’s output from a CI job exceeds the default limit (4MB) the job will fail and they will see the following error message: Job's log exceeded limit of 4194304 bytes.
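Because output_limit is given in kilobytes while the error message reports bytes, a quick conversion helps when choosing a value. A small sketch, assuming 1 kilobyte is 1024 bytes (which matches the 4194304 byte default above); the function name is hypothetical.

```python
def output_limit_bytes(output_limit_kb=4096):
    """Convert an output_limit value (kilobytes) to the byte count in the error."""
    return output_limit_kb * 1024


default_bytes = output_limit_bytes()       # 4 MB default
example_bytes = output_limit_bytes(10000)  # value from the example config
```

Here default_bytes is 4194304, the figure shown in the error message, and the example setting of 10000 corresponds to 10240000 bytes.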
Admin Defined Commands
pre_clone_script = '''
module use /example/modules/Core && module load git
'''
The pre_clone_script is outlined in GitLab’s advanced configuration documentation. It can be used to inject administrator defined commands into a user’s CI job at predefined points. In the above example we are leveraging LMOD to ensure the required version of Git is available.
Note
As specified in the requirements Git version 2.9+ needs to be available in order to use the enhanced runner. However, this newer version of Git is only technically required during the get_sources phase of a job. By leveraging the above method you can avoid installing a newer system wide version of Git.
The changes to the environment (module use prepends to the MODULEPATH) will only be present during this get_sources phase of the job. Each subsequent phase of the job, including when the user defined scripts are executed, will occur in a clean environment.
In addition to the pre_clone_script there also exist the pre_build_script and post_build_script options, which work the same way.