Troubleshooting
Below are a variety of common issues along with the steps we recommend to correct them.
Configuration Errors
The most common errors experienced during initial Jacamar CI deployment are related to the configuration. For instance, the CI job log presented to the user may only convey a generic error message. In terms of troubleshooting, this is fairly unhelpful. This is because jacamar-auth will always treat a potentially unknown message as sensitive and convey to the user a default message based upon limited context.
To help troubleshoot this issue, refer to Obfuscated Error Messages, enable syslog support for Jacamar-specific logging (see Enable Syslog Support), or rely on the runner’s journal entries:
$ journalctl -u gitlab-runner
...
WARNING: Error encountered during job: unable to establish Configurer: error executing
DecodeReader(), Near line 8 (last key parsed 'auth.user_blocklist'): expected a comma
or array terminator ']', but got '[' instead
cleanup_std=err job=123456 project=42 runner=Udg3Gxqq
The associated error message will contain not only information to help developers track down the source of a bug, but also definitive information on the source of the issue that you, as an administrator, can fix. In this case: Near line 8 (last key parsed 'auth.user_blocklist')..., which points to a malformed array in the configuration file.
Obfuscated Error Messages
By default, the majority of errors generated by the privileged jacamar-auth application are obfuscated from the CI user. However, if we examine the GitLab Runner log, we will see a more complete and understandable error message:
WARNING: Error encountered during job: invalid authorization target user: username, is not in the user allowlist and is in the user blocklist cleanup_std=err job=123456 project=42 runner=Udg3Gxqq
WARNING: Preparation failed: exit status 2 job=123456 project=42 runner=Udg3Gxqq
Will be retried in 3s ... job=123456 project=42 runner=Udg3Gxqq
The obfuscation is meant to prevent details of administrative configurations (e.g., blocklists), failures in the job environment prior to downscoping, and authorization processes from simply being conveyed to a user. In many deployments that do not take advantage of RunAs validation scripts or certificate authorities, or that do not need to hide details of the configuration, un-obfuscating the error messages is completely acceptable. This can be accomplished with arguments provided via the GitLab Runner configuration file (e.g., /etc/gitlab-runner/config.toml):
[runners.custom]
config_exec = "/opt/jacamar/bin/jacamar-auth"
config_args = ["--unobfuscated", "config", ...]
...
Our stance is that, even at the cost of user experience, we will always lean towards more secure defaults.
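When error messages remain obfuscated, the complete message for a given job can be located in the runner log by its job= field. A minimal sketch, assuming log lines shaped like the runner entries above (a message followed by trailing key=value fields) and that timestamps have been removed, for example with journalctl’s -o cat output mode; the script name job_errors.py is hypothetical:

```python
"""Locate the un-obfuscated runner log messages for a single CI job.

Assumes lines shaped like the runner log entries above:
    WARNING: <message> key=value key=value ...
"""
import re
import sys

# A trailing token such as job=123456 or cleanup_std=err.
FIELD = re.compile(r"[A-Za-z_]+=\S+")


def parse_runner_line(line: str) -> tuple[str, dict[str, str]]:
    """Split a log line into its message and trailing key=value fields."""
    tokens = line.split()
    fields: dict[str, str] = {}
    while tokens and FIELD.fullmatch(tokens[-1]):
        key, value = tokens.pop().split("=", 1)
        fields[key] = value
    return " ".join(tokens), fields


def messages_for_job(lines, job_id: str) -> list[str]:
    """Return the messages whose job= field matches job_id."""
    found = []
    for line in lines:
        message, fields = parse_runner_line(line)
        if fields.get("job") == job_id:
            found.append(message)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: journalctl -u gitlab-runner -o cat | python3 job_errors.py 123456
    for msg in messages_for_job(sys.stdin, sys.argv[1]):
        print(msg)
```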
Cleanup Stage Configuration
Note
As of release 0.4.0 this is a required configuration, though the reasoning behind this remains the same.
For the most part, the stages in a custom executor have little to no means of sharing information with one another. This can lead to rather confusing runner system logs when a job fails during the config stage but then logs a completely different issue during the cleanup stage.
This occurs because state can only be preserved between stages if the config stage completes successfully and we are able to register stateful variables (including elements of the configuration). When this state is not made available to subsequent stages, Jacamar will fail due to an inability to identify the required job context.
By specifying the --configuration argument in both config_args and cleanup_args, the specified file will be used whenever no stateful environment variables have been established. Doing so results in clearer runner logs and can ease both troubleshooting and auditing efforts.
[[runners]]
...
[runners.custom]
config_exec = "/opt/jacamar/bin/jacamar-auth"
config_args = ["config", "--configuration", "/etc/gitlab-runner/custom-config.toml"]
...
cleanup_exec = "/opt/jacamar/bin/jacamar-auth"
cleanup_args = ["cleanup", "--configuration", "/etc/gitlab-runner/custom-config.toml"]
Enable Syslog Support
The jacamar-auth application supports writing basic logs relating to the authorization process to syslog. To enable this, edit Jacamar’s configuration file:
[auth.logging]
enabled = true
Once enabled, key actions (successes and failures) of the authorization flow will be logged:
"invalid authorization target user: username, is not in the user allowlist and is in the user blocklist": {
"jobid": "123456",
"runner-short": "jbHXQozi",
"jacamar-name": "Test Runner",
"ci-stage": "cleanup_exec",
"hostname": "host"
}
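The context fields can be used to correlate a syslog entry with the runner’s own log (for example, jobid with the runner’s job= field). A minimal sketch, assuming the syslog prefix (timestamp, host, tag) has already been stripped so that only the payload shown above remains; the exact framing of the entry in your syslog may differ:

```python
"""Parse a Jacamar authorization log entry shaped like the example above."""
import json


def parse_auth_entry(payload: str) -> tuple[str, dict[str, str]]:
    """Return (message, context) from a single logged authorization event."""
    text = payload.strip()
    if not text.startswith("{"):
        # The example shows a bare "<message>": {...} pair; wrap it so it is valid JSON.
        text = "{" + text + "}"
    data = json.loads(text)
    message, context = next(iter(data.items()))
    return message, context
```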
This functionality is in addition to the GitLab Runner’s supported system logging. Using both logs jointly provides the most accurate picture possible; however, Jacamar’s built-in logging will always remain optional.
Three Identical Failures?
When using the runner, you will undoubtedly encounter times when error messages (in both the CI job log and the system log) are repeated three times. This only occurs when a runner system error is encountered during configuration of the job environment (config_exec), prior to any user-influenced scripts being executed. In these cases the runner retries the failed preparation (note the "Will be retried in 3s ..." entries in the runner log), so the same error is reported on each attempt.
We are currently unable to offer a solution for this; it is simply an uncontrollable aspect of the upstream runner.
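When reviewing such logs, it can help to collapse identical messages so that one failure is not mistaken for three separate issues. A minimal sketch, assuming timestamps have been removed first (for example with journalctl -u gitlab-runner -o cat):

```python
"""Collapse repeated log messages, keeping first-seen order and a count."""
from collections import Counter


def summarize(lines) -> list[tuple[str, int]]:
    """Return each distinct line once, in first-seen order, with its count."""
    lines = [line.rstrip("\n") for line in lines]
    counts = Counter(lines)
    return [(line, counts[line]) for line in dict.fromkeys(lines)]
```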
Seccomp Filters
Identifying issues related to configured or default seccomp filters can be difficult, as different applications handle such failures in a variety of ways. For example:
$ python3 -m venv $CI_PROJECT_DIR/env && source $CI_PROJECT_DIR/env/bin/activate
Error: Command '['/builds/ecp-ci/ci-scratch-space/env/bin/python3', '-Im', 'ensurepip', '--upgrade', '--default-pip']' returned non-zero exit status 1.
To assist in this process, we highly recommend reviewing the existing documentation for recent changes that may affect jobs. It is also valuable to leverage strace to look for errors relating to the deployment’s configuration. For instance, in this example we’ve defined block_calls = ["ioctl"]:
$ strace python3 -m venv ${CI_PROJECT_DIR}/env
execve("/usr/bin/python3", ["python3", "-m", "venv", "/home/user"...],
...
ioctl(0, TCGETS, 0x7ffc5bf9a5f0) = -1 ENOTTY (Inappropriate ioctl for device)
...
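For longer traces, filtering the strace output for failed calls that appear in your block_calls list can narrow the search. A minimal sketch that reports any failed call whose name is listed, without assuming which error code the filter produces; it handles the optional PID prefixes strace adds when following child processes (-f), but not other prefixes such as timestamps:

```python
"""List failed system calls in strace output that match a block_calls list."""
import re

# e.g. ioctl(0, TCGETS, 0x7ffc5bf9a5f0) = -1 ENOTTY (Inappropriate ioctl for device)
# An optional "[pid N] " or "N " prefix appears when strace follows child processes (-f).
FAILED_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)")


def failed_blocked_calls(lines, block_calls) -> list[tuple[str, str]]:
    """Return (syscall, errno) pairs for failed calls named in block_calls."""
    blocked = set(block_calls)
    hits = []
    for line in lines:
        match = FAILED_CALL.match(line)
        if match and match.group(1) in blocked:
            hits.append((match.group(1), match.group(2)))
    return hits
```

For example, capture a trace with strace -f -o trace.log python3 -m venv ${CI_PROJECT_DIR}/env, then pass the lines of trace.log to failed_blocked_calls along with your configured block_calls.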
Note
If you choose to use the disable = true configuration, please ensure that you limit testing to trusted projects/workflows only.
If you encounter blocking issues related to the default filters, or would like to propose specific conditions under which a filter can be limited or extended, please create a security issue on the Jacamar CI project and we will attempt to assist.
Required Git Version
A user’s job encounters the following error message while attempting automated Git interactions:
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /data_dir/user/group/project/.git/
git: 'credential-' is not a git command. See 'git --help'.
Did you mean this?
credential
git: 'credential-' is not a git command. See 'git --help'.
Did you mean this?
credential
This is likely due to an older version of Git (< 2.9) being used by the user. Keep in mind that runner-defined Git operations (e.g., fetching the latest project repository) are performed automatically on the user’s behalf and rely upon whatever git application is found on their PATH. This can be influenced through the GitLab Runner configuration (/etc/gitlab-runner/config.toml) by declaring the use of a newer version via system modules:
[[runners]]
...
pre_clone_script = '''
module use /example/modules/Core && module load git/2.31.1
'''
Please note that we do not recommend any specific version, only that it is 2.9 or newer.
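To confirm which Git a job will actually use, the version can be checked in the same environment (for example, after the same module load). A minimal sketch that parses the standard "git version X.Y.Z" output and compares its major and minor components against 2.9:

```python
"""Check that the git found on PATH is version 2.9 or newer."""
import re
import shutil
import subprocess

MINIMUM = (2, 9)


def parse_git_version(output: str) -> tuple[int, ...]:
    """Extract (major, minor[, patch]) from `git --version` output."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized git version output: {output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def is_supported(output: str) -> bool:
    """Compare only major.minor against the 2.9 minimum."""
    return parse_git_version(output)[:2] >= MINIMUM


if __name__ == "__main__":
    git = shutil.which("git")  # searches PATH, the same lookup the job relies on
    if git is None:
        print("git not found on PATH")
    else:
        out = subprocess.run([git, "--version"], capture_output=True, text=True, check=True).stdout
        status = "OK" if is_supported(out) else "older than 2.9"
        print(f"{git}: {out.strip()} ({status})")
```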
Unable to retrieve key from JWKS
A job fails early in configuration because it is unable to retrieve keys from the JSON Web Key Set (JWKS) endpoint based upon the known server URL. Please note that the error below is shown un-obfuscated and would normally only be visible in the log files.
Preparing the "custom" executor
Error encountered during job: unable to parse supplied CI_JOB_JWT: unable to parse supplied CI_JOB_JWT: unable to retrieve key from JWKS endpoint: unable to retrieve response from https://gitlab.example.com/-/jwks: Get "https://gitlab.example.com/-/jwks": dial tcp: no such host
ERROR: Preparation failed: exit status 2
The likely cause of this issue is the server URL known to the runner through the job response, which may not align with the requirements of your deployment (e.g., subdomain, path, or port). The server URL can instead be set using the Jacamar CI configuration:
[general]
...
gitlab_server = "https://example.com/gitlab"
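To confirm the new value, you can derive the key set URL and check that its host resolves, since the error above is a DNS failure (no such host). A minimal sketch, assuming the keys are served at <gitlab_server>/-/jwks as in the logged URL; the script name check_jwks.py is hypothetical, and the network fetch only runs when a server URL is passed on the command line:

```python
"""Check whether the JWKS endpoint derived from gitlab_server is reachable.

Assumption: keys are served at <gitlab_server>/-/jwks, matching the logged URL.
"""
import json
import socket
import sys
import urllib.parse
import urllib.request


def jwks_url(gitlab_server: str) -> str:
    """Append the /-/jwks path to the configured server URL."""
    return gitlab_server.rstrip("/") + "/-/jwks"


def host_resolves(url: str) -> bool:
    """Return True if the URL's hostname resolves via DNS or /etc/hosts."""
    host = urllib.parse.urlsplit(url).hostname
    if not host:
        return False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 check_jwks.py https://example.com/gitlab
    url = jwks_url(sys.argv[1])
    if not host_resolves(url):
        sys.exit(f"cannot resolve host for {url}")
    with urllib.request.urlopen(url, timeout=10) as resp:
        keys = json.load(resp).get("keys", [])
    print(f"{url}: {len(keys)} key(s) retrieved")
```

Run it from the runner host so that name resolution matches what jacamar-auth experiences.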
Validation for CIJobJWT failed on the ‘jwt’ tag
An error relating to the validation of the CIJobJWT is normally caused by a missing id_token/CI_JOB_JWT. Newer versions of Jacamar CI further clarify this error to users:
No id_token found on EXAMPLE_ID_TOKEN variable. Please update your CI job to include the following:
id_tokens:
  EXAMPLE_ID_TOKEN:
    aud: https://gitlab.example.com
Prior to server release v16.0 the default CI_JOB_JWT will remain available. For additional details see: https://docs.gitlab.com/ee/ci/yaml/index.html#id_tokens
The only solution is for users to add the following to any CI/CD jobs that will be run by a Jacamar CI application:
job:
  id_tokens:
    EXAMPLE_ID_TOKEN:
      aud: https://gitlab.example.com
See the Migrating to new id_tokens from CI_JOB_JWT for additional details.
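To verify that a job’s token carries the expected audience, its claims can be inspected. A minimal sketch that decodes the JWT payload without verifying the signature, so it is suitable only for troubleshooting and never for trust decisions; it assumes the expected aud is the value shown in Jacamar’s error message. Avoid printing real tokens in CI job logs.

```python
"""Inspect the claims of an id_token to confirm its aud value (unverified)."""
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(token: str, expected: str) -> bool:
    """aud may be a single string or a list of strings."""
    aud = jwt_claims(token).get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return expected in audiences
```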