Troubleshooting

This section covers common issues encountered when deploying and using Jacamar CI, along with recommended steps to resolve them. Issues are organized by category, starting with the most common configuration-related problems.

Configuration Errors

The most common errors during an initial Jacamar CI deployment are related to the configuration. For instance, the CI job log presented to the user may convey only this:

[Image: config_exec_error.png, the CI job log showing only a default error message]

This message is of little help when troubleshooting. It appears because jacamar-auth always treats potentially unknown messages as sensitive and conveys to the user only a default message based on limited context.

To troubleshoot this issue, refer to Obfuscated Error Messages, see Enable Syslog Support for Jacamar-specific logging, or rely on the runner’s journal entries:

$ journalctl -u gitlab-runner
...
WARNING: Error encountered during job: unable to establish Configurer: error executing
    DecodeReader(), Near line 8 (last key parsed 'auth.user_blocklist'): expected a comma
    or array terminator ']', but got '[' instead
    cleanup_std=err job=123456 project=42 runner=Udg3Gxqq

The associated error message contains information that helps developers track down the source of a bug, and it should always contain definitive information on the source of an issue that you, as an administrator, can fix. In this case: Near line 8 (last key parsed 'auth.user_blocklist')....

Obfuscated Error Messages

By default the majority of errors generated by the privileged jacamar-auth application are obfuscated from the CI user:

[Image: autherror_job_log.png, the CI job log showing an obfuscated authorization error]

However, if we examine the GitLab-Runner log, we will see a more complete and understandable error message:

WARNING: Error encountered during job: invalid authorization target user: username, is not in the user
  allowlist and is in the user blocklist  cleanup_std=err job=123456 project=42 runner=Udg3Gxqq
WARNING: Preparation failed: exit status 2          job=123456 project=42 runner=Udg3Gxqq
Will be retried in 3s ...                           job=123456 project=42 runner=Udg3Gxqq
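
On a busy runner it helps to narrow the log to a single job. The sketch below filters log lines by the trailing job=, project=, and runner= fields seen in the sample above. It assumes each log entry arrives on one line (the samples in this section are wrapped for display), and the helper names and the third sample line are hypothetical.

```python
import re

# Trailing key=value fields the runner appends to each log line in the sample above.
FIELD_RE = re.compile(r"\b(job|project|runner)=(\S+)")


def fields(line: str) -> dict:
    """Extract the job, project, and runner fields from one log line."""
    return dict(FIELD_RE.findall(line))


def lines_for_job(lines, job_id: str) -> list:
    """Keep only the log lines tagged with the given job ID."""
    return [ln for ln in lines if fields(ln).get("job") == job_id]


log_sample = [
    "WARNING: Preparation failed: exit status 2          job=123456 project=42 runner=Udg3Gxqq",
    "Will be retried in 3s ...                           job=123456 project=42 runner=Udg3Gxqq",
    # Hypothetical line from a different job, added for illustration only.
    "WARNING: Preparation failed: exit status 2          job=999 project=7 runner=Udg3Gxqq",
]

for line in lines_for_job(log_sample, "123456"):
    print(line)
```

In practice the lines could be read from sys.stdin with the output of journalctl -u gitlab-runner piped in.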

The obfuscation is meant to prevent administrative configurations (e.g. blocklists), failures in the job environment prior to downscoping, and authorization processes from simply conveying their associated messages to users. In many deployments that do not take advantage of RunAs validation scripts or certificate authorities, or that do not need to hide details of the configuration, un-obfuscating the error message is completely acceptable. This can be accomplished with arguments provided via the GitLab-Runner configuration file (e.g. /etc/gitlab-runner/config.toml):

[runners.custom]
  config_exec = "/opt/jacamar/bin/jacamar-auth"
  config_args = ["--unobfuscated", "config", "/etc/gitlab-runner/custom-config.toml"]

Our stance is that, at the risk of sacrificing user experience, we will always lean towards more secure defaults.

Enable Syslog Support

The Jacamar-Auth application offers support for writing basic logs relating to the authorization process to syslog. To enable it, edit Jacamar’s configuration file:

[auth.logging]
    enabled = true

Once enabled, key actions (successes and failures) of the authorization flow will be logged:

"invalid authorization target user: username, is not in the user allowlist and is in the user blocklist": {
    "jobid": "123456",
    "runner-short": "jbHXQozi",
    "jacamar-name": "Test Runner",
    "ci-stage": "cleanup_exec",
    "hostname": "host"
}
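
If these entries need to be processed by a script, the shape shown above (a quoted message followed by a JSON object) can be read with a standard JSON parser. This is a sketch under assumptions: it presumes the payload has already been separated from any syslog prefix (timestamp, host, tag) and that it matches the layout shown here, which may differ between versions. The parse_entry helper is hypothetical.

```python
import json


def parse_entry(entry: str):
    """Split a '"message": {fields}' entry into its message and field mapping."""
    # Wrapping the entry in braces turns it into a one-key JSON object.
    obj = json.loads("{" + entry.strip() + "}")
    message, details = next(iter(obj.items()))
    return message, details


# The entry from the example above, assumed to be isolated from the syslog prefix.
sample_entry = """
"invalid authorization target user: username, is not in the user allowlist and is in the user blocklist": {
    "jobid": "123456",
    "runner-short": "jbHXQozi",
    "jacamar-name": "Test Runner",
    "ci-stage": "cleanup_exec",
    "hostname": "host"
}
"""

message, details = parse_entry(sample_entry)
print(details["jobid"], details["ci-stage"], "-", message)
```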

Use of this functionality is in addition to the GitLab Runner’s supported system logging functionality. Using both logs jointly provides the most accurate picture possible; however, Jacamar’s built-in logging will always remain optional.

Three Identical Failures?

When using the runner you will undoubtedly encounter times when error messages (in both the CI job log and the system log) are repeated three times. This occurs only when a runner system error is encountered during configuration of the job environment (config_exec), prior to any user-influenced scripts being executed:

[Image: prepare_exec_failures.png, a CI job log with the same failure repeated three times]

We are unable to offer a solution for this currently. It is simply an uncontrollable aspect of the upstream runner.

Seccomp Filters

Identifying issues related to configured or default seccomp filters can be difficult, as different applications handle such failures in a variety of ways:

$ python3 -m venv $CI_PROJECT_DIR/env && source $CI_PROJECT_DIR/env/bin/activate
Error: Command '['/builds/ecp-ci/ci-scratch-space/env/bin/python3', '-Im', 'ensurepip', '--upgrade', '--default-pip']' returned non-zero exit status 1.

To assist in this process we highly recommend reviewing existing documentation for recent changes that may affect jobs. It is also valuable to use strace to look for errors relating to the deployment’s configuration. In this example we’ve defined block_calls = ["ioctl"]:

$ strace python3 -m venv ${CI_PROJECT_DIR}/env
execve("/usr/bin/python3", ["python3", "-m", "venv", "/home/user"...],
...
ioctl(0, TCGETS, 0x7ffc5bf9a5f0)        = -1 ENOTTY (Inappropriate ioctl for device)
...
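
Full strace output is long, so summarizing the calls that failed can make a blocked system call easier to spot. The sketch below counts every call that returned -1, grouped by call name and errno. It assumes plain strace output without per-process prefixes (such as those added by -f), and the helper name and the execve sample line are hypothetical. Which errno a blocked call reports depends on how the filter is configured, so treat the result as a list of candidates rather than a diagnosis.

```python
import re
from collections import Counter

# Matches lines such as:
#   ioctl(0, TCGETS, 0x7ffc5bf9a5f0) = -1 ENOTTY (Inappropriate ioctl for device)
FAILED_CALL_RE = re.compile(r"^(\w+)\(.*\)\s*=\s*-1\s+([A-Z0-9]+)")


def failed_calls(strace_output: str) -> Counter:
    """Count (syscall, errno) pairs for every call that returned -1."""
    counts = Counter()
    for line in strace_output.splitlines():
        match = FAILED_CALL_RE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts


# First line is a hypothetical successful call; the second is from the example above.
strace_sample = """\
execve("/usr/bin/python3", ["python3", "-m", "venv"], 0x7ffd00000000) = 0
ioctl(0, TCGETS, 0x7ffc5bf9a5f0)        = -1 ENOTTY (Inappropriate ioctl for device)
"""

for (call, errno_name), count in failed_calls(strace_sample).most_common():
    print(f"{call} {errno_name} x{count}")
```

Since strace writes to stderr by default, saving the trace with its -o option and reading that file back is usually the simplest way to feed this helper.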

Note

If you choose to use the disable = true configuration please ensure that you are focusing testing on trusted projects/workflows only.

If you encounter blocking issues related to default filters, or would like to propose specific conditions under which filters can be limited/extended, please create a security issue on the Jacamar CI project and we can attempt to assist.

Required Git Version

A user’s job is experiencing the following error message while attempting automated Git interactions:

Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /data_dir/user/group/project/.git/
git: 'credential-' is not a git command. See 'git --help'.
Did you mean this?
  credential
git: 'credential-' is not a git command. See 'git --help'.
Did you mean this?
  credential

This is likely due to an older version of Git (< 2.9) being used by the user. Keep in mind that runner-defined Git operations (e.g., fetching the latest project repository) are done automatically on the user’s behalf and rely upon whatever git application is found on their PATH. You can influence this through the GitLab-Runner configuration (/etc/gitlab-runner/config.toml) by loading a newer version via system modules:

[[runners]]
  ...
  pre_clone_script = '''
    module use /example/modules/Core && module load git/2.31.1
  '''

Please note, we do not recommend any specific version, only that it is newer than 2.9.
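
To confirm which Git a job would pick up, the version string can be compared against the threshold. This is a minimal sketch with hypothetical helper names; it treats 2.9 itself as acceptable, based on the "< 2.9" threshold mentioned above, and that reading is an assumption.

```python
import re

MINIMUM = (2, 9)  # assumption: 2.9 itself is acceptable


def parse_git_version(output: str) -> tuple:
    """Turn 'git version 2.31.1' into (2, 31, 1)."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized git --version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(output: str, minimum: tuple = MINIMUM) -> bool:
    """Compare the parsed version against the minimum, component by component."""
    return parse_git_version(output) >= minimum


for text in ("git version 2.31.1", "git version 1.8.3.1"):
    print(text, "->", "ok" if is_new_enough(text) else "too old")
```

The input would normally come from running git --version in the same environment as the job (for example through subprocess.run), after any module load commands have taken effect.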

Unable to retrieve key from JWKS

A job fails early in configuration because it is unable to retrieve keys from the JSON Web Key Set (JWKS) at the known server URL. Please note that the error below is shown un-obfuscated and would normally be visible only in the log files.

Preparing the "custom" executor
Error encountered during job: unable to parse supplied CI_JOB_JWT: unable to parse supplied CI_JOB_JWT: unable
  to retrieve key from JWKS endpoint: unable to retrieve response from https://gitlab.example.com/-/jwks: Get
  "https://gitlab.example.com/-/jwks": dial tcp: no such host
ERROR: Preparation failed: exit status 2

The likely cause of this issue is the server URL known by the runner through the job response, which may not align with the requirements of your deployment (e.g. subdomain, path, or port). The URL can be set in the Jacamar CI configuration:

[general]
  ...
  gitlab_server = "https://example.com/gitlab"
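
Before changing the setting, it can help to confirm that the host in the candidate URL resolves from the runner machine, since the error above ends in "no such host". The sketch below is an illustration only: the "<server>/-/jwks" path is inferred from the URL in the error message and is an assumption about how the endpoint is built, and both helper names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def jwks_url(gitlab_server: str) -> str:
    """Build the JWKS endpoint from a server URL (assumed form: '<server>/-/jwks')."""
    return gitlab_server.rstrip("/") + "/-/jwks"


def host_resolves(url: str) -> bool:
    """Return True if the URL's hostname can be resolved from this machine."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return True


print(jwks_url("https://example.com/gitlab"))
```

Running host_resolves(jwks_url(...)) on the runner host only checks name resolution; a host that resolves can still be unreachable because of firewalls, proxies, or TLS problems.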

Validation for CIJobJWT failed on the ‘jwt’ tag

An error relating to validation of the CIJobJWT is normally caused by a missing id_token/CI_JOB_JWT. Newer versions of Jacamar CI further clarify this error to users:

No id_token found on EXAMPLE_ID_TOKEN variable. Please update your CI job to include the following:
  id_tokens:
    EXAMPLE_ID_TOKEN:
      aud: https://gitlab.example.com
Prior to server release v16.0 the default CI_JOB_JWT will remain available. For additional details
  see: https://docs.gitlab.com/ee/ci/yaml/index.html#id_tokens

The only solution is to have users add the following to any CI/CD jobs that will be run by a Jacamar CI application:

job:
  id_tokens:
    EXAMPLE_ID_TOKEN:
      aud: https://code.ornl.gov
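
A user can verify from inside a job script that the token variable is actually populated, without printing the token itself. The sketch below is hedged: it assumes the failing check is about whether the value has the general shape of a JWT (three dot-separated base64url segments), which is an assumption about the validation rather than confirmed behavior. The helper names are hypothetical, and the variable name follows the EXAMPLE_ID_TOKEN example above.

```python
import os
import re

# Three base64url segments separated by dots; the signature segment may be empty.
JWT_SHAPE_RE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*$")


def looks_like_jwt(value) -> bool:
    """Shape check only: this does not verify the signature or any claims."""
    return bool(value) and JWT_SHAPE_RE.match(value) is not None


def token_status(name: str, environ=os.environ) -> str:
    """Describe the state of a token variable without revealing its value."""
    value = environ.get(name)
    if not value:
        return f"{name} is unset or empty"
    if not looks_like_jwt(value):
        return f"{name} is set but is not JWT-shaped"
    return f"{name} looks like a JWT"


print(token_status("EXAMPLE_ID_TOKEN"))
```

The status message deliberately omits the token value, since job logs are often visible to other project members.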

See Migrating to new id_tokens from CI_JOB_JWT for additional details.