Troubleshooting

The following are common issues, along with the steps we recommend to correct them.

Configuration Errors

The most common errors encountered during an initial Jacamar CI deployment relate to the configuration. For instance, the CI job log presented to the user may only convey this:

For troubleshooting purposes, this is a fairly unhelpful message. It occurs because jacamar-auth always treats a potentially unknown message as sensitive and shows the user a default message based on limited context.

To troubleshoot this issue, refer to Obfuscated Error Messages, see Enable Syslog Support for Jacamar-specific logging, or rely on the runner’s journal entries:

$ journalctl -u gitlab-runner
...
WARNING: Error encountered during job: unable to establish Configurer: error executing
    DecodeReader(), Near line 8 (last key parsed 'auth.user_blocklist'): expected a comma
    or array terminator ']', but got '[' instead
    cleanup_std=err job=123456 project=42 runner=Udg3Gxqq

The associated error message contains information that helps developers track down the source of a bug, and it should always contain definitive information on the source of an issue that you, as an administrator, can fix. In this case: Near line 8 (last key parsed 'auth.user_blocklist').

Obfuscated Error Messages

By default, the majority of errors generated by the privileged jacamar-auth application are obfuscated from the CI user:

However, if we examine the GitLab-Runner log, we will see a more complete and understandable error message:

WARNING: Error encountered during job: invalid authorization target user: username, is not in the user allowlist and is in the user blocklist  cleanup_std=err job=123456 project=42 runner=Udg3Gxqq
WARNING: Preparation failed: exit status 2          job=123456 project=42 runner=Udg3Gxqq
Will be retried in 3s ...                           job=123456 project=42 runner=Udg3Gxqq
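On a busy runner it can be useful to isolate the log lines that belong to a single job. The sketch below assumes the trailing key=value fields (job=, project=, runner=) appear as they do in the excerpt above; the function name and the sample data are illustrative and are not part of Jacamar CI or the GitLab Runner.

```python
import re

# Trailing fields as they appear in the log excerpt above (assumed format).
FIELD = re.compile(r"\b(job|project|runner)=(\S+)")


def lines_for_job(log_text: str, job_id: str) -> list[str]:
    """Return the log lines whose job= field equals job_id."""
    matches = []
    for line in log_text.splitlines():
        fields = dict(FIELD.findall(line))
        if fields.get("job") == job_id:
            matches.append(line.strip())
    return matches


# Illustrative sample; the last line belongs to a different (made-up) job.
SAMPLE = """\
WARNING: Preparation failed: exit status 2          job=123456 project=42 runner=Udg3Gxqq
Will be retried in 3s ...                           job=123456 project=42 runner=Udg3Gxqq
WARNING: Preparation failed: exit status 2          job=999 project=7 runner=Udg3Gxqq
"""

if __name__ == "__main__":
    for line in lines_for_job(SAMPLE, "123456"):
        print(line)
```

In practice, log_text could be the captured output of journalctl -u gitlab-runner.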

The obfuscation is meant to prevent administrative configurations (e.g. blocklists), failures in the job environment prior to downscoping, and authorization processes from simply conveying their associated messages to a user. In deployments that do not take advantage of RunAs validation scripts or certificate authorities, or that do not need to hide details of the configuration, un-obfuscating the error message is completely acceptable. This can be accomplished with arguments provided via the GitLab-Runner configuration file (e.g. /etc/gitlab-runner/config.toml):

[runners.custom]
  config_exec = "/opt/jacamar/bin/jacamar-auth"
  config_args = ["--unobfuscated", "config", "--configuration", "/etc/gitlab-runner/custom-config.toml"]

Our stance is that, even at the risk of sacrificing user experience, we will always lean towards more secure defaults.

Cleanup Stage Configuration

Note

As of release 0.4.0, this configuration is required; the reasoning behind it remains the same.

For the most part, the stages in a custom executor have little to no means of sharing information with one another. This can lead to rather confusing runner system logs when a job fails during the config stage but then logs a completely different issue during the cleanup stage.

This happens because state can only be preserved between stages if the config stage completes successfully and we are able to register stateful variables (including elements of the configuration). When this state is not available to subsequent stages, Jacamar will fail because it cannot identify the required job context.

By specifying the --configuration argument in both config_args and cleanup_args, the specified file will be used when no stateful environment variables have been established. This gives you clearer runner logs and can ease both troubleshooting and auditing efforts.

[[runners]]
    ...
    [runners.custom]
        config_exec = "/opt/jacamar/bin/jacamar-auth"
        config_args = ["config", "--configuration", "/etc/gitlab-runner/custom-config.toml"]
        ...
        cleanup_exec = "/opt/jacamar/bin/jacamar-auth"
        cleanup_args = ["cleanup", "--configuration", "/etc/gitlab-runner/custom-config.toml"]

Enable Syslog Support

The jacamar-auth application supports writing basic logs relating to the authorization process to syslog. To enable this, edit Jacamar’s configuration file:

[auth.logging]
    enabled = true

Once enabled, key actions (successes and failures) of the authorization flow will be logged:

"invalid authorization target user: username, is not in the user allowlist and is in the user blocklist": {
    "jobid": "123456",
    "runner-short": "jbHXQozi",
    "jacamar-name": "Test Runner",
    "ci-stage": "cleanup_exec",
    "hostname": "host"
}

This functionality is in addition to the GitLab Runner’s supported system logging functionality. Using both logs together provides the most accurate picture possible; however, Jacamar’s built-in logging will always remain optional.

Three Identical Failures?

When using the runner, you will undoubtedly encounter times when error messages (in both the CI job log and the system log) are repeated three times. This only occurs when a runner system error is encountered during configuration of the job environment (config_exec), prior to any user-influenced scripts being executed:

[Image: prepare_exec_failures.png]

We are unable to offer a solution for this currently. It is simply an uncontrollable aspect of the upstream runner.

Seccomp Filters

Identifying issues related to configured or default filters can be difficult, as different applications handle such failures in a variety of ways:

$ python3 -m venv $CI_PROJECT_DIR/env && source $CI_PROJECT_DIR/env/bin/activate
Error: Command '['/builds/ecp-ci/ci-scratch-space/env/bin/python3', '-Im', 'ensurepip', '--upgrade', '--default-pip']' returned non-zero exit status 1.

To assist in this process, we highly recommend reviewing existing documentation for recent changes that may affect jobs. It is also valuable to leverage strace to look for errors relating to the deployment’s configuration. In this example, we’ve defined block_calls = ["ioctl"]:

$ strace python3 -m venv ${CI_PROJECT_DIR}/env
execve("/usr/bin/python3", ["python3", "-m", "venv", "/home/user"...],
...
ioctl(0, TCGETS, 0x7ffc5bf9a5f0)        = -1 ENOTTY (Inappropriate ioctl for device)
...
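Since strace output can be long, a small filter that keeps only the calls returning -1 may make the relevant lines easier to spot. The following is a rough sketch that assumes the default strace line layout shown above (name(args) = -1 ERRNO (description)); the function name is hypothetical, and which errno a blocked call reports may differ between deployments.

```python
import re

# Matches strace lines such as: name(args...) = -1 ERRNO (description)
FAILED_CALL = re.compile(r"^(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)")


def failed_syscalls(strace_output: str) -> list[tuple[str, str]]:
    """Return (syscall, errno) pairs for calls that returned -1."""
    results = []
    for line in strace_output.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match:
            results.append((match.group(1), match.group(2)))
    return results


# The ioctl line is taken from the example above; the close line is illustrative.
SAMPLE = """\
close(3)                                = 0
ioctl(0, TCGETS, 0x7ffc5bf9a5f0)        = -1 ENOTTY (Inappropriate ioctl for device)
"""

if __name__ == "__main__":
    for name, errno_name in failed_syscalls(SAMPLE):
        print(f"{name} failed with {errno_name}")
```

Because strace writes its trace to standard error by default, its -o option can be used to capture the trace in a file for this kind of post-processing.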

Note

If you choose to use the disable = true configuration, please ensure that you focus testing on trusted projects/workflows only.

If you encounter blocking issues related to default filters, or would like to propose specific conditions under which filters can be limited/extended, please create a security issue on the Jacamar CI project and we can attempt to assist.

Required Git Version

A user’s job is experiencing the following error message while attempting automated Git interactions:

Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /data_dir/user/group/project/.git/
git: 'credential-' is not a git command. See 'git --help'.
Did you mean this?
  credential
git: 'credential-' is not a git command. See 'git --help'.
Did you mean this?
  credential

This is likely due to an older version of Git (< 2.9) being used by the user. Keep in mind that runner-defined Git operations (e.g., fetching the latest project repository) are performed automatically on the user’s behalf and rely upon whatever git application is found on their PATH. This can be influenced through the GitLab-Runner configuration (/etc/gitlab-runner/config.toml) by declaring the use of a newer version via system modules:

[[runners]]
  ...
  pre_clone_script = '''
    module use /example/modules/Core && module load git/2.31.1
  '''

Please note, we do not recommend any specific version, only that it is newer than 2.9.
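To confirm which Git version a job environment actually resolves, the output of git --version can be compared against the minimum numerically. The sketch below is illustrative only; the function names are hypothetical, and treating 2.9 itself as acceptable is an assumption based on the "< 2.9" wording above.

```python
import re
import subprocess

MINIMUM = (2, 9)  # Threshold taken from the text above; 2.9 itself is assumed acceptable.


def parse_git_version(output: str) -> tuple[int, ...]:
    """Extract the numeric version from `git --version` output."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(output: str) -> bool:
    """Compare as integer tuples so that 2.31 sorts above 2.9."""
    return parse_git_version(output) >= MINIMUM


if __name__ == "__main__":
    # Uses whichever git is first on PATH, as the runner would.
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    print(result.stdout.strip(), "->", "ok" if is_new_enough(result.stdout) else "too old")
```

Comparing integer tuples rather than strings matters here: as plain text, "2.31" would sort below "2.9".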

Unable to retrieve key from JWKS

A job fails early in configuration because it is unable to retrieve keys from the JSON Web Key Set (JWKS) based upon the known server URL. Please note that the error below is shown un-obfuscated and would normally only be visible in the log files.

Preparing the "custom" executor
Error encountered during job: unable to parse supplied CI_JOB_JWT: unable to parse supplied CI_JOB_JWT: unable to retrieve key from JWKS endpoint: unable to retrieve response from https://gitlab.example.com/-/jwks: Get "https://gitlab.example.com/-/jwks": dial tcp: no such host
ERROR: Preparation failed: exit status 2

The likely cause of this issue is the server URL known to the runner through the job response, which may not align with the requirements of your deployment (e.g. subdomain, path, or port). The URL can be set using the Jacamar CI configuration:

[general]
  ...
  gitlab_server = "https://example.com/gitlab"
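Before changing the configuration, it may help to confirm from the runner host that the JWKS endpoint for a candidate server URL is reachable. The following is a minimal sketch using only the Python standard library. It assumes the endpoint is the server URL followed by /-/jwks, as seen in the error above; whether Jacamar CI builds the URL exactly this way is an assumption, and the function names are hypothetical.

```python
import json
import urllib.request


def jwks_url(gitlab_server: str) -> str:
    """Build the JWKS endpoint, assuming the /-/jwks path seen in the error above."""
    return gitlab_server.rstrip("/") + "/-/jwks"


def fetch_key_ids(gitlab_server: str, timeout: float = 10.0) -> list[str]:
    """Fetch the JWKS document and return the key IDs it advertises."""
    with urllib.request.urlopen(jwks_url(gitlab_server), timeout=timeout) as resp:
        document = json.load(resp)
    return [key.get("kid", "") for key in document.get("keys", [])]


if __name__ == "__main__":
    # Placeholder URL from the example above; replace with your server.
    print(jwks_url("https://example.com/gitlab"))
```

Calling fetch_key_ids with a server URL that cannot be resolved or reached raises an exception, which mirrors the failure the job would encounter.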

Validation for CIJobJWT failed on the ‘jwt’ tag

An error relating to the validation of the CIJobJWT is normally caused by a missing id_token/CI_JOB_JWT. Newer versions of Jacamar CI further clarify this error to users:

No id_token found on EXAMPLE_ID_TOKEN variable. Please update your CI job to include the following:
  id_tokens:
    EXAMPLE_ID_TOKEN:
      aud: https://gitlab.example.com
Prior to server release v16.0 the default CI_JOB_JWT will remain available. For additional details see: https://docs.gitlab.com/ee/ci/yaml/index.html#id_tokens

The only solution is to have users add the following to any CI/CD jobs that will be run by a Jacamar CI application:

job:
  id_tokens:
    EXAMPLE_ID_TOKEN:
      aud: https://code.ornl.gov

See Migrating to new id_tokens from CI_JOB_JWT for additional details.
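When a token is present but validation still fails, inspecting its aud claim can show whether the audience matches what the deployment expects. The sketch below decodes a JWT payload without verifying the signature, so it is suitable for debugging only. The function names and the demo token are hypothetical, and real tokens are credentials that should never be printed in job logs.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("expected three dot-separated JWT segments") from None
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64(data: dict) -> str:
    """Encode a dict as an unpadded base64url segment (for the demo token only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical, unsigned token built purely for demonstration.
DEMO_TOKEN = ".".join([
    _b64({"alg": "none"}),
    _b64({"aud": "https://gitlab.example.com"}),
    "",
])

if __name__ == "__main__":
    print(jwt_claims(DEMO_TOKEN).get("aud"))
```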