Introduction to Local System Logging
Being able to trace jobs from a local system back to the responsible actions/users on the server is important when supporting CI/CD on HPC test resources. To assist in that process, this guide has been written to:
- Highlight application configurations
- Identify where logs are stored
- Provide examples to help with decoding events
We assume you already have an understanding of these systems and test resources deployed to support the documented workflows.
Configurations
Important
Please verify that you’ve configured all applications and ensured functionality prior to opening up access to additional users and workflows.
Jacamar CI
Specify the following within your configuration:
[auth.logging]
enabled = true
level = "debug"
It is possible to use a local filepath for your location (sent to syslog by default) or a network/address setting if logging to a remote server.
Note
Logging by the jacamar-auth application is optional, though highly encouraged as it provides additional details that are otherwise not possible due to the pre-defined interactions within the custom executor model.
GitLab Runner
Having the runner write to syslog is easily accomplished with the --syslog parameter used in a gitlab-runner run command. This logging can be further influenced in the config.toml with the advanced configuration:
concurrent = 10
log_format = "json"
log_level = "info"
Logging Sources/Destinations
For both applications we advise the usage of syslog. Though for our examples this is /var/log/messages, it can differ based upon your deployment. Please consult your system documentation.
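Since both applications emit entries into the same syslog destination, a job's complete local footprint can be isolated with standard tools. A minimal sketch follows; the file name, job number, and log contents are illustrative samples, not output from a real deployment:

```shell
# Hypothetical syslog excerpt; real entries are single-line JSON when
# log_format = "json" is used, though exact fields and spacing can vary.
cat > sample-messages <<'EOF'
jacamar-ci-runner[1000]: {"job": 123456, "level": "info", "msg": "Checking for jobs... received"}
jacamar-auth[5001]: {"job": 123456, "level": "info", "stage": "config_exec"}
jacamar-auth[6000]: {"job": 123457, "level": "error", "msg": "failed to authorize user for CI job"}
EOF

# Isolate every entry for a single job; on a live system, point grep at
# your actual syslog destination (e.g., /var/log/messages) instead.
grep '"job": 123456' sample-messages
```

Both the runner and jacamar-auth entries for the job are returned, giving a single chronological view of its local lifecycle.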
Important
This guide will only focus on logging as it relates directly to Jacamar CI. Any details regarding the server or other applications are best referenced by the associated documentation.
Tracing a Specific Job
Jobs are first identified by the GitLab Runner service through a process of polling the server for availability. Once a job is identified and a response obtained, it is then the responsibility of the runner to begin the local execution using the provided context.
jacamar-ci-runner[1000]: {
"job": 123456,
"level": "info",
"msg": "Checking for jobs... received",
"repo_url": "https://gitlab.example.com/group/project.git",
"runner": "NJyVIsfB",
"time": "2021-08-02T16:14:13-04:00"
}
If we examine the logs associated with the runner service (jacamar-ci-runner) we should notice several key details:
- job - Unique CI job identification number provided by the server.
- runner - Unique shortened runner token that can be used to identify the specific runner and its configuration.
Since the job ID is guaranteed unique from the server, this will be the most efficient way to link a local process back to the server for additional context.
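Given a job ID, the runner token can be recovered from the same entries, linking the job to a specific runner configuration. A minimal sketch, again using an illustrative sample file rather than real logs:

```shell
# Hypothetical syslog excerpt; field layout mirrors the examples in this
# guide but is illustrative only.
cat > sample-messages <<'EOF'
jacamar-ci-runner[1000]: {"job": 123456, "runner": "NJyVIsfB", "msg": "Checking for jobs... received"}
jacamar-auth[5001]: {"job": 123456, "runner": "NJyVIsfB", "stage": "config_exec"}
EOF

# Extract the (deduplicated) runner token for a given job so the
# matching runner configuration can be identified.
grep '"job": 123456' sample-messages | grep -o '"runner": "[^"]*"' | sort -u
```

The single resulting token can then be matched against the runner's configured entries or the server's runner administration page.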
Successful Job
Let's examine a job that started successfully, opening the command-line-specified configuration file when launched:
jacamar-auth[5001]: {
"jacamar-name": "logging test runner",
"job": 123456,
"msg": "configuration file (/home/gitlab-runner/.gitlab-runner/custom-config.toml) opened",
"runner": "NJyVIsfB",
"stage": "config_exec",
"level": "info",
"processID": 5001,
"hostname": "localhost.example",
"time": "2021-08-02T16:35:25-04:00"
}
Each entry in the log will specify a separate "stage"; these will equate to either the custom executor stages or run_exec-specific sub-stages.
jacamar-auth[5005]: {
"job": 123456,
"msg": "JWT verified (GitLab login: tester)",
"runner": "NJyVIsfB",
"stage": "get_sources",
...
}
Note
Since the jacamar application is assumed to run in userspace, it is not designed to leverage system logging as we do with jacamar-auth. Instead, all stdout/stderr from this process will be written to the job log and can be reviewed there.
The final job status is logged by the runner along with the duration:
jacamar-ci-runner[1000]: {
"duration_s": 0.962440272,
"job": 123456,
"msg": "Job succeeded",
...
}
Failed Job
Next let’s examine a job that has failed due to an authorization error:
jacamar-auth[6000]: {
"job": 123457,
"level": "info",
"msg": "JWT verified (GitLab login: tester)",
...
}
jacamar-auth[6000]: {
"job": 123457,
"level": "error",
"msg": "failed to authorize user for CI job: invalid authorization target user: tester, is not in the user allowlist and is in the user blocklist",
...
}
Unless otherwise configured (see: obfuscated error messages) the specific messages for most errors originating from the jacamar-auth application are only visible in the log. This has been done to prevent accidentally conveying sensitive configurations/details by default, regardless of the specific deployment.
jacamar-ci-runner[1000]: {
"job": 123457,
"level": "warning",
"msg": "Error encountered during job: failed to authorize user for CI job: invalid authorization target user: tester, is not in the user allowlist and is in the user blocklist",
...
}
Each time jacamar-auth is launched, it relies upon the minimal state provided by the runner to gather context. As such, error messages can appear duplicated between both logging mechanisms across distinct stages (e.g., config_exec and cleanup_exec).
Note
In cases where conflicting error messages are encountered, it is likely due to failures during config_exec that prevented the establishment of stateful variables. Rely upon the earliest identified error to discover the root cause.
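Picking out that earliest error can be scripted against the syslog destination. A minimal sketch under the assumption that syslog appends entries chronologically; the file name and contents below are illustrative:

```shell
# Hypothetical excerpt for a failed job with duplicated errors across
# stages; syslog appends chronologically, so the first match is earliest.
cat > sample-messages <<'EOF'
jacamar-auth[6000]: {"job": 123457, "level": "error", "stage": "config_exec", "msg": "failed to authorize user for CI job"}
jacamar-auth[6010]: {"job": 123457, "level": "error", "stage": "cleanup_exec", "msg": "failed to authorize user for CI job"}
EOF

# Surface only the earliest error-level entry for the job.
grep '"job": 123457' sample-messages | grep '"level": "error"' | head -n 1
```

Here the surviving entry points at config_exec, consistent with the guidance to treat the earliest error as the root cause.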
As with a successful job, the runner will log the final status in accordance with the custom executor's error handling.
jacamar-ci-runner[1000]: {
"duration_s": 9.329694821,
"job": 123457,
"level": "error",
"msg": "Job failed (system failure): exit status 2",
...
}
You may notice that this failed job lasted over nine seconds, far longer than the successful one. This is due to the custom executor's handling of failures during preparation: a failure (as in our case) may result in three attempts, with a three-second wait between each subsequent attempt.