Introduction to Local System Logging
Being able to trace jobs from a local system back to the responsible actions/users on the server is important when supporting CI/CD on HPC test resources. To assist in that process, this guide has been written to:
- Highlight application configurations
- Identify where logs are stored
- Provide examples to help with decoding events
We assume you already have an understanding of these systems and test resources deployed to support the documented workflows.
Configurations
Important
Please verify that you’ve configured all applications and ensured functionality prior to opening up access to additional users and workflows.
Jacamar CI
Specify the following within your configuration:
[auth.logging]
enabled = true
level = "debug"
It is possible to use a local filepath for your location (sent to syslog by default) or a network/address setting if logging to a remote server.
Note
Logging by the jacamar-auth application is optional, though highly encouraged as it provides additional details that are otherwise not possible due to the pre-defined interactions within the custom executor model.
GitLab Runner
Having the runner write to syslog is easily accomplished with the --syslog parameter used in a gitlab-runner run command. This logging can be further influenced in the config.toml with the advanced configuration:
concurrent = 10
log_format = "json"
log_level = "info"
Logging Sources/Destinations
For both applications we advise the usage of syslog. Though for our examples this is /var/log/messages, it can differ based upon your deployment. Please consult your system documentation.
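Since both applications emit entries into the same syslog destination, a job's complete local footprint can be isolated with standard tools. A minimal sketch follows; the file name, job number, and log contents are illustrative samples, not output from a real deployment:

```shell
# Hypothetical syslog excerpt; real entries are single-line JSON when
# log_format = "json" is used, though exact fields and spacing can vary.
cat > sample-messages <<'EOF'
jacamar-ci-runner[1000]: {"job": 123456, "level": "info", "msg": "Checking for jobs... received"}
jacamar-auth[5001]: {"job": 123456, "level": "info", "stage": "config_exec"}
jacamar-auth[6000]: {"job": 123457, "level": "error", "msg": "failed to authorize user for CI job"}
EOF

# Isolate every entry for a single job; on a live system, point grep at
# your actual syslog destination (e.g., /var/log/messages) instead.
grep '"job": 123456' sample-messages
```

Both the runner and jacamar-auth entries for the job are returned, giving a single chronological view of its local lifecycle.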
Important
This guide will only focus on logging as it relates directly to Jacamar CI. Any details regarding the server or other applications are best referenced by the associated documentation.
Tracing a Specific Job
Jobs are first identified by the GitLab Runner service through a process of polling the server for availability. Once a job is identified and a response obtained, it is then the responsibility of the runner to begin the local execution using the provided context.
jacamar-ci-runner[1000]: {
"job": 123456,
"level": "info",
"msg": "Checking for jobs... received",
"repo_url": "https://gitlab.example.com/group/project.git",
"runner": "NJyVIsfB",
"time": "2021-08-02T16:14:13-04:00"
}
If we examine the logs associated with the runner service (jacamar-ci-runner) we should notice several key details:
- job - Unique CI job identification number provided by the server.
- runner - Unique shortened runner token that can be used to identify the specific runner and its configuration.
Since the job ID is guaranteed unique from the server, this will be the most efficient way to link a local process back to the server for additional context.
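Given a job ID, the runner token can be recovered from the same entries, linking the job to a specific runner configuration. A minimal sketch, again using an illustrative sample file rather than real logs:

```shell
# Hypothetical syslog excerpt; field layout mirrors the examples in this
# guide but is illustrative only.
cat > sample-messages <<'EOF'
jacamar-ci-runner[1000]: {"job": 123456, "runner": "NJyVIsfB", "msg": "Checking for jobs... received"}
jacamar-auth[5001]: {"job": 123456, "runner": "NJyVIsfB", "stage": "config_exec"}
EOF

# Extract the (deduplicated) runner token for a given job so the
# matching runner configuration can be identified.
grep '"job": 123456' sample-messages | grep -o '"runner": "[^"]*"' | sort -u
```

The single resulting token can then be matched against the runner's configured entries or the server's runner administration page.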
Successful Job
Let's examine a job that started successfully, opening the command-line-specified configuration file when launched:
jacamar-auth[5001]: {
"jacamar-name": "logging test runner",
"job": 123456,
"msg": "configuration file (/home/gitlab-runner/.gitlab-runner/custom-config.toml) opened",
"runner": "NJyVIsfB",
"stage": "config_exec",
"level": "info",
"processID": 5001,
"hostname": "localhost.example",
"time": "2021-08-02T16:35:25-04:00"
}
Each entry in the log will specify a separate "stage"; these will equate to either the custom executor stages or run_exec-specific sub-stages.
jacamar-auth[5005]: {
"job": 123456,
"msg": "JWT verified (GitLab login: tester)",
"runner": "NJyVIsfB",
"stage": "get_sources",
...
}
Note
Since the jacamar application is assumed to run in userspace, it is not designed to leverage system logging as we do with jacamar-auth. Instead, all stdout/stderr from this process will be written to the job log and can be reviewed there.
The final job status is logged by the runner along with the duration:
jacamar-ci-runner[1000]: {
"duration_s": 0.962440272,
"job": 123456,
"msg": "Job succeeded",
...
}
Failed Job
Next let’s examine a job that has failed due to an authorization error:
jacamar-auth[6000]: {
"job": 123457,
"level": "info",
"msg": "JWT verified (GitLab login: tester)",
...
}
jacamar-auth[6000]: {
"job": 123457,
"level": "error",
"msg": "failed to authorize user for CI job: invalid authorization target user: tester, is not in the user allowlist and is in the user blocklist",
...
}
Unless otherwise configured (see: obfuscated error messages) the specific messages for most errors originating from the jacamar-auth application are only visible in the log. This has been done to prevent accidentally conveying sensitive configurations/details by default, regardless of the specific deployment.
jacamar-ci-runner[1000]: {
"job": 123457,
"level": "warning",
"msg": "Error encountered during job: failed to authorize user for CI job: invalid authorization target user: tester, is not in the user allowlist and is in the user blocklist",
...
}
Each time jacamar-auth is launched, it relies upon the minimal state provided by the runner to gather context. As such, error messages can appear duplicated between both logging mechanisms across distinct stages (e.g., config_exec and cleanup_exec).
Note
In cases where conflicting error messages are encountered, it is likely due to failures during config_exec that prevented the establishment of stateful variables. Rely upon the earliest identified error to discover the root cause.
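Picking out that earliest error can be scripted against the syslog destination. A minimal sketch under the assumption that syslog appends entries chronologically; the file name and contents below are illustrative:

```shell
# Hypothetical excerpt for a failed job with duplicated errors across
# stages; syslog appends chronologically, so the first match is earliest.
cat > sample-messages <<'EOF'
jacamar-auth[6000]: {"job": 123457, "level": "error", "stage": "config_exec", "msg": "failed to authorize user for CI job"}
jacamar-auth[6010]: {"job": 123457, "level": "error", "stage": "cleanup_exec", "msg": "failed to authorize user for CI job"}
EOF

# Surface only the earliest error-level entry for the job.
grep '"job": 123457' sample-messages | grep '"level": "error"' | head -n 1
```

Here the surviving entry points at config_exec, consistent with the guidance to treat the earliest error as the root cause.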
As with a successful job, the runner will log the final status in accordance with the custom executor's error handling.
jacamar-ci-runner[1000]: {
"duration_s": 9.329694821,
"job": 123457,
"level": "error",
"msg": "Job failed (system failure): exit status 2",
...
}
You may notice that this failed job lasted over nine seconds, far longer than the successful one. This is due to the custom executor's handling of failures during preparation: a failure (as in our case) may result in three attempts, with a three-second wait between each subsequent attempt.