Observability

This page describes how to monitor the Gameserver with dashboards, metrics, and logs.

Built-In Dashboards

CTF Gameserver's Web component includes several dashboards for service authors to observe the behavior of their Checker Scripts:

  • Service History shows the check results for all teams over multiple ticks. It gives a good overview of a Checker Script's overall behavior.
  • Missing Checks lists checks with the "Not checked" status. These are particularly interesting because they point to Checker Script crashes or timeouts.

Access to these dashboards requires a user account with "Staff" status. If the GRAYLOG_SEARCH_URL setting is configured, they automatically link to the corresponding logs in Graylog.

Users with "Staff" status can also view the VPN Status History dashboard for any team.

Logging

All components write their logs to stdout, where systemd usually picks them up. You can view them with the regular journald tools, such as journalctl.

Checker Script logs are the only exception: they can be very verbose and should be accessible to their individual authors.

Checkers

You must explicitly configure Checker Script logs to be sent to either journald or Graylog.

Graylog (Open) is the recommended option, especially for larger competitions. It allows logs to be accessed through a web interface and filtered by service, team, tick, etc. When GRAYLOG_SEARCH_URL is configured for the Web component, the built-in dashboards automatically generate links to the respective logs.

After installing Graylog, create a new "GELF UDP" input through its web interface with a sufficiently large recv_buffer_size (we use 2 MiB, i.e. 2097152 bytes). Then configure the CTF_GELF_SERVER option to point at this input.
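To check that the new input actually receives data before pointing the Checkers at it, you can send it a hand-written GELF message. The following is a minimal sketch assuming a netcat (nc) binary with UDP support; the function names are hypothetical, and 12201 is only Graylog's default GELF port, so use whatever port you configured for the input. "version", "host", and "short_message" are the required GELF fields, and Graylog accepts uncompressed JSON over GELF UDP.

```shell
#!/bin/sh
# Print a minimal, uncompressed GELF 1.1 message as JSON.
# $1 = value for the "host" field (defaults to this machine's node name).
gelf_test_message() {
    printf '{"version":"1.1","host":"%s","short_message":"%s"}' \
        "${1:-$(uname -n)}" "GELF UDP input test"
}

# Send the test message to a GELF UDP input (hypothetical helper).
# $1 = Graylog host, $2 = input port (defaults to 12201, Graylog's default).
send_gelf_test() {
    gelf_test_message | nc -u -w1 "$1" "${2:-12201}"
}
```

For example, `send_gelf_test graylog.example.org 12201` (placeholder host name) should make a message "GELF UDP input test" appear in Graylog's search. Option syntax differs slightly between netcat variants, so check your local nc if the command fails.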

With journald-based Checker Script logging, you can filter the log entries for a single service, team, and tick like this:

journalctl -u ctf-checkermaster@service.service SYSLOG_IDENTIFIER=checker_service-team023-tick042
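When you look up logs for many team and tick combinations, a small shell helper can build the identifier for you. This is a sketch that assumes the identifier format from the example above, with team and tick numbers zero-padded to three digits; the function names are hypothetical.

```shell
#!/bin/sh
# Build the SYSLOG_IDENTIFIER of one Checker Script run.
# Assumption: format "checker_<service>-team<NNN>-tick<NNN>" (zero-padded).
# $1 = service name, $2 = team number, $3 = tick number
checker_log_id() {
    printf 'checker_%s-team%03d-tick%03d\n' "$1" "$2" "$3"
}

# Show the journald entries of one service/team/tick combination.
checker_logs() {
    journalctl -u "ctf-checkermaster@$1.service" \
        SYSLOG_IDENTIFIER="$(checker_log_id "$1" "$2" "$3")"
}
```

For example, `checker_logs service 23 42` runs the same query as the command above. Pass team and tick numbers without leading zeros, since printf interprets numbers starting with 0 as octal.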

Additionally, the ctf-logviewer script is available. It is designed to be used as an SSH ForceCommand that gives service authors access to the logs of a specific service.

Metrics

All components except Web can expose metrics in Prometheus format. With Prometheus, you can set up alerting on these metrics and build dashboards for them in Grafana.

To enable metrics, set the CTF_METRICS_LISTEN option (the Ansible roles do this by default). To see the available metrics and their descriptions, request them manually via HTTP.
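One way to list the available metrics is to fetch them with curl and keep only the "# HELP" lines, which carry each metric's description in the Prometheus text format. This sketch assumes that CTF_METRICS_LISTEN is set to a host:port address and that the metrics are reachable under the conventional /metrics path; the function names and the example address are hypothetical.

```shell
#!/bin/sh
# Turn "# HELP <name> <description>" lines from Prometheus text-format
# input on stdin into "<name> <description>"; all other lines are dropped.
metric_help() {
    sed -n 's/^# HELP //p'
}

# List the metrics of one component together with their descriptions.
# $1 = the address configured in CTF_METRICS_LISTEN, e.g. "127.0.0.1:9100"
list_metrics() {
    curl -fsS "http://$1/metrics" | metric_help
}
```

For example, `list_metrics 127.0.0.1:9100` prints one line per metric, with its name followed by its description.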