Logging And Metrics

Overview

Collecting, visualising, and analysing logs and metrics is key to operating and maintaining a system. A Vidispine MAM system uses the following components to handle logs and metrics:

  • OpenSearch: store logs and metrics in OpenSearch indices.

  • OpenSearch Dashboards: view and analyse logs and metrics; configure alerts on metrics.

  • fluentd: collect log output from Kubernetes workload.

  • telegraf: collect metrics from Kubernetes workload.

  • telegraf-ds: collect metrics from Kubernetes nodes.

  • telegraf-prometheus-operator: identify the Kubernetes workload exposing metrics and configure telegraf appropriately.

  • Prometheus endpoints: built into Kubernetes workload to expose metrics in the Prometheus format.

  • Prometheus exporters: sidecar containers for Kubernetes workload that exposes metrics in a different format; the exporter converts these metrics into the Prometheus format.

Because all workload exposes metrics in the Prometheus format, a different toolset can be used to collect metrics from the system. This allows Vidispine metrics to be integrated into an existing metrics collection system.
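
To illustrate what any collector reads from such an endpoint, here is a minimal Python sketch that parses the Prometheus text exposition format into samples. The function name parse_exposition and the sample metric names are hypothetical and chosen only for illustration; the sketch skips comment lines, ignores optional timestamps, and does not unescape label values.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
# One label pair inside the braces: name="value" (escaped characters allowed).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Not a sample line this sketch understands.
        name, raw_labels, raw_value, _timestamp = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


if __name__ == "__main__":
    # Hypothetical endpoint output, not taken from a Vidispine service.
    example = (
        "# HELP http_requests_total Total requests.\n"
        "# TYPE http_requests_total counter\n"
        'http_requests_total{method="get",code="200"} 1027\n'
        "process_open_fds 12\n"
    )
    for sample in parse_exposition(example):
        print(sample)
```

An existing metrics system would normally use its own scraper instead of hand-written parsing; the sketch only shows that the endpoint output is plain text that any tool can consume.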


[Figure: Metrics and Logging.png]

Log Collection

  • fluentd runs as a DaemonSet on all cluster nodes.

  • fluentd collects log files from the cluster nodes and writes them to OpenSearch indices.

  • The logrotate DaemonSet rotates the log files on the cluster nodes (optional; rotation can also be done directly at the operating system level of the nodes).
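
The collection step above can be sketched as follows, assuming the cluster nodes write container logs in the CRI line format (timestamp, stream, tag, message) used by container runtimes such as containerd. This is an assumption about the nodes, not a statement about the shipped fluentd configuration; the function name parse_cri_line and the document field names are hypothetical.

```python
import json


def parse_cri_line(line, node):
    """Turn one CRI-format log line into a dict suitable for indexing.

    Expected shape: "<timestamp> <stream> <tag> <message>", where the
    tag is "F" for a full line and "P" for a partial line.
    """
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        return None  # Not a CRI-format line.
    timestamp, stream, tag = parts[:3]
    message = parts[3] if len(parts) == 4 else ""
    return {
        "@timestamp": timestamp,
        "stream": stream,
        "partial": tag == "P",
        "message": message,
        "node": node,  # Hypothetical field recording the source node.
    }


if __name__ == "__main__":
    raw = "2024-01-01T12:00:00.000000000Z stdout F service started\n"
    print(json.dumps(parse_cri_line(raw, node="node-1")))
```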

Metrics Collection

  • Each workload comes with a CustomResource of type ServiceMonitor containing information on the Prometheus metrics endpoint of the corresponding workload.

  • The telegraf-prometheus-operator monitors all CustomResources of this type and configures telegraf appropriately.

  • The telegraf deployment scrapes metrics from all configured endpoints and writes them to OpenSearch indices.

  • The telegraf-ds DaemonSet scrapes metrics from all cluster nodes and writes them to OpenSearch indices.
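
As a rough illustration of the last step, writing scraped samples to an OpenSearch index, the following Python sketch builds a request body for the OpenSearch bulk API, which expects newline-delimited JSON with an action line followed by a document line. The index name, the document fields, and the function name build_bulk_body are assumptions made for this example and do not reflect the document layout telegraf actually writes.

```python
import json


def build_bulk_body(samples, index, timestamp):
    """Build an NDJSON bulk body from (name, labels, value) samples."""
    lines = []
    for name, labels, value in samples:
        # Action line: index the following document into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Document line: hypothetical field layout for one metric sample.
        lines.append(json.dumps({
            "@timestamp": timestamp,
            "name": name,
            "tags": labels,
            "value": value,
        }))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    samples = [("process_open_fds", {"pod": "example-pod"}, 12.0)]
    body = build_bulk_body(
        samples,
        index="metrics-example",  # Hypothetical index name.
        timestamp="2024-01-01T12:00:00Z",
    )
    print(body, end="")
```

The resulting body would be sent with a POST request to the _bulk endpoint of the cluster using the Content-Type application/x-ndjson; the request itself is left out so that the sketch runs without a cluster.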