Logging And Metrics [C ARC]

Overview

Log and metrics collection, visualisation, and analysis are a key aspect for operating and maintaining a system. A Vidispine MAM system uses the following components for handling logs and metrics:

OpenSearch: store logs and metrics in OpenSearch indices.
OpenSearch Dashboards: view and analyse logs and metrics; configure alerts on metrics.
fluentd: collection log output from Kubernetes workload.
telegraf: collect metrics from Kubernetes workload.
telegraf-ds: collect metrics from Kubernetes nodes.
telegraf-prometheus-operator: identify the Kubernetes workload exposing metrics and configure telegraf appropriately.
Prometheus endpoints: are built into Kubernetes workload to expose metrics in the Prometheus format.
Prometheus exporters: sidecar containers on Kubernetes workload which expose metrices in a different format to convert them into the Prometheus format.

As all workload exposes metrics in the Prometheus format, a different toolset may be used to collect metrics from the system. This allows integration of Vidispine metrics into an existing metrics collection system.

Log Collection

fluentd is running as DaemonSet on all cluster nodes.
fluentd is collecting log files from the cluster nodes and writes them to OpenSearch indices.
The logrorate DaemonSet is rotating the logfiles on the cluster nodes (optional, can also be done directly on the operating system level of the nodes).

Metrics Collection

Each workload comes with a CustomResource of type ServiceMonitor containing information on the Prometheus metrics endpoint of the corresponding workload.
The telegraf-prometheus-operator monitors all CustomResources of this type and configures telegraf appropriately.
The telegraf deployment scrapes metrics from all configured endpoints and writes them to OpenSearch indices.
The telegraf-ds DaemonSet scrapes metrics from all cluster nodes and writes them to OpenSearch indices.