Logging And Metrics
Overview
Collecting, visualising, and analysing logs and metrics is a key part of operating and maintaining a system. A Vidispine MAM system uses the following components to handle logs and metrics:
OpenSearch: store logs and metrics in OpenSearch indices.
OpenSearch Dashboards: view and analyse logs and metrics; configure alerts on metrics.
fluentd: collect log output from Kubernetes workloads.
telegraf: collect metrics from Kubernetes workloads.
telegraf-ds: collect metrics from Kubernetes nodes.
telegraf-prometheus-operator: identify the Kubernetes workloads exposing metrics and configure telegraf appropriately.
Prometheus endpoints: built into Kubernetes workloads to expose metrics in the Prometheus format.
Prometheus exporters: sidecar containers on Kubernetes workloads that take metrics exposed in a different format and convert them into the Prometheus format.
As all workloads expose metrics in the Prometheus format, a different toolset can be used to collect metrics from the system. This allows Vidispine metrics to be integrated into an existing metrics collection system.
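Because the Prometheus format is plain text, any collector that can read it can consume these metrics. The following is a minimal sketch of such a reader, using only the Python standard library. The sample payload and metric names are hypothetical, and the parser is deliberately simplified: it ignores optional timestamps and does not handle escaped quotes in label values.

```python
import re

# Matches one sample line: metric name, optional {labels}, then the value.
# Simplified: optional timestamps are ignored, escaped quotes are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical payload, shaped like the response of a metrics endpoint.
EXAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 42
"""

if __name__ == "__main__":
    for name, labels, value in parse_prometheus_text(EXAMPLE):
        print(name, labels, value)
```

In practice an existing Prometheus-compatible scraper would normally be pointed at the endpoints instead; the sketch only illustrates how little the format demands of a collector.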
Log Collection
fluentd runs as a DaemonSet on all cluster nodes.
fluentd collects log files from the cluster nodes and writes them to OpenSearch indices.
The logrotate DaemonSet rotates the log files on the cluster nodes (optional; rotation can also be handled directly at the operating system level of the nodes).
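Once logs are in OpenSearch indices, they can be retrieved through the OpenSearch `_search` REST endpoint as well as through OpenSearch Dashboards. The sketch below builds such a query with the Python standard library. The index pattern `logs-*`, the host name, and the field names `kubernetes.namespace_name` and `@timestamp` are assumptions for illustration; the actual names depend on how fluentd is configured in a given installation.

```python
import json
import urllib.request


def build_log_query(namespace, minutes=15, size=50):
    """Build an OpenSearch query DSL body for recent logs from one namespace."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    # Assumed field names; they depend on the fluentd setup.
                    {"term": {"kubernetes.namespace_name": namespace}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }


def build_search_request(base_url, index_pattern, body):
    """Prepare (but do not send) a POST request to the _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_log_query("vidispine")
    # Hypothetical URL and index pattern; replace with the cluster's values.
    request = build_search_request(
        "https://opensearch.example.com:9200", "logs-*", body
    )
    print(request.get_method(), request.full_url)
    print(json.dumps(body, indent=2))
```

The request is only constructed, not sent, so authentication and TLS settings are left out; these would need to match the OpenSearch deployment.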
Metrics Collection
Each workload comes with a CustomResource of type ServiceMonitor containing information on the Prometheus metrics endpoint of the corresponding workload.
The telegraf-prometheus-operator monitors all CustomResources of this type and configures telegraf appropriately.
The telegraf deployment scrapes metrics from all configured endpoints and writes them to OpenSearch indices.
The telegraf-ds DaemonSet scrapes metrics from all cluster nodes and writes them to OpenSearch indices.
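To make the role of the ServiceMonitor more concrete, the sketch below generates a manifest of that kind as JSON, which `kubectl apply -f` accepts alongside YAML. It assumes the resource follows the Prometheus Operator schema (`monitoring.coreos.com/v1`); whether the telegraf-prometheus-operator expects exactly this schema is not stated here and should be checked against the deployed CRD. All names, labels, and the port name are hypothetical.

```python
import json


def service_monitor(name, namespace, match_labels, port,
                    path="/metrics", interval="30s"):
    """Return a ServiceMonitor manifest as a plain dict."""
    return {
        # Assumption: the CRD follows the Prometheus Operator schema.
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the Service objects whose endpoints should be scraped.
            "selector": {"matchLabels": match_labels},
            # Named Service port, HTTP path, and scrape interval.
            "endpoints": [{"port": port, "path": path, "interval": interval}],
        },
    }


if __name__ == "__main__":
    manifest = service_monitor(
        name="example-workload",           # hypothetical workload name
        namespace="example",               # hypothetical namespace
        match_labels={"app": "example-workload"},
        port="metrics",                    # hypothetical Service port name
    )
    print(json.dumps(manifest, indent=2))
```

The manifest only describes where the metrics endpoint is; scraping itself remains the job of the telegraf deployment once the operator has picked up the resource.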