Log Management and Distributed Tracing using Grafana Loki and Tempo
Logs, metrics, and distributed traces are the three key pillars of observability.
Logs: These are structured or unstructured text records of discrete events that occurred at a specific time.
Metrics: These are the values represented as counts or measures that are often calculated or aggregated over a period of time. Metrics can originate from a variety of sources, including infrastructure, hosts, services, cloud platforms, and external sources.
Distributed tracing: This displays the activity of a transaction or request as it flows through applications and shows how services connect, including code-level details.
Logs and distributed tracing are important aspects of any observability platform. This blog explains how to set up log management and distributed tracing using the popular open-source observability platform Grafana together with Grafana Loki and Grafana Tempo.
Before we move into the setup, let us look at the basics of Grafana, Grafana Loki, and Grafana Tempo.
Grafana
Grafana is a popular open-source interactive visualisation platform that lets users see their data in charts and graphs unified into one dashboard for easier interpretation and understanding. We can also query the data and set alerts based on metrics and thresholds. Grafana supports traditional server environments, Kubernetes clusters, and various cloud services, and as an open-source tool it is cloud agnostic. It also integrates well with modern microservices applications and can visualize detailed information about each microservice.
Grafana Loki
Loki is a horizontally scalable, highly available, multi-tenant log aggregation solution. It’s designed to be both affordable and simple to use. Rather than indexing the contents of the logs, it uses a set of labels for each log stream.
Grafana Loki was inspired by Prometheus’ architecture, which uses labels to index data. Because only the labels are indexed, Loki’s index takes up far less space than a full-text index would. Furthermore, Loki’s label model is compatible with Prometheus, allowing developers to apply the same label criteria across the two platforms.
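To make the idea of label-based indexing concrete, here is a minimal toy sketch in Python. It is not how Loki is implemented; the class name TinyLabelIndex and the sample labels and log lines are made up for illustration. It only shows that lookups go through the label set of a stream, while the log content itself is never indexed.

```python
from collections import defaultdict


class TinyLabelIndex:
    """Toy model: index log streams by label set, never by log content."""

    def __init__(self):
        # Maps a frozen set of (label, value) pairs to the raw lines of that stream.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        """Append a line to the stream identified by its labels."""
        self.streams[frozenset(labels.items())].append(line)

    def select(self, **matchers):
        """Return lines from every stream whose labels include all matchers."""
        wanted = set(matchers.items())
        return [
            line
            for labels, lines in self.streams.items()
            if wanted <= labels
            for line in lines
        ]


# Hypothetical labels and log lines.
index = TinyLabelIndex()
index.push({"app": "grafana", "namespace": "monitoring"}, "level=info msg=started")
index.push({"app": "grafana", "namespace": "monitoring"}, "level=error msg=timeout")
index.push({"app": "tempo", "namespace": "monitoring"}, "level=info msg=ready")

print(len(index.streams), "streams indexed")
print(index.select(app="grafana"))
```

Three lines were pushed, but only two index entries exist, one per distinct label set. That is the reason a label index stays small.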
Grafana Tempo
Grafana Tempo is an open source, easy-to-use, and high-scale distributed tracing backend. Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki. Tempo can ingest common open source tracing protocols, including Jaeger, Zipkin, and OpenTelemetry.
Grafana Loki and Grafana Tempo Architecture
The architecture below shows how Grafana collects and monitors logs and traces using Grafana Loki and Grafana Tempo.
For logging, real-time logs are collected by Fluent Bit, which is deployed on each application server, and sent to Loki, the centralized log management tool. The Loki data source in Grafana then makes these logs available for visualization in Grafana dashboards.
For distributed tracing, the OpenTelemetry Collector receives, processes, and exports telemetry data to the tracing backend, Grafana Tempo. The Tempo data source in Grafana then makes these traces available for visualization in Grafana dashboards.
Tempo and Loki can both store their data in S3 buckets. This relieves you from maintaining and indexing storage that, depending on your requirements, might not be needed.
Tempo and Loki are part of the Grafana ecosystem, so they integrate seamlessly with Grafana dashboards.
Setup Instructions
Pre-requisites
This blog assumes that the following components are already provisioned or installed.
Kubernetes cluster
Grafana deployed on the Kubernetes cluster (setup instructions: Deploy Grafana on Kubernetes | Grafana documentation)
Helm installed on the local system
Steps to set up Grafana Loki and Grafana Tempo
1. Let's create a separate namespace for the Grafana resources.
2. Now we will use the Grafana Helm chart repository to install Grafana Tempo and Grafana Loki. Add the Grafana Helm chart repository and update it.
3. Deploy Grafana Tempo using the Helm chart.
** The version will change in the future; please refer to the link to check the latest version.
4. Deploy the Grafana Loki stack using the Helm chart.
In the values above, you can include or exclude the persistent volume, Promtail, Fluent Bit, and Prometheus by setting each option to true or false.
** If you are using Loki, Promtail must be enabled.
5. Now, let’s deploy the OpenTelemetry Collector. You use this component to distribute the traces across your infrastructure:
6. Deploy the Fluent Bit component. You will use this component to scrape the logs from your cluster:
(Note: on line 16 of the Fluent Bit configuration file, the input is restricted to container logs that match the pattern /var/log/containers/*.log.)
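To get a feel for which files the pattern /var/log/containers/*.log selects, here is a small Python sketch. It only illustrates the glob pattern and is not Fluent Bit's own matching code; the file names are hypothetical.

```python
from pathlib import PurePosixPath

# Pattern quoted in the note above.
CONTAINER_LOG_PATTERN = "/var/log/containers/*.log"


def is_collected(path: str) -> bool:
    """True if the path matches the container log pattern."""
    return PurePosixPath(path).match(CONTAINER_LOG_PATTERN)


# Hypothetical file names for illustration.
candidates = [
    "/var/log/containers/grafana-0_monitoring_grafana-abc.log",
    "/var/log/containers/readme.txt",
    "/var/log/syslog",
]
for path in candidates:
    print(path, "->", is_collected(path))
```

Only files that sit directly in /var/log/containers and end in .log pass the check, so host logs such as /var/log/syslog are left out.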
7. Apply the Grafana Helm chart configuration. This component is already configured to connect to Loki and Tempo:
In the Grafana Helm chart above, datasource.yaml defines both Grafana data sources, Tempo and Loki.
We can also define another data source in Grafana in two ways:
a. In the Grafana UI, go to the data sources page, select the type of data source to add, and enter its URL in the form http://<service_name>.<namespace>.svc:<service_port>
b. Alternatively, add the new data source to datasource.yaml with its configuration, as shown in the file above. Any data source can be marked as the default.
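As a rough sketch of both options, the Python snippet below builds the in-cluster URL from option a and assembles the kind of entries a datasource.yaml would hold for option b. The service names, the namespace observability, and port 3100 are assumptions for illustration; check the services in your own cluster. The output is printed as JSON only to keep the sketch free of extra dependencies.

```python
import json


def datasource_url(service_name: str, namespace: str, service_port: int) -> str:
    """Build the in-cluster URL in the form http://<service_name>.<namespace>.svc:<service_port>."""
    return f"http://{service_name}.{namespace}.svc:{service_port}"


# Hypothetical namespace, service names, and ports.
NAMESPACE = "observability"

provisioning = {
    "apiVersion": 1,
    "datasources": [
        {
            "name": "Tempo",
            "type": "tempo",
            "access": "proxy",
            "url": datasource_url("tempo", NAMESPACE, 3100),
            "isDefault": True,  # Tempo is the default data source in this blog
        },
        {
            "name": "Loki",
            "type": "loki",
            "access": "proxy",
            "url": datasource_url("loki", NAMESPACE, 3100),
            "isDefault": False,
        },
    ],
}

print(json.dumps(provisioning, indent=2))
```

Only one entry should carry isDefault set to true, since Grafana expects a single default data source.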
8. Once everything is deployed, let's check the status of each component in the Kubernetes cluster.
In the above screenshot, we can see the pods that are running in our cluster for the Grafana and Tempo setup. Here Fluent Bit and Promtail run as daemon sets, and the pods for Grafana, Loki, and Tempo are running as well.
9. Check the status of the services deployed in the Kubernetes cluster.
In the above screenshot, we can see the services created for the resources that we deployed to set up Grafana, Loki, and Tempo.
10. Now log in to the Grafana UI using the service URL shown in the above screenshot, with the username and password that were provided when the Grafana Helm chart was installed.
11. Click on Explore. You will see two data sources, Loki and Tempo; in the Grafana datasource.yaml we defined Tempo as the default data source.
12. Select Loki as the current data source. You will see an interface like the one below.
13. Click on the Log browser button. You will see the different labels, as shown below.
Loki filters logs on the basis of labels. Here we have labels such as pod, namespace, node_name, release, app, and container.
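Behind the log browser, the selected labels end up in a LogQL stream selector of the form {label="value", ...}. The helper below is a minimal sketch of how such a selector can be assembled; the function name logql_selector and the label values are made up for illustration.

```python
def logql_selector(**labels: str) -> str:
    """Build a LogQL stream selector from label/value pairs."""

    def quote(value: str) -> str:
        # Escape backslashes and double quotes inside label values.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    pairs = ", ".join(
        f"{name}={quote(value)}" for name, value in sorted(labels.items())
    )
    return "{" + pairs + "}"


# Hypothetical namespace and pod name.
print(logql_selector(namespace="monitoring", pod="grafana-0"))
```

Adding more labels narrows the selection, because a stream has to match every label in the selector.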
14. As an example, let's look at the logs of the Grafana pod.
Here we select pod as the label; the pod section lists 24 pods. We can also filter on other labels, for example namespace, container, node, or app. When we select our Grafana pod, Loki shows the resulting query in the third step, and clicking the Show logs button displays the logs in Grafana.
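The same kind of query can also be sent to Loki directly over its HTTP API, which exposes a /loki/api/v1/query_range endpoint. The sketch below only builds the request URL and does not call the server; the base address and the pod name are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def loki_query_range_url(base_url: str, query: str, limit: int = 100) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


# Hypothetical in-cluster address and pod name.
url = loki_query_range_url(
    "http://loki.observability.svc:3100",
    '{namespace="monitoring", pod="grafana-0"}',
)
print(url)

# Decode the query string again to confirm the selector survived URL encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

The resulting URL can be opened with any HTTP client that can reach the Loki service, for example through a port-forward.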
15. Expand any log line to see the detailed log of the Grafana pod. You will notice that a TraceID field is detected in the log, with a Tempo button next to it.
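Grafana can detect such a field through a derived field on the Loki data source, which applies a regular expression to each log line. The sketch below shows what a regex of that kind does. The pattern traceID=(\w+) and the sample log line are assumptions; the pattern in your setup depends on how your application writes the trace ID.

```python
import re

# Assumed log format: a key=value line carrying traceID=<id>. Adjust to your logs.
TRACE_ID_PATTERN = re.compile(r"traceID=(\w+)")


def extract_trace_id(log_line: str):
    """Return the trace ID found in a log line, or None if there is none."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


# Made-up log line for illustration.
line = 'level=info msg="request completed" traceID=2f3e0cee77ae5dc9 status=200'
print(extract_trace_id(line))
```

The captured value is what gets handed to Tempo when you follow the link from the log line to the trace.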
16. You can see the services and operations for the trace ID that was passed to Tempo.
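Tempo also serves a trace by its ID over HTTP at /api/traces/<traceID>. As a small sketch, the function below builds that URL and rejects values that are not hexadecimal trace IDs; the base address and the trace ID are made up for illustration.

```python
import re


def tempo_trace_url(base_url: str, trace_id: str) -> str:
    """Build the URL for Tempo's trace-by-ID endpoint, /api/traces/<traceID>."""
    # Trace IDs are hexadecimal strings of at most 32 characters (128 bits).
    if not re.fullmatch(r"[0-9a-fA-F]{1,32}", trace_id):
        raise ValueError(f"not a hex trace ID: {trace_id!r}")
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}"


# Hypothetical in-cluster address and trace ID.
print(tempo_trace_url("http://tempo.observability.svc:3100", "2f3e0cee77ae5dc9"))
```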
17. In the new version of Tempo (currently in beta), there is a node graph that shows the connected nodes our traces passed through.
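The idea behind such a graph can be sketched in a few lines of Python: every span knows its parent, so a call from one service to another shows up as a parent and child span that belong to different services. This is only an illustration of the concept, not the way Grafana or Tempo computes the node graph, and the spans are made up.

```python
# Made-up spans for one trace: (span_id, parent_span_id, service_name).
spans = [
    ("a1", None, "frontend"),
    ("b2", "a1", "checkout"),
    ("c3", "b2", "payments"),
    ("d4", "b2", "checkout"),  # internal span, same service as its parent
]


def service_edges(spans):
    """Return the set of (caller, callee) service pairs implied by parent/child spans."""
    service_of = {span_id: service for span_id, _, service in spans}
    edges = set()
    for _, parent_id, service in spans:
        parent_service = service_of.get(parent_id)
        # Skip root spans and spans that stay inside the same service.
        if parent_service is not None and parent_service != service:
            edges.add((parent_service, service))
    return edges


print(sorted(service_edges(spans)))
```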
This is how we can view logs and distributed traces using Grafana Loki and Tempo.