Prometheus might have its origin in Greek mythology, but it’s certainly no stranger to any sysadmin who needs a quick overview of their servers’ metrics.
I'd like to take a moment to appreciate its greatness and give you an overview of how Prometheus is supposed to work and what it’s supposed to do.
As with any tool, no matter how mythical it is, the power of the tool lies with the person who wields it.
Prometheus is an excellent tool because it helps us scrape all the metrics we need from our servers. We simply set up a configuration that tells Prometheus where our servers are. After that, Prometheus pulls the metrics from those servers and collects them in one place.
As we all know, the Greek gods didn't play alone and neither does Prometheus. On its own, it's already a very strong tool, but combined with other tools such as Grafana, it becomes part of a strong team for getting an overview of our servers.
Dive into the fire with me, my friends, and explore Prometheus as a mythical tool. Let's see how it can help you and your organisation grow at a sustainable rate and give you an overview of the services you have running. We want to do this independently of the tech that needs to be monitored, and we certainly want all our metrics in one place.
What is Prometheus?
Prometheus is a 100% open-source and community-driven tool that helps organizations get better insight into the metrics of their servers and services. All components are available under the Apache 2 License on GitHub. These days, software architecture is often distributed and no longer lives on a single server. Companies often run multiple services and servers per environment, which is why they need a way to monitor the metrics these servers and services produce.
Prometheus is much more than just a collecting agent, though. For starters, one of its main advantages is that it can be used to set rules on these metrics and even trigger alerts if need be. It can inform the user when their servers are displaying irregular behavior, and it is easy to deploy because it does not rely on a distributed storage system.
We can set up a lot of rules, but nothing works as well as the human eye when it comes to spotting irregularities that are harder to describe in rules. That’s why Prometheus also offers a visualization module that can help us chart the metrics we’re collecting. This gives us a great live insight into the health of our servers and services.
How does it work?
You might be wondering: ‘How is this solution implemented in a software architecture design?’ To explain this, we first have to know exactly how Prometheus gathers its metrics. It does so by making active HTTP requests to endpoints that return data in a format Prometheus can interpret. These endpoints are often provided by ‘exporters’ and ‘integrations’. Each kind of service typically has its own exporter. For example, there are exporters for popular HTTP servers, but also for DB systems, storage systems, APIs, loggers, and much more! The real beauty comes from the fact that users are encouraged to write their own exporters if none exist yet for their code.
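To make the idea of an exporter a bit more concrete, here is a minimal sketch in Python that uses only the standard library. It assumes a hypothetical service whose request counts live in a dictionary called REQUEST_COUNTS. The metric name app_http_requests_total, the port, and the function names are all made up for illustration. Real exporters are usually built on an official client library rather than written by hand like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counters; a real exporter would read these
# from the service it is exporting metrics for.
REQUEST_COUNTS = {"200": 120, "500": 3}


def render_metrics(counts):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_http_requests_total Total HTTP requests by status code.",
        "# TYPE app_http_requests_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'app_http_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics, the path Prometheus scrapes by default."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on http://localhost:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. Pointing Prometheus at that host and port as a scrape target should then be enough for it to start pulling the counters.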
These exporters expose endpoints that Prometheus can call to gather the required data. With this data, Prometheus can then create visualizations that we can easily configure to our liking. Some often-gathered metrics are, for example, the number of errors over time, CPU usage over time, or active threads on a database. The built-in graphing capabilities are not always sufficient, however. In that case, we can fall back on tools such as Grafana, which has supported Prometheus for a while.
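A metric like ‘errors over time’ comes from comparing successive scrapes of the same counter. The sketch below shows the idea in plain Python. It is a simplification for illustration, not the algorithm Prometheus itself uses, and it assumes sample lines without timestamps and without spaces inside label values. The function names and the errors_total metric are hypothetical.

```python
def parse_samples(text):
    """Turn scraped text into a {series: value} dictionary.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumes each sample line is '<series> <value>' with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def per_second_rate(previous, current, interval_seconds):
    """Per-second increase of each counter between two scrapes."""
    rates = {}
    for series, value in current.items():
        if series not in previous:
            continue  # first time we see this series, nothing to compare
        delta = value - previous[series]
        if delta < 0:
            # A counter that went down was reset (e.g. a restart),
            # so count only what accumulated since the reset.
            delta = value
        rates[series] = delta / interval_seconds
    return rates


# Two scrapes taken 15 seconds apart.
first = parse_samples('# TYPE errors_total counter\nerrors_total{job="api"} 10\n')
second = parse_samples('errors_total{job="api"} 25\n')
print(per_second_rate(first, second, 15))
```

Plotting these per-second values scrape after scrape is, roughly, what an ‘errors over time’ graph shows.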
Another powerful tool at our disposal is alerting. It consists of two parts: first the Prometheus server sends out the alerts, and then the Alertmanager catches and handles them (for example, by sending an email and an SMS when the error rate exceeds 20%). Together, these two components allow us to create powerful rulesets that can help monitor infrastructure all day.
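The two-part split can be sketched in a few lines of Python: one function decides whether an alert should fire, and another hands a firing alert to whoever needs to hear about it. This is only an illustration of the division of labour. In a real setup, the rules and the routing are written as configuration rather than as code like this. The Alert class, the HighErrorRate name, and the receiver layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    severity: str
    summary: str


def evaluate_error_rate(errors, requests, threshold=0.20):
    """Part one: return an Alert when the error ratio exceeds the threshold."""
    if requests == 0:
        return None  # no traffic, nothing to judge
    ratio = errors / requests
    if ratio > threshold:
        return Alert("HighErrorRate", "critical", f"error rate is {ratio:.0%}")
    return None


def route(alert, receivers):
    """Part two: pass the alert to every receiver registered for its severity.

    `receivers` maps a severity to a list of (channel, notify) pairs,
    where notify is any callable that accepts the alert.
    """
    delivered = []
    for channel, notify in receivers.get(alert.severity, []):
        notify(alert)
        delivered.append(channel)
    return delivered


# Stand-in notifiers that just collect alerts instead of really sending them.
outbox = []
receivers = {"critical": [("email", outbox.append), ("sms", outbox.append)]}

alert = evaluate_error_rate(errors=30, requests=100)
if alert is not None:
    print(route(alert, receivers), alert.summary)
```

Keeping the decision and the delivery separate means the threshold can change without touching who gets notified, and the other way around.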
What can it be used for?
With those powerful capabilities, ‘Prometheus’ is a fitting name. Don’t forget, Prometheus (a titan elder god) stole fire from the Olympian gods and gave it to humans, suffering a cruel punishment. Prometheus today is a tool that also wants to make life easier for humans. The titan-god of fire reigns supreme over the server kingdom. We can use this tool not only to scrape metrics from servers and gather them in one place, but also to visualize our infrastructure’s metrics and set proper alerting rules. Together, it’s not hard to imagine a world where all of our servers and services are monitored and made visible, and where we are alerted in a timely fashion.
We can’t forget the auxiliary uses though. Like the blacksmith wielding fire, we should also learn how to wield multiple tools, as they can combine into a whole greater than the sum of its parts. And just as the blacksmith (or Hephaestus, the Greek god of fire) combines tools, so must we if we want to make full use of the centralized time series.
Powerful as this tool is, it still relies on services reporting the correct metrics, on monitoring and alerting on the correct incidents, and on setting up the right visualizations. If any of these components is missing, we could still get a false sense of security. That being said, when we apply the right monitoring and alerting tactics together with the powerful capabilities of centralized time series and advanced integrations, it will be a lot easier to diagnose issues and even prevent them before they occur.
P.S. If you want to know more about this software, check out Prometheus: The Documentary, and learn how it was built to solve a very real problem.