Buzzword Bingo with NSClient++
In a world of cloud, containers and microservices, is the classic model of monitoring hosts still relevant?
In this talk I will share my view on how to make sense of monitoring in a world of metrics, and we will see how to bridge the two models with a little help from NSClient++'s upcoming support for cloud, containers, and some other buzzwords.
Directing the Director
I will show how we designed our new central monitoring system based on Icinga2. We are currently migrating our projects to this new system. While doing so, our teams learn a lot and discover solutions to their specific problems. After describing how we share this acquired knowledge with all teams, I will take a closer look at one particular approach, in which we used Ansible to provision all our Icinga configuration via the Director interface.
Hot Potato is a message broker that sits in between monitoring systems and messaging providers to ensure consistent relaying of messages to on-call staff. It was designed and developed in New Zealand to survive the harshest worst-case scenarios we could come up with in a country prone to natural disasters.
The goal of the project is to give on-call people control and freedom while giving your notifications every chance of arriving, through any provider or connection you might have.
Roll out all your Prometheus exporters with Puppet!
Everybody loves Prometheus. Many exporters are available to gather specific data. You can download the binaries from GitHub, start them and they will expose data via plain HTTP, without any firewalling or authentication. That would just complicate the whole setup! This is how the usual installation guide looks. It works, but it doesn’t cover all the important parts a systems engineer loves.
– What about authentication?
– What about firewalling?
– How can we make the Prometheus server aware of the exporters?
– How do we integrate custom applications that also provide Prometheus-compatible metrics?
A secure and automated rollout of exporters isn’t easy. An authenticated connection from the Prometheus server to the exporters also requires some preparation. This talk will cover a proper concept and all the details needed to roll out multiple exporters to many systems, completely automated with Puppet.
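One common way to make a Prometheus server aware of exporters is file-based service discovery, where the server reads target groups from a JSON file. The following is a minimal Python sketch of generating such a file. It is not the Puppet approach from the talk; the host names, the port 9100 and the label are assumptions made purely for illustration.

```python
import json


def build_file_sd(hosts, port=9100, labels=None):
    """Return a file_sd document (a list of target groups) as a JSON string.

    hosts, port and labels are illustrative inputs. Port 9100 is only an
    assumed exporter port, not something stated in the abstract.
    """
    targets = [f"{host}:{port}" for host in sorted(hosts)]
    group = {"targets": targets, "labels": dict(labels or {})}
    return json.dumps([group], indent=2)


if __name__ == "__main__":
    # Hypothetical hosts; a configuration management tool would supply the real list.
    print(build_file_sd(["web01.example.org", "db01.example.org"],
                        labels={"env": "demo"}))
```

Generating the file from an inventory keeps the target list in sync with the systems that actually run an exporter, which is the kind of automation the abstract asks for.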
NeDi – Why it's still here
NeDi is an open source software tool which discovers, maps and inventories your network devices and tracks connected end-nodes. It contains a lot of features in a user-friendly GUI for managing enterprise networks. When the project started around 18 years ago, there was not much else around. With many successful open source projects appearing in the meantime, it’s a different story today. So why is NeDi still alive and kicking?
Windows: One Framework to Monitor them all
Windows environments are nothing new within the monitoring community. There have been plenty of attempts at integrating Microsoft's operating system and its software solutions. Well, here is another one: trying to bring the best of both worlds together and allowing an easier integration of Windows systems, including the monitoring components. The new Icinga Monitoring will hopefully not only attract system engineers, but also encourage developers to bring in their own custom plugins and checks more easily.
The Telegraf Toolbelt: It Can Do That, Really?
Telegraf is an agent for collecting, processing, aggregating, and writing metrics.
With over 200 plugins, Telegraf can fetch metrics from a variety of sources, allowing you to build aggregations and write those metrics to InfluxDB, Prometheus, Kafka, and more.
In this talk, we will take a look at some of the lesser known, but awesome, plugins that are often overlooked; as well as how to use Telegraf for monitoring of Cloud Native systems.
Monitoring Alerts and Metrics on Large Power Systems Cluster
In this talk we’ll introduce an open source project being used to monitor large Power Systems clusters, such as in the IBM collaboration with Oak Ridge and Lawrence Livermore laboratories for the Summit project, a large deployment of custom AC922 Power Systems nodes augmented by GPUs that work in tandem to implement the (currently) largest Supercomputer in the world.
Data is collected out-of-band directly from the firmware layer and then redistributed to various components using an open source component called Crassd. In addition, in-band operating-system and service level metrics, logs and alerts can also be collected and used to enrich the visualization dashboards. Open source components such as the Elastic Stack (Elasticsearch, Logstash, Kibana and select Beats) and Netdata are used for monitoring scenarios appropriate to each tool’s strengths, with other components such as Prometheus and Grafana in the process of being implemented. We’ll briefly discuss our experience putting these components together, and the decisions we had to make in order to automate their deployment and configuration for our goals. Finally, we lay out collaboration possibilities and future directions to enhance our project as a convenient starting point for others in the open source community to easily monitor their own Power Systems environments.
In Praise of the Icinga 2 DSL
In this talk I want to spark enthusiasm for the Icinga 2 DSL as a configuration language and show how to build monitoring for Windows and Unix systems. The systems will first be monitored via NRPE or SSH, and these checks will then be replaced step by step by the Icinga 2 agent.
Lorem Icinga puppetdb director amet
Following on from “Selbstorganisation” (iX 08/2019), this talk is about automated monitoring with the Icinga Director, using PuppetDB as an example. In a mix of talk and live demo, even those who have never worked with it should get an idea of how this approach works and what its advantages are.
Process Automation in Monitoring
When systems have to be created, updated or removed, manual steps are often necessary. Despite excellent documentation, such manual steps are easily overlooked, which then particularly affects such seemingly minor processes as backups, or monitoring.
This talk demonstrates how configuration management can be used to automate monitoring, using Puppet on the one hand and Zabbix and Prometheus on the other. The focus is not on Kubernetes or the cloud, since other options are available to us there.
Centreon-plugins is a free and open source project to monitor systems. In his talk, Quentin Garnier will give a short description of the project. Starting with basic examples, he will then move on to advanced usage such as OpenMetrics output, password managers and much more.
Grafana Loki: Like Prometheus, but for Logs
Loki is a horizontally-scalable, highly-available log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate, as it does not index the contents of the logs, but rather labels for each log stream. Loki initially targets Kubernetes logging, using Prometheus service discovery to gather labels and metadata about log streams. By using the same index and labels as Prometheus, Loki enables you to easily switch between metrics and logs, enhancing observability and streamlining the incident response process – a workflow we have built into the latest version of Grafana. In this talk we will discuss the motivation behind Loki, its design and architecture, and what the future holds. It's early days, but so far the response to the project has been overwhelming, with more than 6.5k GitHub stars and over 12 hours at the top spot on Hacker News. Loki is an open source project, Apache licensed.
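The idea of indexing only the labels of a log stream, and not the log contents, can be illustrated with a toy in-memory store. This is a minimal Python sketch under that single assumption. It is not Loki's implementation or API, and the class and method names are hypothetical.

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy log store that indexes streams by label set, never by line content."""

    def __init__(self):
        # Maps a frozen set of (label, value) pairs to the raw lines of that stream.
        self._streams = defaultdict(list)

    def push(self, labels, line):
        """Append a line to the stream identified by its labels."""
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Pick streams whose labels include `selector`, then scan their lines."""
        wanted = set(selector.items())
        results = []
        for stream_labels, lines in self._streams.items():
            if wanted <= stream_labels:
                # Content filtering is a brute-force scan, not an index lookup.
                results.extend(line for line in lines if contains in line)
        return results


if __name__ == "__main__":
    store = LabelIndexedLogStore()
    store.push({"app": "api", "env": "prod"}, "GET /orders 500")
    print(store.query({"app": "api"}, contains="500"))
```

The index stays small because it grows with the number of distinct label sets rather than with the volume of log text; the trade-off is that content searches have to scan the selected streams.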
Monitoring Cockpit for Kubernetes Cluster
Monitoring Kubernetes Clusters with Prometheus is state of the art. The difficulty is to find the significant metrics from the vast amount of available metrics. This talk shows a Monitoring Cockpit defined to get a quick overview of the cluster health and usage. It uses the Standard Metrics available for Kubernetes/OpenShift Clusters and their standard services. The monitoring solution is based on Prometheus, using InfluxDB for central long term storage and Grafana.
HRS-Connect – Open Source IoT Monitoring
The expansion of hydrogen refueling stations in Germany and Europe is progressing rapidly. But how to maintain the overview of all relevant parameters and fault messages? How can the refuelling processes be analysed in realtime in order to improve the systems together with the manufacturers? And how does the data from the system actually get into the monitoring system? This presentation will show how a powerful IoT monitoring solution was developed based on the hardware of the RevPi platform and open source software such as Ansible, Icinga2, InfluxDB and Grafana, which helps to ensure the availability of the hydrogen infrastructure and optimize it further.
On-board Diagnostics Monitoring and Alerting with Zabbix
Real-time data such as speed and engine temperature, and DTC codes such as check engine, brake problems and more. Read live data from sensors such as turbo boost, timing and duty cycles. Monitor current vehicle speed and analyze the gathered statistics. Be notified!
Zabbix LLD from a C Module
Low-level discovery provides a way to automatically create items, triggers, and graphs for different entities. For instance, Zabbix can automatically start monitoring file systems or network interfaces on your machine, without the need to create items for each file system or network interface manually. Using a real-life practical example, in which we monitor vehicles fitted with GPS trackers that communicate via MQTT, we will discuss how we implement Zabbix Low-Level Discovery directly from a C module and how the same C module is used to provide up-to-date information from the vehicles to Zabbix items. This basic principle can easily be adapted to provide similar functionality to Internet of Things (IoT) projects. While it helps if you can read a bit of C language code, we’ll explain what’s going on behind the scenes even if you don’t.
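A low-level discovery rule consumes a JSON document that lists the discovered entities, with each entity described by macros of the form {#NAME}. The talk produces this from a C module; the Python sketch below only illustrates the shape of such a document. The macro name {#VEHICLE} and the vehicle identifiers are hypothetical.

```python
import json


def build_lld_payload(vehicle_ids):
    """Build a low-level discovery JSON document with one object per vehicle.

    The macro name {#VEHICLE} is an assumption for illustration; the classic
    layout wraps the list of discovered entities in a "data" key.
    """
    rows = [{"{#VEHICLE}": vehicle_id} for vehicle_id in vehicle_ids]
    return json.dumps({"data": rows})


if __name__ == "__main__":
    # Hypothetical tracker identifiers, e.g. as learned from MQTT topics.
    print(build_lld_payload(["truck-1", "truck-2"]))
```

Item and trigger prototypes can then refer to the macro, so that one item per discovered vehicle is created without manual configuration.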
Monitoring Windows Events without Monitoring Logfiles
If you search the internet for how to monitor Windows events with Nagios/Naemon/Icinga(2) etc., you find page after page on how to monitor logfiles for Windows events. Monitoring logfiles can be a real nightmare.
– How often will you scan a log?
– Have you processed the event with an earlier scan?
– What to do if an event is not logged?
Monitoring event logs mostly requires complex filter rules, and it is mostly not real time. Besides NSClient++ real-time event log monitoring, there is a lesser-known but very effective method that works without installing any additional software on Windows and without analyzing logfiles: SNMP traps. The presentation will show how to configure Microsoft SNMP to send traps, how to tell MS Windows to send events as traps at the same time the event is written to the logfile, and how to process the event with SNMPTT.
Zero Trusted Networks – why Perimeter Security is dead
In the days of microservices, cloud, SaaS and PaaS, perimeter security, albeit still widely used, is not sufficient anymore. Inside attacks have become more prominent than outside attacks, posing a huge risk to your network and data. The Zero Trusted Networks approach treats all hosts and components as if they were internet-facing and considers the entire network to be compromised and hostile.
Using the GDPR as an Opportunity – Monitoring Information Security
The GDPR has now been in force for more than a year, and in theory all organizations should have finalized their rules and processes. Reality is different. Recent events clearly show that information security (or the lack of it) has a direct impact on company value and growth. Both in severe fines and in company acquisitions, a robust and documented information security culture plays a growing role. In this session we look at this significance from the perspective of CxOs and finance people and take a look at the valuable but underrated recommendations of the BSI. On this basis, dashboards and monitoring approaches relevant to decision-makers can be built, increasing the value of the monitoring solution for company management.
Monitoring a hybrid world: Checkmk's latest major Release
The IT world is becoming “hybrid”: from local data centers to the cloud, and from static IT infrastructures to dynamic containers. Checkmk has long been an all-in-one tool for monitoring servers, networks, applications, databases and more. With the latest release we have focused on advancing the monitoring of Docker, Kubernetes, AWS and Azure and on handling dynamic infrastructures. In this talk, Lars walks you through some of the new features and shows how Checkmk can help monitor a hybrid IT world.
Automating the Configuration of Monitoring on large Infrastructures
Setting up monitoring on dynamic, large environments can be challenging. This session will cover how to provision a monitoring infrastructure with Prometheus and Grafana the easy way using Salt states (https://github.com/saltstack/salt) and Uyuni (https://github.com/uyuni-project/uyuni), and how these tools can help with:
– Automating the installation of exporters on monitored systems
– Prometheus configuration and service discovery mechanisms
– Grafana provisioning and sample dashboards
Monitoring your Logs with Fluent
This presentation shows how to set up Icinga2 with Fluent and Grafana for logging, monitoring, dashboarding and notifications. The first part shows how to set up Fluentd, the server part of Fluent, for log aggregation; Fluent Bit is the client that ships logs to the log server for both systems and applications. The second part explains the setup of Grafana for dashboarding. The third part explains the setup of Icinga2 for monitoring and notifications. Finally, the integration between these parts is explained so you can get an integrated solution. At the end of the presentation, a demo will show how this works with some examples.
Monitoring Event Pipelines: Why you need one, and why you should stop rolling your own
The rules have changed. The shift from static to dynamic infrastructure requires a change in approach to monitoring, from host-based to functional role-based. Connectivity moves from remote polling to publish-subscribe and push APIs. The control plane moves from point-and-click interfaces to infrastructure-as-code workflows and self-service, developer-friendly APIs.
Now — more than ever before — organizations are faced with a deluge of data in various formats. As operators and as natural system integrators, our response to these challenges is likely to roll our own solution, construct a scalable data collection and processing pipeline and combine a number of best-of-breed tools. This is an idea and journey that I am all too familiar with. In 2010, as an operator, working for an early Cloud adopter, I was faced with these challenges, and fought for better visibility. I hacked for the greater good. In this talk, I will give an overview of what I consider to be attributes of an effective monitoring pipeline. I will recount my experience in creating Sensu, the open source monitoring pipeline, and the pitfalls the project has encountered. I will then live demo Sensu monitoring pipelines in action and make my case for not rolling your own solution from scratch.
From the Curb to the Skyline
Starting from Xymon, LibreNMS and Zabbix and moving to Prometheus, ELK and Icinga 2, all united in Grafana. A war story about how to gain more visibility with metrics and newer monitoring systems, going from 5-minute measurement intervals down to 1 second.
Improved Observability Using Automated, OpenCensus-based Application Monitoring Solutions
Today’s complexity of enterprise software systems increases the importance of observability and monitoring solutions. DevOps teams or the developers and operators in the “classical world”, respectively, need clear transparency about the performance, availability and reliability behavior of their software systems to be able to manage an appropriate level of service quality. Besides the big, commercial Application Performance Management (APM) solutions, the open-source market provides a huge variety of different tools targeting different aspects of APM. Combining these tools opens great potential for building flexible, tailored APM solutions. Open standards, such as OpenTracing, OpenCensus or OpenMetrics, support the vendor-independent collection of data. However, most of the open tools that are based on these standards still require manual adaptation of code to instrument applications for data collection. Often, such code changes are not desired or simply not possible. In this talk, we will present an open, OpenCensus-based approach that provides a simple, yet flexible, configuration-based way of collecting monitoring and business data. Data can be collected without the need for manual code changes. As OpenCensus is used under the hood, this approach allows creating tailored APM solutions, as monitoring backends can be exchanged without any additional overhead. In this talk, we give insights into the OpenCensus standard and show how OpenCensus instrumentation can be automated. Through practical examples using different open tools, we demonstrate the flexibility of the presented approach.
FRANCESCO CINA / PATRICK ZAMBELLI
Tornado – Extend Icinga 2 for Active and passive Monitoring of complex heterogeneous IT Environments
The main objective of this talk is to show how you can extend an Icinga2 active monitoring with a passive monitoring engine.
With Icinga2 you focus on active monitoring. With Tornado you can do exactly the opposite and focus on passive monitoring. You receive events from different channels such as SNMP traps, syslog and email, match them against a rule engine and decide which action to associate with them. A very common use case is to set an Icinga2 service status (critical, warning, ok) based on a matched rule. In addition, you could subscribe to the Icinga API stream, define matching rules for the events you would like to correlate, and associate an executor with them, for example to create a new entry in Elasticsearch. Another common use case is to register Tornado as a webhook, for example in Elasticsearch Watcher, to collect the alarms and set the status of an Icinga service check. During the talk I will explain why Tornado was built by Würth Phoenix in Rust and which common use cases we would like to address with it.
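The pattern described here, receiving an event, matching it against rules and running the associated action, can be sketched in a few lines of Python. This is only an illustration of the general idea: it does not reproduce Tornado's rule format or executors, and all names, event fields and action strings below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Rule:
    """A named predicate plus the action to run when the predicate matches."""
    name: str
    matches: Callable[[Event], bool]
    action: Callable[[Event], str]


def process(event: Event, rules: List[Rule]) -> List[str]:
    """Run the action of every matching rule and return the results in rule order."""
    return [rule.action(event) for rule in rules if rule.matches(event)]


# Hypothetical rules: the actions only return a description of what a real
# executor might do, such as setting a passive service status.
DEMO_RULES = [
    Rule("syslog-error",
         lambda e: e.get("type") == "syslog" and e.get("severity") == "err",
         lambda e: f"set {e['host']}!syslog to CRITICAL"),
    Rule("any-trap",
         lambda e: e.get("type") == "snmptrap",
         lambda e: f"set {e['host']}!trap to WARNING"),
]

if __name__ == "__main__":
    print(process({"type": "syslog", "severity": "err", "host": "web01"}, DEMO_RULES))
```

Keeping the match condition and the action separate is what lets the same event source feed different executors.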
Naemon and Friends
Naemon is the engine of several open source monitoring solutions. This talk will take a look at what’s new in and around Naemon. I will try to answer questions like: what is a NEB module and what can it do for me? What are common addons and tools used together with Naemon? While investigating some interesting scenarios I will share some best practices, for example the LMD livestatus proxy or the new Mod-Gearman worker.
How to improve database Observability
Delivering a database service is not a simple job, and to ensure that everything is working correctly your platform needs to be observable. In this talk, I’ll talk about how we make the MySQL/MariaDB databases observable. We’ll talk about the RED and USE methods and the golden signals. You’ll discover how we dealt with statements such as “We think the database is slow”. This talk will allow you to make your databases discoverable with open source solutions.
Monitoring Nomad with Prometheus and Icinga
Things like Infrastructure as Code, Service Discovery and Config Management can and have helped us to quickly build and rebuild infrastructure, but we have not spent nearly enough time training ourselves to review, monitor and respond to outages. Does our platform degrade gracefully, and what does a high CPU load really mean? What can we learn from level 1 outages to be able to run our platforms more reliably? We all love infrastructure as code, we automate everything ™. However, making sure all of our infrastructure assets are monitored effectively can be a slow and resource-intensive multi-stage process. During this talk we will investigate how we can set up and monitor a cloud native container platform that scales, using HashiCorp's Consul and Nomad service discovery and container scheduling tools and Traefik, an edge router. This talk will focus on making sure we can have alerts and metrics in this quickly changing infrastructure landscape. We’re going to show how to integrate Icinga2 with Consul and Nomad. To finish off, we'll show how to visualize the Prometheus data in a way that resembles Netflix's Vizceral, using freely available Grafana dashboards and plugins.
Idiot! – or: Why BOFH is toxic
Behaviour is mirrored, and so the choice of words reflects the mood and the feelings that are conveyed. Keeping that in mind will make you happier and, with that, a better colleague. This talk covers how speech and behaviour can influence your health, your happiness and your work, but also how others speak about you and your work. Even if you do volunteer work in open source, all of that counts and might influence whether a user works with the product or moves somewhere else. Why does that count? Because change always starts with yourself, and the internet needs more love.
Use Cloud services & features in your redundant Icinga 2 Environment
This talk will start with a quick walk through the setup of all required components for a cloud-based icinga2, icingaweb2 & icingaweb2-director environment. The focus will be on the configuration and monitoring of keepalived, HAProxy and Galera. Keepalived, for example, is used to interact with DigitalOcean and manage floating IPs. Examples will show how to use a DigitalOcean load balancer instead of HAProxy. The talk will end with a summary of the limitations and pitfalls we experienced.
Ignite | How Observability is not killing Monitoring
When the term Observability came up we were all told that we don’t have to care about our systems anymore. We should move all of our infrastructure to the cloud, into containers and finally serverless will solve not only all our monitoring issues but all other things as well. Reality shows that this is not the case for most of us. While the importance of observability is given, we should not forget about everything we learned to make our monitoring suck less.
Ignite | Overengineering your personal website
Let's be honest, whether consciously or not, we all do it. In this talk we'll discuss how stupidly crazy I managed to go while serving only a single static HTML page. From the humble beginnings as a Markdown file to automation and several layers of monitoring.