Icinga for Windows in the Monitoring of Madness (EN)
Developing Icinga for Windows requires many different approaches to ensure that features work properly on new versions of Windows as well as on older ones that support PowerShell by default. We want to give an insight into the challenges that developing Icinga for Windows faces, how user expectations, developer expectations and reality sometimes collide, and what new features Icinga for Windows v1.11.0 ships with.
AutoHeilung mit Ansible [Auto-Healing with Ansible] (DE)
Central monitoring is a cornerstone of modern IT infrastructure. By monitoring error messages and unusual situations, we react faster, and customers can get back to work sooner after outages. Central automation helps admins roll out configurations and rebuild infrastructure. I would like to take attendees along on our journey toward automatic repair with Ansible and AWX and show that problems get solved faster as a result.
VictoriaMetrics: scaling to 100 million metrics per second (EN)
The growth of observability trends and Kubernetes adoption generates more demanding requirements for monitoring systems. Volumes of time series data increase exponentially, and old solutions just can’t keep up with the pace. The talk will cover how and why we created a new open source time series database from scratch, and which architectural decisions and trade-offs we had to make in order to match the new expectations and handle 100 million metrics per second with VictoriaMetrics. The talk will be interesting for software engineers and DevOps engineers familiar with observability and modern monitoring systems, or for those who are interested in building scalable, high-performance databases for time series.
Open Source: Open Choice – A DevOps Guide for OSS Adoption (EN)
Choosing the right open source project to use can be quite challenging: you don’t know whether it’s going to be the right fit, how it will behave, or whether you will end up wasting time trying to make it all work. We’ve all been there. But what if I told you there’s a practical way to get a clear understanding of how to incorporate an OSS project in your environment? In this talk, I’m going to speak about the DevOps perspective on open source and the challenges infra-focused engineers have with choosing the right project for their environment. As a DevOps Engineer, I’ve seen a lot of things and stumbled upon a lot of unfounded decisions, so I will present practical advice on how to choose an OSS project for your dev/prod environment and talk about the business mindset you should have to evaluate the key indicators based on your needs and specific pain points.
What’s new in the Prometheus ecosystem? (EN)
While Prometheus is now a mature monitoring solution, there are a lot of things going on in the Prometheus ecosystem. Let’s have a look at some of the most important changes and novelties of the last few years in the Prometheus community.
VMware monitoring with ease (EN)
Every release of the Icinga Module for vSphere® comes with a whole bunch of new features. Over time, vSphereDB has grown from a nice visualization add-on into a full-blown monitoring component. This talk tries to shine a light on some of its lesser-known features, showcases its use in various real-world scenarios and focuses on leveraging monitoring-related components in an automated way.
IGNITE: Observability with Grafana & Prometheus for Kafka on Kubernetes (CFK) (EN)
Self-managing a highly scalable distributed system with Apache Kafka® at its core is not an easy feat. That’s why operators prefer tooling such as Confluent Control Center for administering and monitoring their deployments. However, sometimes you might also like to import monitoring data into a third-party metrics aggregation platform for service correlations, consolidated dashboards, root cause analysis, or more fine-grained alerts. You may have asked questions along these lines: Can I export JMX data from Confluent clusters to my monitoring system with minimal configuration? What if I could correlate this service’s data spike with metrics from Confluent clusters in a single UI pane? Can I configure some Grafana dashboards for Confluent clusters?
This talk will help you achieve the following:
Monitoring Your Event Streams: Integrating Confluent with Prometheus and Grafana
Monitoring Your Event Streams: Tutorial for Observability Into Apache Kafka Clients
IGNITE: That’s nuts! A proof of concept of Icinga2 on Kubernetes using Acorn (EN)
Icinga2, with its many moving parts, credentials for database, API, and other features, and layered topology, can be considered a pretty complex system. Deploying it to Kubernetes might in fact prove overwhelming when transitioning from ‘traditional’ setups. In this Ignite Talk, we will have a quick look at typical Icinga2 setups you might encounter in the wild and how we can deploy them to Kubernetes clusters using acorn (https://acorn.io), a new application deployment framework for Kubernetes aimed at simplicity and developer experience. We will explore acorn’s basic idea(s), how it is configured and used in production, and where its shortcomings might lie as of today, while going through the process of deploying a full-blown Icinga2 setup to a Kubernetes cluster.
IGNITE: How to benchmark … poorly (EN)
After getting a bit of a bad reputation (“benchmarketing”) it looks as if vendor benchmarks are very much in fashion again. Let’s take a quick look at common mistakes and how to do them … poorly.
Unifying Observability: Weaving Prometheus, Jaeger, and Open Source Together to Win (EN)
Observability is a hugely popular topic; however, for open-source users, significant challenges remain. For starters, related licensing is frequently problematic, and even when it works, there is no pure Apache 2.0 licensed technology to get data collection and visibility into your logs, metrics, and traces. Thankfully, this is gradually changing as the community builds new capabilities into OpenSearch Dashboards to unify the visualization of logs from OpenSearch, metrics from PromQL-compatible systems, and traces from Jaeger. In this session, we’ll examine how this important project is evolving as a fork of the previously popular ELK stack. We’ll also take a closer look at the current state of OpenSearch and Jaeger and discuss how these efforts are going to provide a foundation for unified observability to the open-source communities. By using OpenTelemetry for data collection, this foundation provides a pure Apache 2.0 licensed open-source platform for unified observability. OpenSearch also includes features like Alerting and Machine Learning, which are not part of Jaeger today. The work on this foundational integration is well underway and will provide open-source users with a solid alternative to vendor-controlled and vendor-provided solutions. This also opens up the marketplace for solutions to host and manage these at scale, something we’ve seen with countless other CNCF projects. This talk will be presented by a contributor to and maintainer of OpenSearch, Jaeger, and OpenTelemetry, all of which have vibrant user communities. Join the conversation!
Security as Code: A DevSecOps Approach (EN)
Security as Code (SaC) is the methodology of codifying security tests, scans, and policies. Security is implemented directly into the CI/CD pipeline to automatically and continuously detect security vulnerabilities. Adopting SaC tightly couples application development with security and vulnerability management, while simultaneously enabling developers to focus on core features and functionality. More importantly, it improves the collaboration between Development and Security teams and helps nurture a culture of security across the organization. In this session, we will review lessons learned from DevOps to implement a thriving DevSecOps culture, in particular how we can make developers contribute security checks with the SaC approach. We will introduce CodeQL, a language that allows us to implement security checks with code. We will demo how we can code queries for vulnerabilities and misconfigurations so they can be identified as soon as they hit your CI/CD pipeline.
OpenTelemetry 101 (EN)
Everyone wants observability into their system, but many find themselves with too many vendors and tools, each with its own API, SDK, agent, and collectors. In this talk I will present OpenTelemetry, an ambitious open source project with the promise of a unified framework for collecting observability data. With OpenTelemetry you can instrument your application in a vendor-agnostic way, and then analyse the telemetry data in your backend tool of choice, whether Prometheus, Jaeger, Zipkin, or others. I will cover the current state of the various OpenTelemetry projects (across programming languages, exporters, receivers, protocols), some of which are not even GA yet, and provide useful guidance on how to get started.
Providing a Rich Interface to the Prometheus Operator (EN)
Configuring Prometheus isn’t exactly on anyone’s top list of “fun nights in”, but it’s something we all need to do. We spend so much of our time slinging YAML that it can be easy to forget that there are alternatives to working with Kubernetes CRDs, so let’s spend some time together taking a look at an experience that puts the developer first. In this session, I will guide you through using Pulumi to author and deploy your own Prometheus rules and alerts with a great developer experience that provides intelligent code auto-completion right within your favourite IDE, whether you write in Go, Python, TypeScript, or even .NET.
Scaling SLOs with Kubernetes and Cloud Native Observability (EN)
Defining Service Level Objectives and Service Level Indicators is a really important aspect of implementing SRE. Through service metrics (SLOs, SLIs, Error Budgets), SRE can help us measure our system’s performance and improve customer experience. They not only enable your teams to monitor and plan around reliability, but can also be early predictors of customer satisfaction, NPS, churn rates, and more. With the rise of cloud native technologies, it has become more and more relevant to automate our observability, extending it to an SLO-as-code model. In this session we’ll see how SLOs have evolved and can be used in a Cloud Native world. We’ll then explore how technologies like Kubernetes and Prometheus can help us scale SLOs, while promoting best practices and standards using Observability as code. Finally, we’ll see how to put all these together with Jenkins and Rancher, to operationalize error budgets.
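As a rough illustration of the error budget idea mentioned above, here is a minimal Python sketch. It assumes a request-based availability SLO; the function names and the example numbers are hypothetical and are not taken from the talk.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Return how many requests may fail in the period without breaking the SLO.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget that is still unspent.

    A negative result means the budget is exhausted and the SLO is violated.
    """
    budget = error_budget(slo_target, total_requests)
    return (budget - failed_requests) / budget


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 99.9% target, 250 failures so far.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

With a 99.9% target, roughly one request in a thousand may fail, so 250 failures out of a million requests would leave about three quarters of the budget unspent.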
Monitoring & Betriebsrat [Monitoring & the Works Council] (DE)
Many companies in Germany have works councils, which have a right of co-determination when new technical tools are introduced. Conversations with IT colleagues so far have shown that this right is not always observed during introduction, which can later lead to avoidable complications and disputes. This presentation aims to give a short introduction to the principles of co-determination under works constitution law, along with key points on where works councils may need to be involved in monitoring.
Einfaches, effizientes und schnelles Monitoring mit Open Source [Simple, Efficient and Fast Monitoring with Open Source] (DE)
Not in the mood for dusty configuration files, Nagios, and having to tackle it all somehow? With openITCOCKPIT, configuration is done exclusively through the web interface.
With the platform-independent agent, basic monitoring is set up in less than 2 minutes. Communication is of course encrypted, and thanks to pull or push mode, any system can be queried.
Volunteers, step forward! I will gladly monitor your notebook during the demo.
Automated Incident Response for Cloud Native Risks (EN)
Incident response teams are already drowning in alerts and may be missing critical vulnerabilities. What use is a security scanner that tells you there are thousands of vulnerabilities but leaves you to take the time to fix them? Extending visibility and responsibility to cloud native environments compounds the challenge teams face in weeding through huge volumes of alerts to determine which risks are the most urgent and how best to respond to incidents. This session will cover how security teams can use open source projects to better identify high-risk cloud native events, orchestrate responses with other third-party integrations based on these high-fidelity insights, and execute playbooks for more automated and effective incident analysis and handling. We will cover a variety of use cases, ranging from simple ones such as acting upon CVE detections when performing vulnerability scans to more complex scenarios of runtime detection. The session will focus on practical use case scenarios that are commonly observed in day-to-day situations.
Currently the two dominant check scripts for VMware/vSphere are check_vmware_esx and check_vmware_api. But they have a problem: they are based on the Perl SDK provided by VMware, which VMware has deprecated. You may already see problems when checking against newer VMware installations. So we need a plugin in a different language. That’s why I started a new plugin written in Python using the Python SDK, which is not deprecated. While doing this I also studied check_vmware_esx and check_vmware_api and found that they are actually doing a very bad job with metric-based checks. Also, the VMware API is… well… let’s say over-engineered. So this talk is about what I’ve learned during the implementation, what the current plugins do wrong, the over-engineered VMware API and the current state of the new plugin.
The Power of Metrics, Logs & Traces with Open Source (EN)
The talk will show how organisations can drastically reduce their MTTR (Mean Time To Repair) by using, integrating & correlating the open source tools Mimir, Loki & Tempo. We will then take the next step into open source reliability testing to even avoid problems in the first place. And yes, we will use Grafana 🙂
In 60 Minuten zum IoT Projekt auf der Basis von ThingsBoard [An IoT Project in 60 Minutes Based on ThingsBoard] (DE)
Using a practical example, the talk gives an introduction to the world of the “Internet of Things”. From the presentation of several sensor platforms and the programming of a microcontroller, through connectivity with NB-IoT or LoRaWAN and AWS IoT Core, to the processing, storage and visualization of the data on the basis of the open source software ThingsBoard, all aspects of the IoT world are covered. The audience thus experiences live how a complete IoT project comes into being.
Logstash, Beats, Elastic Agent, OpenTelemetry: what’s the right choice? (EN)
Back in the old days with the ELK Stack, ingesting logs (and other data) was straightforward: Logstash or maybe Fluentd. Today you have a lot more options: Beats have been around for a long time, but Elastic Agent is the hot new thing. And then there is also OpenTelemetry, which is growing in use cases. What’s the right choice? This talk gives a quick overview of the current options and their tradeoffs, including some common scenarios and how one or more of the tools can solve your problems.
Metrics Stream Processing Using Riemann (EN)
This talk will cover:
Introduction to Riemann
* In-memory stream processing system written in Clojure
* How it can easily process millions of events per second
* Configuration is Clojure code
What problems Riemann solves in today’s world of Prometheus & Datadog
* High cardinality metrics problem: since all processing is done in memory, Riemann can handle high cardinality much better than Prometheus.
* Instant detection: since Riemann uses websockets, an issue is instantly reflected in the dashboard. In Prometheus, it is only reflected after the next scrape.
Riemann Stream Processing Engine
* Types of functions on Riemann streams.
* Combine multiple streams into one stream.
* Split one stream into multiple streams.
* Filter, roll up, throttle, coalesce events.
* How the Riemann schema can be extended and used for streaming other events such as logs.
* Using a docker-compose setup of Riemann to present the power of Riemann and riemann-dash.
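The stream operations listed above (filter, split, aggregate) can be sketched in a few lines of Python. This is only a conceptual illustration and not Riemann's API: Riemann itself is configured in Clojure, and the event shape and the function names `where`, `split_by` and `window_mean` used here are assumptions made for the example. The windowed mean is a simplified stand-in for the kind of aggregation a stream processor performs, not a reimplementation of any specific Riemann stream function.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical event shape: {"service": str, "metric": float, "time": int}
Event = dict


def where(pred: Callable[[Event], bool], events: Iterable[Event]) -> Iterator[Event]:
    """Filter: pass on only the events for which pred is true."""
    return (e for e in events if pred(e))


def split_by(key: str, events: Iterable[Event]) -> dict[str, list[Event]]:
    """Split one stream into several, one per distinct value of the given field."""
    groups: dict[str, list[Event]] = defaultdict(list)
    for e in events:
        groups[e[key]].append(e)
    return dict(groups)


def window_mean(events: Iterable[Event], window: int) -> dict[int, float]:
    """Aggregate: mean metric per fixed time window, keyed by window start time."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for e in events:
        buckets[e["time"] // window].append(e["metric"])
    return {b * window: sum(v) / len(v) for b, v in sorted(buckets.items())}


if __name__ == "__main__":
    events = [
        {"service": "api", "metric": 10.0, "time": 0},
        {"service": "api", "metric": 20.0, "time": 5},
        {"service": "db", "metric": 1.0, "time": 7},
        {"service": "api", "metric": 40.0, "time": 12},
    ]
    api_events = where(lambda e: e["service"] == "api", events)
    print(window_mean(api_events, window=10))
    print(sorted(split_by("service", events)))
```

Because every function takes and returns plain iterables or dictionaries, the operations compose freely, which mirrors the idea of chaining stream functions in a configuration that is itself code.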
IGNITE: The O11y toolkit (EN)
The O11y toolkit is a set of utilities to help you maintain, debug, and augment your open source observability stack. Our tools will improve your experience with metrics, logs, and traces. We have a couple of tools published already. We want a seamless user experience across the tools, with as much consistency in behaviour and packaging as possible.
Thruk 3 – Monitoring at glance got a fresh look (EN)
The Thruk web UI, capable of handling millions of services, has served well for many years. It still does, and has even gained more useful features. This talk will introduce the brand new Thruk 3 release, which comes with a completely reworked interface. There will be a little bit for everyone.
How we improved our monitoring so that everyone likes to be on-call (EN)
Ever wonder why your engineers don’t necessarily like being on call? There can be many different reasons for this, and one cause could be a poorly configured monitoring system. In this talk I would like to share with you the different stages we went through as a team to get from inadequate monitoring to a solution that provides real value not only for the customer but also for us as a team.
Monitoring in a Serverless World (EN)
Serverless applications are becoming a more and more attractive architecture for deploying software, providing many benefits over more traditional architectures. Unfortunately, due to the locked-in nature of serverless platforms, monitoring options outside of vendor-provided solutions are scarce. But does that mean we are doomed to be subject to vendor lock-in? In this talk, Colin will cover the landscape of existing open source solutions for monitoring serverless applications and present Cloudflare’s approach to monitoring its increasingly large portfolio of serverless applications.
AI Driven Observability based on Open Source (EN)
Observability and monitoring of resources are growing every day, and all the data points must be analysed to arrive at a solution. At Mercedes-Benz we have developed an open source Data Metric Analyzer and drive it with data science to identify anomalies. In this talk, we would like to discuss how we established the entire data processing ecosystem based on open source. The technologies discussed include:
– Python: Data Science Components
– Airflow: Data Orchestration for metrics
– Telegraf: Data Collection
– TimescaleDB: Data Store for Timeseries Data
– Grafana + Streamlit: Visualization
OLENA KHARCHENKO & FRANCO SOLLNER
Vom Spam zum Mehrwert: Ganzheitliches APM und intelligentes Incident Management [From Spam to Added Value: Holistic APM and Intelligent Incident Management] (DE)
How do I make full use of the potential of my monitoring system? How do I create an optimal setup? How do I deal with alert floods? We address these questions in our session and show you what experience we have gathered in the area of application performance and incident management. We discuss which data form the foundation of a modern observability platform and how incident management can be used to shape a holistic and sustainable process.
Let’s build a private cloud – how hard can it be? (EN)
When we built our private cloud with OpenStack, we never thought it would be this complex or time-consuming. In this talk I want to share our approach, the challenges we faced, and why we learned to appreciate good monitoring and log aggregation.
Git Good – How knowing git can make your life easier (EN)
In our tech life, it’s hard to avoid git. Pretty much everyone has to interact with it at least briefly, to make a commit at some point or to review someone’s changes. Git’s complexity can get really intimidating and scary when you dig in deeper. But don’t let that scare you off, because once you understand what it is trying to do, it is a lot less daunting. And you will want to integrate it into every single one of your projects!
Monitoring multiple Kubernetes Clusters with Thanos (EN)
By now, Prometheus has become the de facto standard for monitoring containerised applications. However, when it comes to monitoring multiple Kubernetes clusters through a single pane of glass, additional tools are required. In this talk, I will show how to set up a production monitoring landscape based on Prometheus and Thanos, spanning several Kubernetes clusters. Focussing on examples and best practices, I will also elaborate on how to secure communication between the individual components.