Video & Slides

SLIDES

CHRISTIAN STEIN

Icinga for Windows in the Monitoring of Madness (EN)

Abstract

Developing Icinga for Windows requires many different approaches to ensure that features work properly on new versions of Windows as well as on older ones that support PowerShell by default. We want to give an insight into the challenges that developing Icinga for Windows faces, how user expectations, developer expectations and reality sometimes collide, and what new features Icinga for Windows v1.11.0 ships with.

SLIDES

CHRISTIAN LORENZ

AutoHeilung mit Ansible (DE)

Abstract

Central monitoring is a cornerstone of modern IT infrastructure. By monitoring error messages and unusual situations, we react faster and customers can get back to work sooner after outages. Central automation helps admins roll out configurations and rebuild infrastructure. I would like to take the participants along on our journey to automated repair with Ansible and AWX and show that problems are solved faster as a result.

SLIDES

ALIAKSANDR VALIALKIN

VictoriaMetrics: scaling to 100 million metrics per second (EN)

Abstract

The growth of observability trends and Kubernetes adoption generates more demanding requirements for monitoring systems. Volumes of time series data increase exponentially, and old solutions just can’t keep up with the pace. The talk will cover how and why we created a new open source time series database from scratch, and which architectural decisions and trade-offs we had to make in order to meet the new expectations and handle 100 million metrics per second with VictoriaMetrics. The talk will be interesting for software engineers and DevOps familiar with observability and modern monitoring systems, or for those who are interested in building scalable, high-performance databases for time series.

SLIDES

HILA FISH

Open Source: Open Choice – A DevOps Guide for OSS Adoption (EN)

Abstract

Choosing the right open source project to use can be quite challenging – not knowing whether it’s going to be the right fit, how it will behave, and whether you will end up wasting time trying to make it all work. We’ve all been there. But what if I told you there’s a practical way to get a clear understanding of how to incorporate an OSS project in your environment? In this talk, I’m going to speak about the DevOps perspective on open source and the challenges infra-focused engineers have with choosing the right project for their environment. As a DevOps Engineer, I’ve seen a lot of things and stumbled upon a lot of unfounded decisions, so I will present practical advice on how to choose an OSS project for your dev/prod environment and will talk about the business mindset you should have to evaluate the key indicators based on your needs and specific pain points.

SLIDES

JULIEN PIVOTTO

What’s new in the Prometheus ecosystem? (EN)

Abstract

While Prometheus is now a mature monitoring solution, there are a lot of things going on in the Prometheus ecosystem. Let’s have a look at some of the most important changes and novelties of the last few years in the Prometheus community.

SLIDES

THOMAS GELF

VMware monitoring with ease (EN)

Abstract

Every release of the Icinga Module for vSphere® comes with a whole bunch of new features. Over time, vSphereDB has grown from a nice visualization add-on to a grown-up, full-blown monitoring component. This talk tries to shine a light on some of its lesser-known features, showcases its use in various real-world scenarios and puts a focus on leveraging monitoring-related components in an automated way.

SLIDES

GEETHA ANNE

IGNITE: Observability with Grafana & Prometheus for Kafka on Kubernetes (CFK) (EN)

Abstract

Self-managing a highly scalable distributed system with Apache Kafka® at its core is not an easy feat. That’s why operators prefer tooling such as Confluent Control Center for administering and monitoring their deployments. However, sometimes, you might also like to import monitoring data into a third-party metrics aggregation platform for service correlations, consolidated dashboards, root cause analysis, or more fine-grained alerts. You may have asked questions along these lines: Can I export JMX data from Confluent clusters to my monitoring system with minimal configuration? What if I could correlate this service’s data spike with metrics from Confluent clusters in a single UI pane? Can I configure some Grafana dashboards for Confluent clusters?

This talk will help you achieve the following:
Monitoring Your Event Streams: Integrating Confluent with Prometheus and Grafana
Monitoring Your Event Streams: Tutorial for Observability Into Apache Kafka Clients

SLIDES

DANIEL BODKY

IGNITE: That’s nuts! A proof of concept of Icinga2 on Kubernetes using Acorn (EN)

Abstract

Icinga2, with its many moving parts, credentials for database, API, and other features, and layered topology, can be considered a pretty complex system. Deploying it to Kubernetes might in fact prove overwhelming when transitioning from ‘traditional’ setups. In this Ignite Talk, we will have a quick look at typical Icinga2 setups you might encounter in the wild and how we can deploy them to Kubernetes clusters using acorn (https://acorn.io), a new application deployment framework for Kubernetes aimed at simplicity and developer experience. We will explore acorn’s basic idea(s), how it is configured and used in production, and where its shortcomings as of today might lie, while going through the process of deploying a full-blown Icinga2 setup to a Kubernetes cluster.

SLIDES

PHILIPP KRENN

IGNITE: How to benchmark … poorly (EN)

Abstract

After getting a bit of a bad reputation (“benchmarketing”) it looks as if vendor benchmarks are very much in fashion again. Let’s take a quick look at common mistakes and how to do them … poorly.

SLIDES

JONAH KOWALL

Unifying Observability: Weaving Prometheus, Jaeger, and Open Source Together to Win (EN)

Abstract

Observability is a hugely popular topic; however, for open-source users, significant challenges remain. For starters, related licensing is frequently problematic, and even when it works, there is no pure Apache 2.0 licensed technology to get data collection and visibility into your logs, metrics, and traces. Thankfully, this is gradually changing as the community builds new capabilities into OpenSearch Dashboards to unify the visualization of logs from OpenSearch, metrics from PromQL-compatible systems, and traces from Jaeger. In this session, we’ll examine how this important project is evolving as a fork of the previously popular ELK stack. We’ll also take a closer look at the current state of OpenSearch and Jaeger and discuss how these efforts are going to provide a foundation for unified observability to the open-source communities. By using OpenTelemetry for data collection, this foundation provides a pure Apache 2.0 licensed open-source platform for unified observability. OpenSearch also includes features like Alerting and Machine Learning, which are not part of Jaeger today. The work on this foundational integration is well underway and will provide open-source users with a solid alternative to vendor-controlled and vendor-provided solutions. This also opens up the marketplace for solutions to be created to host and manage these at scale, something we’ve seen with countless other CNCF projects. This talk will be presented by a contributor and maintainer of OpenSearch, Jaeger, and OpenTelemetry, which all have vibrant user communities. Join the conversation!

SLIDES

JOSEPH KATSIOLOUDES

Security as Code: A DevSecOps Approach (EN)

Abstract

Security as Code (SaC) is the methodology of codifying security tests, scans, and policies. Security is implemented directly into the CI/CD pipeline to automatically and continuously detect security vulnerabilities. Adopting SaC tightly couples application development with security and vulnerability management, while simultaneously enabling developers to focus on core features and functionality. More importantly, it improves the collaboration between Development and Security teams and helps nurture a culture of security across the organization. In this session, we will review lessons learned from DevOps to implement a thriving DevSecOps culture, in particular how we can make developers contribute security checks with the SaC approach. We will introduce CodeQL, a language that allows us to implement security checks with code. We will demo how we can code queries for vulnerabilities and misconfigurations so they can be identified as soon as they hit your CI/CD pipeline.

SLIDES

DOTAN HOROVITS

OpenTelemetry 101 (EN)

Abstract

Everyone wants observability into their system, but many find themselves with too many vendors and tools, each with its own API, SDK, agent, and collectors. In this talk I will present OpenTelemetry, an ambitious open source project with the promise of a unified framework for collecting observability data. With OpenTelemetry you can instrument your application in a vendor-agnostic way, and then analyse the telemetry data in your backend tool of choice, whether Prometheus, Jaeger, Zipkin, or others. I will cover the current state of the various projects of OpenTelemetry (across programming languages, exporters, receivers, protocols), some of which are not even GA yet, and provide useful guidance on how to get started with it.

SLIDES

DAVID FLANAGAN

Providing a Rich Interface to the Prometheus Operator (EN)

Abstract

Configuring Prometheus isn’t exactly on anyone’s top list of “fun nights in”, but it’s something we all need to do. We spend so much of our time slinging YAML that it can be easy to forget that there are alternatives to working with Kubernetes CRDs, so let’s spend some time together taking a look at an experience that puts the developer first. In this session, I will guide you through using Pulumi to author and deploy your own Prometheus rules and alerts with a great developer experience that provides intelligent code auto-completion right within your favourite IDE, whether you write in Go, Python, TypeScript, or even .NET.

SLIDES

GEORGE HANTZARAS

Scaling SLOs with Kubernetes and Cloud Native Observability (EN)

Abstract

Defining Service Level Objectives and Service Level Indicators is a really important aspect of implementing SRE. Through service metrics (SLOs, SLIs, Error Budgets), SRE can help us measure our system’s performance and improve customer experience. They not only enable your teams to monitor and plan around reliability, but can also be early predictors of customer satisfaction, NPS, churn rates, and more. With the rise of cloud native technologies, it has become more and more relevant to automate our observability, extending it to an SLO-as-code model. In this session we’ll see how SLOs have evolved and can be used in a Cloud Native world. We’ll then explore how technologies like Kubernetes and Prometheus can help us scale SLOs, while promoting best practices and standards using Observability as code. Finally, we’ll see how to put all these together with Jenkins and Rancher, to operationalize error budgets.
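
As a rough illustration of the error budget idea mentioned in this abstract, the arithmetic can be sketched in a few lines of Python. This is not taken from the talk: the function name `error_budget`, its parameters, and the request-based SLI are assumptions chosen for the example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute a simple request-based error budget (hypothetical helper).

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: the observed fraction of successful requests.
    sli = 1.0 - failed_requests / total_requests
    # Budget: how many failures the SLO tolerates over this many requests.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean the SLO is breached).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Example: a 99.9% SLO over one million requests with 250 failures.
    print(error_budget(0.999, 1_000_000, 250))
```

In an SLO-as-code setup, a calculation of this kind would typically be expressed as recording and alerting rules rather than application code; the sketch only shows the underlying numbers.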

SLIDES

PASCAL LANGE

Monitoring & Betriebsrat (DE)

Abstract

Many companies in Germany have works councils (Betriebsräte), which have a right of co-determination when new technical tools are introduced. Conversations with IT colleagues so far have shown that this right is not always respected during such introductions, which can later lead to avoidable complications and disputes. This presentation aims to give a brief introduction to the principles of co-determination under German works constitution law, along with key points on where works councils may need to be involved in monitoring.

SLIDES

BERND ERK

Current State of Icinga (EN)

Abstract

Current State of Icinga

SLIDES

DANIEL ZIEGLER

Einfaches, effizientes und schnelles Monitoring mit Open Source (DE)

Abstract

Tired of dusty configuration files and Nagios, and of somehow having to tackle them anyway? With openITCOCKPIT, configuration is done exclusively through the web interface.
With the platform-independent agent, basic monitoring is set up in less than 2 minutes. Communication is of course encrypted, and thanks to pull or push mode, any system can be queried.
Volunteers welcome! I will gladly monitor your notebook during the demo.

SLIDES

SIMARPREET SINGH

Automated Incident Response for Cloud Native Risks (EN)

Abstract

Incident response teams are already drowning in alerts, and are potentially missing critical vulnerabilities. What use is a security scanner that tells you there are thousands of vulnerabilities if you still need to take the time to go and fix them? Extending visibility and responsibility to cloud native environments compounds the challenge teams face in weeding through huge volumes of alerts to determine which risks are the most urgent and how best to respond to incidents. This session will cover how security teams can use open source projects to better identify high-risk cloud native events, orchestrate responses with other third-party integrations based on these high-fidelity insights, and execute playbooks for more automated and effective incident analysis and handling processes. We will cover a variety of use cases, ranging from simple cases such as acting upon CVE detections when performing vulnerability scans to more complex scenarios of runtime detection. The session will focus on practical use case scenarios that are commonly observed in day-to-day situations.

SLIDES

DANIJEL TASOV

check_vsphere (EN)

Abstract

Currently the two dominant check scripts for VMware/vSphere are check_vmware_esx and check_vmware_api. But they have a problem: they are based on the Perl SDK provided by VMware, which VMware has deprecated. You may already see problems when checking against newer VMware installations. So, we need a plugin in a different language. That’s why I started a new plugin written in Python using the Python SDK, which is not deprecated. While doing this I also studied check_vmware_esx/api and found out that they are actually doing a very bad job with metric-based checks. Also, the VMware API is… well… let’s say over-engineered. So, this talk is about what I’ve learned during the implementation, what the current plugins do wrong, the over-engineered VMware API and what the current state of the new plugin is.

SLIDES

EMIL-ANDREAS SIEMES

The Power of Metrics, Logs & Traces with Open Source (EN)

Abstract

The talk will show how organisations can drastically reduce their MTTR (Mean Time To Repair) by using, integrating & correlating the open source tools Mimir, Loki & Tempo. We will then take the next step into open source reliability testing to even avoid problems in the first place. And yes, we will use Grafana 🙂

SLIDES

HOLGER KOCH

In 60 Minuten zum IoT Projekt auf der Basis von ThingsBoard (DE)

Abstract

Using a practical example, the talk gives an introduction to the world of the “Internet of Things”. Starting with the presentation of several sensor platforms and the programming of a microcontroller, continuing with connectivity via NB-IoT or LoRaWAN and AWS IoT Core, and ending with the processing, storage and visualization of the data on the basis of the open source software ThingsBoard, all aspects of the IoT world are covered. The audience thus experiences live how a complete IoT project comes into being.

SLIDES

PHILIPP KRENN

Logstash, Beats, Elastic Agent, Open Telemetry — what’s the right choice? (EN)

Abstract

Back in the old days with the ELK Stack, ingesting logs (and other data) was straightforward: Logstash or maybe Fluentd. Today you have a lot more options: Beats have been around for a long time, but Elastic Agent is the hot new thing. And then there is also Open Telemetry, which is growing in use cases. What’s the right choice? This talk gives a quick overview of the current options and their tradeoffs, including some common scenarios and how one or more of the tools can solve your problems.

SLIDES

PRADEEP CHHETRI

Metrics Stream Processing Using Riemann (EN)

Abstract

This talk will cover:
Introduction to Riemann
* In-memory stream processing system written in Clojure
* How it can easily process millions of events per second
* Configuration is Clojure code

What problems Riemann solves in today’s world of Prometheus & Datadog
* High cardinality metrics problem: Since all processing is done in-memory, Riemann can handle high cardinality issues much better than Prometheus.
* Instant detection: Since Riemann uses websockets, an issue is instantly reflected in the dashboard. In Prometheus, it will only be reflected after the next scrape.

Riemann concepts
* Event
* Stream
* Index
* Integrations

Riemann Stream Processing Engine
* Types of functions on Riemann streams.
* Examples:
  * Combine multiple streams into one stream.
  * Split one stream into multiple streams.
  * Filter, roll up, throttle, coalesce events.

Extending Riemann
* How the Riemann schema is extensible and can be used for streaming other events such as logs.

Demo
* Using a docker-compose setup of Riemann to present the power of Riemann and riemann-dash.
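
To make the stream functions listed above a little more concrete, here is a minimal conceptual sketch of filtering and throttling events. It is written in Python purely for illustration; Riemann itself is configured in Clojure, and the names `where`, `throttle` and the event fields used here are assumptions for this example, not Riemann's actual API.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Stream = Callable[[Event], None]


def where(predicate: Callable[[Event], bool], child: Stream) -> Stream:
    """Filter: forward only events for which predicate is true."""
    def stream(event: Event) -> None:
        if predicate(event):
            child(event)
    return stream


def throttle(n: int, dt: float, child: Stream) -> Stream:
    """Throttle: forward at most n events per window of dt seconds, drop the rest.

    Windows are driven by the event's own "time" field so the sketch is deterministic.
    """
    window_start: Optional[float] = None
    count = 0

    def stream(event: Event) -> None:
        nonlocal window_start, count
        t = float(event["time"])
        # Open a new window when none exists or the current one has expired.
        if window_start is None or t - window_start >= dt:
            window_start = t
            count = 0
        if count < n:
            count += 1
            child(event)
    return stream


if __name__ == "__main__":
    seen: List[Event] = []
    # Keep only "api" events, and at most 2 of them per 5-second window.
    pipeline = where(lambda e: e["service"] == "api", throttle(2, 5.0, seen.append))
    for t in [0, 1, 2, 3, 10, 11]:
        pipeline({"service": "api", "time": t, "metric": 1})
    pipeline({"service": "db", "time": 12, "metric": 1})
    print([e["time"] for e in seen])
```

Streams are modelled here as plain functions that receive an event and pass it on to a child, which loosely mirrors how stream functions are composed by nesting.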

SLIDES

JULIEN PIVOTTO

IGNITE: The O11y toolkit (EN)

Abstract

The O11y toolkit is a set of utilities to help you maintain, debug, and augment your open source observability stack. Our tools will improve your experience with metrics, logs, and traces. We have a couple of tools published already. We want a seamless user experience across the tools, with as much consistency in behaviour and packaging as possible.

SLIDES

SEBASTIAN GUMPRICH

IGNITE: Event Driven Ansible (EN)

SLIDES

SVEN NIERLEIN

Thruk 3 – Monitoring at glance got a fresh look (EN)

Abstract

The Thruk web UI, capable of handling millions of services, has served well for many years. It still does, and has even gained more useful features. This talk will introduce the brand new Thruk 3 release, which comes with a completely reworked interface. There will be a little bit for everyone.

SLIDES

DANIEL UHLMANN

How we improved our monitoring so that everyone likes to be on-call (EN)

Abstract

Ever wonder why your engineers don’t necessarily like being on call? There can be many different reasons for this, and one cause could be a poorly configured monitoring system. In this talk I would like to share with you the different stages we went through as a team to get from inadequate monitoring to a solution that provides real value not only for the customer but also for us as a team.

SLIDES

COLIN DOUCH

Monitoring in a Serverless World (EN)

Abstract

Serverless applications are becoming an increasingly attractive architecture for deploying software, providing many benefits over more traditional architectures. Unfortunately, due to the locked-in nature of serverless platforms, monitoring options outside of vendor-provided solutions are scarce. But does that mean we are doomed to be subject to vendor lock-in? In this talk, Colin will cover the landscape of existing open source solutions for monitoring serverless applications, and will present Cloudflare’s approach to monitoring its increasingly large portfolio of serverless applications.

SLIDES

SATISH KARUNAKARAN

AI Driven Observability based on Open Source (EN)

Abstract

Observability & monitoring of resources are growing every day, and analysing all the data points is unavoidable in order to arrive at a solution. At Mercedes-Benz we have developed an open source Data Metric Analyzer and drive it with Data Science to identify anomalies. As part of this talk, we would like to discuss how we established the entire data processing ecosystem based on open source. The technologies covered include:

– Python: Data Science Components
– Airflow: Data Orchestration for metrics
– Telegraf: Data Collection
– TimescaleDB: Data Store for Timeseries Data
– Grafana + Streamlit: Visualization

SLIDES

OLENA KHARCHENKO & FRANCO SOLLNER

Vom Spam zum Mehrwert: Ganzheitliches APM und intelligentes Incident Management (DE)

Abstract

How do I exploit the full potential of my monitoring system? How do I create an optimal setup? How do I deal with alert floods? We address these questions in our session and show you what experience we have gathered in application performance and incident management. We discuss which data form the foundation of a modern observability platform and how incident management can be used to shape a holistic and sustainable process.

SLIDES

KEVIN HONKA

Let’s build a private cloud – how hard can it be? (EN)

Abstract

When we built our private cloud with OpenStack, we never thought it would be this complex or time-consuming. In this talk I want to share our approach, the challenges we faced, and why we learned to appreciate good monitoring and log aggregation.

SLIDES

FEU MOUREK

Git Good – How knowing git can make your life easier (EN)

Abstract

In our tech life, it’s hard to avoid git. Pretty much everyone has to interact with it at least briefly, to make a commit at some point, or to review someone’s changes. Git’s complexity can get really intimidating and scary when you dig in deeper. But don’t let that scare you off, because once you understand what it is trying to do, it is a lot less daunting. And you will want to integrate it into every single one of your projects!

SLIDES

PASCAL FRIES

Monitoring multiple Kubernetes Clusters with Thanos (EN)

Abstract

By now, Prometheus has become the de facto standard for monitoring containerised applications. However, when it comes to monitoring multiple Kubernetes clusters through a single pane of glass, additional tools are required. In this talk, I will show how to set up a production monitoring landscape based on Prometheus and Thanos, spanning several Kubernetes clusters. Focussing on examples and best practices, I will also elaborate on how to secure communication between the individual components.