ALIAKSANDR VALIALKIN

Large-scale logging made easy

Abstract

Logging at scale is a common source of infrastructure expenses and frustration. While logging is something any organization does, there is still no silver bullet or just a simple and scalable solution without trade-offs. After studying the most popular logging systems, Aliaksandr came up with his design and vision of the problem. The proposed solution fits ideally for SRE, DevOps, and system engineers who need to provide logging solutions as a platform for the entire company or team. It accepts logs from existing logging agents, pipelines, and streams, efficiently stores them in a highly optimised log database and can be queried at lightning-fast speeds with perfect integration with tools like jq, awk, cut, etc.

ROCCO PEZZANI & THOMAS GELF

SNMP Monitoring at scale

Abstract
This presentation unveils a new addon component for the Icinga ecosystem, developed in collaboration with Würth Phoenix and with the support of Irideos. It enables distributed SNMP network monitoring for large segmented networks, capable of monitoring huge numbers of devices in near real-time, while displaying trend graphs for sensor and performance metrics.
An interactive MIB browser with distributed SNMP polling support and some nice network device visualization components will also be shown.

PHILIPP KRENN

Will ChatGPT Take Over My Job?

Abstract

Of course not. But besides the provocative title, ChatGPT and other LLMs are changing — hopefully improving — our industry.
Generative AI is massively (over-) hyped right now but the concept of copilots makes a lot of sense. We are the pilot but can rely on tooling to make our jobs faster and more efficient; like explaining a Kubernetes error message and listing the most common causes and potential solutions.
So what is the state of the art of current copilots and what is not yet or soon° to be expected?

° to the best of our current knowledge

MAKSIM NABOKIKH

Making your Kubernetes-based log collection reliable & durable with Vector

Abstract

Vector is an open source, high-performance solution for collecting & processing your observability data. In this talk, I will share our experience using it for log collection in hundreds of Kubernetes clusters. Which features make Vector special, so that we preferred it over other solutions? How did we deploy it and set it up in Kubernetes? What were our main challenges in adopting Vector, and how did we overcome them? I will also summarise our lessons learned about K8s log collection in general.

CHRISTIAN STEIN

Icinga for Windows – Age of PowerShell

Abstract

Icinga for Windows has established itself as the default for monitoring Windows environments with Icinga. The upcoming release of version 1.12.0 introduces massive changes regarding usability, customization and how to manage installations. In this talk we will cover the previous changes in v1.11.0 and also give an overview of the features coming with v1.12.0.

LENA STANDKE

DevOps Transformation: Introducing Incident Management and Maximizing Monitoring Value

Abstract

In this session we will talk about Incident Management and Incident Response topics as a part of DevOps transformation. You will learn how to get more value out of monitoring your services and how to train, lead and support your teams to improve their incident response and introduce on-call. We will share our first-hand experience and best practices gathered within two years with over 40 platform and product teams.

NICLAS ILG

IGNITE: Honeypot Flavors: Open-Source Honeypots and their Use in the Automotive Industry

Abstract

In this talk, we want to highlight the variety of open-source honeypots—decoy resources that mimic a valuable target system to entice adversaries—across multiple domains. Furthermore, we provide an outline of how existing open-source software can be used to tailor solutions for new and unique domains. As an example, we give a brief insight into our research on the use of open-source honeypots in the automotive environment, where most devices run on proprietary software.

DANIEL BODKY

IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) run away

Abstract

SL-something, error budgets, on-call shifts – SREs know it all, and many of us know SREs. But what’s the reality behind the job description first used at Google, and how do they operate? Are they glorified system administrators? DevOps folks gone platform engineers? Something entirely different? In this ignite, we will dip our toes in the waters of SRE, establishing a basic vocabulary and understanding, and take a look at how to (not) treat your SRE teams – because nobody likes a mutiny on their ships!

FELIX FRANK

IGNITE: Your business isn’t Green enough

Abstract

All your servers run on Green Energy. Your cloud provider plants a tree for any VM you spin up. Your employer bought emission certificates for you and all your colleagues last Christmas. And yet, the climate crisis seems to keep escalating year by year. What else can we do? This ignite talk might be disturbing, but will also try to offer some insight into our options.

DAVE MCALLISTER

Know your data: The stats behind your alerts

Abstract

Quick, what’s the difference between the mean, the mode and the median? Do you need a Gaussian or a normal distribution? And does your choice impact the alerts and observations you get from your observability tools?

Come get refreshed on the impact some basic choices in statistical behavior can have on what gets triggered. Learn why a median might be the choice for historical anomaly or sudden change. Jump into Gaussian distributions, data alignment challenges, and the trouble with sampling. Walk out with a deeper understanding of your metrics and what they might tell you.
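
As a minimal sketch of why that choice matters, the following Python snippet compares the mean and the median of the same samples against one alert threshold. The latency values, the threshold and the `should_alert` helper are assumptions made up for illustration; they are not taken from the talk.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds; the last value is a single spike.
latencies_ms = [100, 102, 98, 101, 99, 103, 100, 5000]

THRESHOLD_MS = 500  # assumed alert threshold, chosen only for illustration


def should_alert(value, threshold=THRESHOLD_MS):
    """Return True when the aggregated value exceeds the threshold."""
    return value > threshold


mean_ms = mean(latencies_ms)      # pulled far upwards by the single spike
median_ms = median(latencies_ms)  # stays close to the typical value

print(f"mean={mean_ms} alert={should_alert(mean_ms)}")
print(f"median={median_ms} alert={should_alert(median_ms)}")
```

With these samples the mean crosses the threshold because of one outlier, while the median does not. Which behaviour is preferable depends on whether a single spike should trigger an alert at all.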

DAVE KEMPE

Impact assessment with Netbox Path

Abstract

Modern networks are complicated beasts, with many environments serving as platforms to host applications of all levels of complexity. Netbox as a DCIM is great at documenting the truth of what infrastructure exists, and monitoring automation from Netbox is well understood. Less well understood is how applications use the infrastructure to get their jobs done, and Netbox Path is our attempt to document this, and then provide impact assessment. The result is you can know that this switch-port carries the payroll database traffic, or this virtual machine hosts DNS for this network. When problems happen, Netbox Path provides a way you can understand what applications are impacted by the problem, and notify the correct users. Live Demo included!

SEBASTIAN SCHUBERT

What’s new with Grafana Labs’s Open Source Observability stack

Abstract

Open source is at the heart of what we do at Grafana Labs and there is so much happening! The intent of this talk is to update everyone on the latest developments when it comes to Grafana, Pyroscope, Faro, Loki, Mimir, Tempo and more. Everyone has at least heard about Grafana, but maybe some of the other projects mentioned above are new to you? Welcome to this talk 😉 Besides the update on what is new, we will also quickly introduce them during this talk.

LUCAS COPI

Journey to observability: tracking every function execution in production

Abstract

This talk will discuss the technical journey converting a legacy monitoring approach to true production observability for the Ingress Team in IBM Cloud. A deep dive on the necessary code and culture changes that enabled our team to observe the performance of every single function executed in production. And how these changes enabled a 90% reduction in incident volume and downtime duration. The goal of this talk is to provide insight into the challenges, pitfalls, and successes of the IBM Cloud team, and generate a discussion on what is next.

LARS MICHELSEN

Newest developments in Checkmk Raw – the open-source monitoring solution

Abstract

The pace of change in IT continues to accelerate. At Checkmk, we believe the solution for all the upcoming challenges in IT monitoring is an open observability ecosystem: built on an involved user community, reliable partners, many off-the-shelf integrations, and a wide range of APIs for development, configuration and operations. Join us to learn about the latest release of our open source monitoring solution Checkmk Raw. Among the highlights are our completely reworked Grafana integration, performance boost for the InfluxDB integration, 150+ new or improved checks including Google Cloud Platform, Mobile Devices Monitoring, Cisco Meraki. Furthermore, we have news about sharing monitoring customizations, an extended REST API, and an enhanced user experience. What makes Checkmk 2.2 truly unique is our community-driven approach. We have harnessed the expertise of numerous individuals to build a monitoring solution that caters to universal needs. In our talk, we will demonstrate the power of collaboration and how it has shaped our platform. We will also showcase how the Checkmk open-source ecosystem thrives, highlighting some new interesting projects.

HOLGER KOCH

Take IoT to a new level with ThingsBoard

Abstract

The Internet of Things is becoming increasingly popular. Because of its simplicity, scalability and range of features, more and more companies are relying on the open source IoT platform ThingsBoard. The first third of the talk gives a practical introduction to the platform. We will then look together at a scalable architecture that can efficiently process hundreds of thousands of sensors with millions of metrics. In the final third, we will look at the options for visualizing the collected data, anomaly detection and various analyses based on ThingsBoard Trendz. Attendees thus get a complete introduction to the extensive functionality of ThingsBoard.

BERND ERK

Current State of Icinga

Abstract

Current State of Icinga.

PHILIPP KRENN

OpenTelemetry for Logging

Abstract

While OpenTelemetry tracing is the de-facto standard today and metrics are also getting more and more established, logging is still lagging far behind.
This talk gives an overview of:

– Why should you change your logging to work with OpenTelemetry?
– Where is the OTel standard in terms of logging?
– How can you use it in your application today?

We will focus on Java with three different approaches for the implementation. But most of the concepts will translate well to other ecosystems as well.

SEBASTIAN GUMPRICH

Automated update management with Renovate

Abstract

Minimize security risks and keep systems up-to-date

Heise once again reports a serious security vulnerability in widely used software; remember Heartbleed or Log4j. The search for the affected software and its dependencies begins. Lack of overview and uncertainty often lead to postponing necessary updates. To be proactive and keep systems up-to-date, we use Renovate. In this presentation, I will show how we use it company-wide.

JOCHEN KRESSIN & LEANNE LACEY-BYRNE

Experiments with OpenSearch and AI

Abstract

At the intersection of search and AI, melding Large Language Models (LLMs) with OpenSearch opens transformative avenues. In this talk, we explore how LLMs can simplify the interaction between users and OpenSearch, converting natural language into OpenSearch queries. We will also leverage OpenSearch’s Vector Storage, enriching traditional term-based searches with semantic understanding. Dive into a future where search engines transcend being mere tools, becoming intuitive partners in knowledge discovery.

SVEN NIERLEIN

Replacing NSClient++ for Windows Monitoring

Abstract

This talk will give a quick overview of NSClient++ alternatives and will introduce the new SNClient+ agent for Windows, Linux, OSX and BSD. This new agent is designed to replace NSClient++ without having to migrate configuration or scripts. Besides this compatibility mode, I will show what else can be done with SNClient+, e.g. fetching Prometheus metrics.

SEBASTIAN SCHUBERT

Running the Infra at FOSDEM

Abstract
FOSDEM is one of the largest open source conferences in the world, run entirely by volunteers without charging any fees.
Learn how we run the conference while avoiding proprietary solutions.

NICOLAS SCHNEIDER

Extending Icinga Web with Modules: powerful, smart and easily created

Abstract
Nicolas previously presented icingaweb2-module-scaffoldbuilder at Icinga Camp Berlin and would like to present the modules he has built so far, either using the scaffoldbuilder or by forking other modules.

DAVE MCALLISTER

IGNITE: Serving Server-Side WASM with Web Awareness with NGINX Unit

Abstract
WebAssembly (Wasm) promises to change the nature of web applications. However, we should take a look at the needs of the apps decoupled from the browser.
Enter open source NGINX Unit. Unit is an application runtime for web apps and APIs. It handles the HTTP(S) front end, request routing and serving of assets, including the hand-off of dynamic requests. In short, Unit decouples the HTTP server from the application process. And it’s an excellent fit for Wasm’s sandboxed execution and linear memory byte streams.

DAVE KEMPE

IGNITE: A few cool new Icinga Notification scripts

Abstract

Sol1 is releasing some new Icinga notification scripts and this talk goes through some of their cooler features.

  • Slack – customisable sizing, layout and includes
  • Request Tracker – idempotent ticket creation
  • Impact assessment with path – Check who cares about that thing
  • Enhanced Email – embed Grafana graphs, HTML, and Netbox data in emails!

Now with all-new, shiny Director baskets. Integration with Netbox Contacts is also demoed, so we can let users self-serve which devices they get notifications for, and how and why, all from Netbox!

MARC ZIMMERMANN

IGNITE: How my cats support me at work

Abstract

Why my cats are great companions and how they support me at work.

MATTHIAS GALLINGER & TOBIAS KEMPF

Monitoring at one of the largest retail groups in the world

Abstract

This talk will focus on the monitoring of the Schwarz Group, one of the largest retailers in the world, with an extremely large and complex environment, using the OMD platform. OMD is an all-in-one monitoring solution that integrates Naemon/Thruk for host and service monitoring, Prometheus for metrics collection, and Grafana for data visualization. The monitoring extends from classic infrastructure, such as servers and networks, to cloud-based applications and even to the business operations in stores. This comprehensive monitoring approach provides necessary information and data for each client group, allowing the Schwarz Group to optimize its operations and enhance customer experience. We will discuss the benefits of using OMD and its tools in such a large environment, including its scalability, flexibility, and ease of use. We will also cover some of the challenges faced during the lifecycle of the monitoring solution. Overall, this talk will provide valuable insights into the power of OMD for monitoring large-scale and complex infrastructures, and how it can help organizations like the Schwarz Group improve their operations, maintain their systems in top condition, and enhance their customer experience.

JENS MICHELSONS

openITCOCKPIT Community Edition – Simple configuration, modules, API and more

Abstract

openITCOCKPIT is a highly adaptable IT monitoring framework based on Naemon and other technologies from the open source world. It stands out for its simple configuration and high flexibility, which enables anyone to build complete infrastructure and application monitoring, even without extensive Linux knowledge. The system's flexibility also offers experienced monitoring professionals numerous possibilities. In addition, the Community Edition has many more features such as event correlation, various reporting functions, integration of the Checkmk agent, configurable dashboards, an easy-to-configure Grafana integration, an open REST API and much more. This makes it a comprehensive monitoring framework that goes far beyond standard monitoring functions. This talk will show not only how easy it is to set up IT infrastructure and application monitoring, but also how the individual modules can be used effectively, for example to correlate information.

GREG KALAPOS

Built-in OpenTelemetry support in Elasticsearch clients

Abstract

How we instrumented our libraries with OTel and what you can learn from it

At Elastic, we recently added OpenTelemetry support to most of our open source Elasticsearch clients. This talk will tell the story of how we got there and what we learned along the way.
Elasticsearch clients exist in multiple languages (Java, .NET, PHP, Ruby, etc.), therefore we also created Semantic Conventions to make sure all Elasticsearch client instrumentations behave in the same way.
Attendees will learn about how to instrument existing libraries with OpenTelemetry and they will also learn how to interact with the community and collaborate on creating Semantic Conventions for specific technologies.

COSIMO RUSSO & FABIAN BINDER

Automatic system updates as a self-service

Abstract

With system updates, you have to contend with widely varying update schedules and responsibilities for systems.
comNET takes on this challenge and automates update processes in a controlled procedure, using Checkmk, NetBox and Ansible.
Linking host data from NetBox with monitoring data from Checkmk ensures that a system is ready for updates. Ansible does the rest.
This lets users update their systems themselves, with just one click!

WOLFGANG ALPER

Zabbix – Powerful enterprise grade monitoring driven by Open Source

Abstract

When it comes to enterprise-level, open source network monitoring, one of the products that comes to mind is Zabbix. The tool, which has been continuously developed and improved for over 20 years, is a great choice for monitoring network devices, servers, applications, containers, and cloud environments. It combines powerful monitoring capabilities with easy-to-use configuration and visualization options. This presentation will give a brief overview of its design, capabilities and key features that make it so special.

LUKAS MATECKI

The experiment: Using AI to process REST APIs and build Icinga plugins for them!

Abstract

Have you ever been in a situation where you needed a check that does not exist yet?

That was the case for me with Rancher, an on-prem Kubernetes solution.
Icinga is not that good at monitoring dynamic workloads, but there are also deployments that stay in place (production).
So how can I integrate these into my company-wide monitoring?
Using the REST API and ChatGPT, I tried to write an addon tailored to my needs.

BIROL YILDIZ

Elevating Open-Source Monitoring Ecosystems

Abstract

Bridging the Gap from Alert Detection to Effective Incident Response

While open-source monitoring tools are indispensable in modern IT environments, organizations often grapple with the transition from anomaly detection to swift, informed action. This presentation delves deep into this challenge, offering a step-by-step guide to embedding robust incident response practices seamlessly for DevOps teams. We will dissect the incident response workflow into four tangible stages: Preparation, Response, Communication, and Learning. Attendees will leave with a practical roadmap to minimize user impact, reinforce a culture of continuous growth, and enhance their existing open-source monitoring frameworks.