Large-scale logging made easy
Logging at scale is a common source of infrastructure expenses and frustration. While every organization logs, there is still no silver bullet: no simple, scalable solution comes without trade-offs. After studying the most popular logging systems, Aliaksandr came up with his own design and vision of the problem. The proposed solution is ideal for SRE, DevOps, and system engineers who need to provide logging as a platform for an entire company or team. It accepts logs from existing logging agents, pipelines, and streams, stores them efficiently in a highly optimised log database, and can be queried at lightning-fast speeds, integrating with tools like jq, awk, cut, etc.
ROCCO PEZZANI & THOMAS GELF
SNMP Monitoring at scale
Will ChatGPT Take Over My Job?
Of course not. But besides the provocative title, ChatGPT and other LLMs are changing — hopefully improving — our industry.
Generative AI is massively (over-) hyped right now but the concept of copilots makes a lot of sense. We are the pilot but can rely on tooling to make our jobs faster and more efficient; like explaining a Kubernetes error message and listing the most common causes and potential solutions.
So what is the state of the art of current copilots, and what should not be expected yet or anytime soon°?
° to the best of our current knowledge
Making your Kubernetes-based log collection reliable & durable with Vector
Vector is an open source, high-performance solution for collecting & processing your observability data. In this talk, I will share our experience using it for log collection in hundreds of Kubernetes clusters. Which features make Vector special, so that we preferred it over other solutions? How did we deploy it and set it up in Kubernetes? What were our main challenges in adopting Vector, and how did we overcome them? I will also summarise our lessons learned about K8s log collection in general.
Icinga for Windows – Age of PowerShell
Icinga for Windows has established itself as the default for monitoring Windows environments with Icinga. The upcoming release of version 1.12.0 introduces massive changes regarding usability, customization and how to manage installations. In this talk we will cover the previous changes in v1.11.0 and also give an overview of the features coming with v1.12.0.
DevOps Transformation: Introducing Incident Management and Maximizing Monitoring Value
In this session we will talk about Incident Management and Incident Response topics as a part of DevOps transformation. You will learn how to get more value out of monitoring your services and how to train, lead and support your teams to improve their incident response and introduce on-call. We will share our first-hand experience and best practices gathered within two years with over 40 platform and product teams.
IGNITE: Honeypot Flavors: Open-Source Honeypots and their Use in the Automotive Industry
In this talk, we want to highlight the variety of open-source honeypots—decoy resources that mimic a valuable target system to entice adversaries—across multiple domains. Furthermore, we provide an outline of how existing open-source software can be used to tailor solutions for new and unique domains. As an example, we give a brief insight into our research on the use of open-source honeypots in the automotive environment, where most devices run on proprietary software.
IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) run away
SL-something, error budgets, on-call shifts – SREs know it all, and many of us know SREs. But what’s the reality behind the job description first used at Google, and how do they operate? Are they glorified system administrators? DevOps folks gone platform engineers? Something entirely different? In this ignite, we will dip our toes in the waters of SRE, establishing a basic vocabulary and understanding, and take a look at how to (not) treat your SRE teams – because nobody likes a mutiny on their ships!
IGNITE: Your business isn’t Green enough
All your servers run on Green Energy. Your cloud provider plants a tree for any VM you spin up. Your employer bought emission certificates for you and all your colleagues last Christmas. And yet, the climate crisis seems to keep escalating year by year. What else can we do? This ignite talk might be disturbing, but will also try to offer some insight into our options.
Know your data: The stats behind your alerts
Quick, what’s the difference between the mean, the mode and the median? Do you need a Gaussian or a normal distribution? And does your choice impact the alerts and observations you get from your observability tools?
Come get refreshed on the impact some basic choices in statistical behavior can have on what gets triggered. Learn why a median might be the choice for historical anomaly or sudden change. Jump into Gaussian distributions, data alignment challenges, and the trouble with sampling. Walk out with a deeper understanding of your metrics and what they might tell you.
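To illustrate the point about the median, here is a minimal sketch using Python's standard `statistics` module. The latency samples, the threshold, and the helper names `summarize` and `breaches` are made up for illustration and are not taken from the talk.

```python
import statistics

# Hypothetical response times in milliseconds; the last sample is a single outlier.
latencies_ms = [100, 100, 110, 120, 130, 140, 4900]


def summarize(samples):
    """Return the three classic measures of central tendency."""
    return {
        "mean": statistics.mean(samples),      # pulled upward by the outlier
        "median": statistics.median(samples),  # middle value, robust to the outlier
        "mode": statistics.mode(samples),      # most frequent value
    }


def breaches(value, threshold_ms):
    """True if an alert with this threshold would fire for the given value."""
    return value > threshold_ms


summary = summarize(latencies_ms)
THRESHOLD_MS = 500  # assumed alert threshold

for name, value in summary.items():
    state = "ALERT" if breaches(value, THRESHOLD_MS) else "ok"
    print(f"{name}: {value} ms -> {state}")
```

With these samples, one slow request is enough to push the mean over the threshold, while the median and the mode stay close to typical behaviour. Which of the two you want depends on whether a single outlier should trigger an alert.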
Impact assessment with Netbox Path
Modern networks are complicated beasts, with many environments serving as platforms to host applications of all levels of complexity. Netbox as a DCIM is great at documenting the truth of what infrastructure exists, and monitoring automation from Netbox is well understood. Less well understood is how applications use the infrastructure to get their jobs done, and Netbox Path is our attempt to document this, and then provide impact assessment. The result is you can know that this switch-port carries the payroll database traffic, or this virtual machine hosts DNS for this network. When problems happen, Netbox Path provides a way you can understand what applications are impacted by the problem, and notify the correct users. Live Demo included!
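The idea of impact assessment can be sketched in a few lines of Python. This is only an illustration under assumptions: the application names, component names, contacts, and helper functions below are hypothetical and do not reflect the actual data model or API of Netbox or Netbox Path.

```python
from collections import defaultdict

# Hypothetical documentation of which infrastructure each application relies on.
PATHS = {
    "payroll-db": ["switch-01:port-12", "vm-payroll-01", "core-router-01"],
    "dns-internal": ["vm-dns-01", "core-router-01"],
    "intranet": ["vm-web-01", "switch-02:port-7"],
}

# Hypothetical contacts to notify per application.
CONTACTS = {
    "payroll-db": ["finance-oncall"],
    "dns-internal": ["network-team"],
    "intranet": ["web-team"],
}


def build_index(paths):
    """Invert the mapping: component -> set of applications that depend on it."""
    index = defaultdict(set)
    for app, components in paths.items():
        for component in components:
            index[component].add(app)
    return index


def impacted(index, failed_component):
    """Applications affected when the given component fails."""
    return sorted(index.get(failed_component, set()))


def contacts_for(apps, contacts):
    """De-duplicated list of contacts for the affected applications."""
    return sorted({c for app in apps for c in contacts.get(app, [])})


index = build_index(PATHS)
apps = impacted(index, "core-router-01")
print(apps, contacts_for(apps, CONTACTS))
```

The inverted index is the key design choice here: once the paths are documented per application, answering "who cares about this device?" becomes a single lookup.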
What’s new with Grafana Labs’s Open Source Observability stack
Open source is at the heart of what we do at Grafana Labs and there is so much happening! The intent of this talk is to update everyone on the latest developments when it comes to Grafana, Pyroscope, Faro, Loki, Mimir, Tempo and more. Everyone has at least heard about Grafana, but maybe some of the other projects mentioned above are new to you? Welcome to this talk 😉 Besides the update on what is new, we will also quickly introduce these projects during the talk.
Journey to observability: tracking every function execution in production
This talk will discuss the technical journey of converting a legacy monitoring approach to true production observability for the Ingress Team in IBM Cloud. It is a deep dive into the necessary code and culture changes that enabled our team to observe the performance of every single function executed in production, and into how these changes enabled a 90% reduction in incident volume and downtime duration. The goal of this talk is to provide insight into the challenges, pitfalls, and successes of the IBM Cloud team, and to generate a discussion on what is next.
Newest developments in Checkmk Raw – the open-source monitoring solution
The pace of change in IT continues to accelerate. At Checkmk, we believe the solution for all the upcoming challenges in IT monitoring is an open observability ecosystem: built on an involved user community, reliable partners, many off-the-shelf integrations, and a wide range of APIs for development, configuration and operations. Join us to learn about the latest release of our open source monitoring solution Checkmk Raw. Among the highlights are our completely reworked Grafana integration, performance boost for the InfluxDB integration, 150+ new or improved checks including Google Cloud Platform, Mobile Devices Monitoring, Cisco Meraki. Furthermore, we have news about sharing monitoring customizations, an extended REST API, and an enhanced user experience. What makes Checkmk 2.2 truly unique is our community-driven approach. We have harnessed the expertise of numerous individuals to build a monitoring solution that caters to universal needs. In our talk, we will demonstrate the power of collaboration and how it has shaped our platform. We will also showcase how the Checkmk open-source ecosystem thrives, highlighting some new interesting projects.
Take IoT to a New Level with ThingsBoard
The Internet of Things is becoming ever more popular. Because of its simplicity, scalability and range of features, more and more companies rely on the open source IoT platform ThingsBoard. The first third of the talk gives a practical introduction to the platform. We will then look together at a scalable architecture that can efficiently process hundreds of thousands of sensors with millions of metrics. In the last third, we will look at the options for visualising the collected data, anomaly detection, and various analyses based on ThingsBoard Trendz. Attendees thus get a complete introduction to the extensive functionality of ThingsBoard.
OpenTelemetry for Logging
While OpenTelemetry tracing is the de facto standard today and metrics are becoming more and more established, logging is still lagging far behind.
This talk gives an overview of:
– Why should you change your logging to work with OpenTelemetry?
– Where is the OTel standard in terms of logging?
– How can you use it in your application today?
We will focus on Java with three different approaches for the implementation, but most of the concepts translate well to other ecosystems.
Automated update management with Renovate
Minimize security risks and keep systems up-to-date
Heise once again reports a serious security vulnerability in widely used software; remember Heartbleed or Log4j. The search for the affected software and its dependencies begins. Lack of overview and uncertainty often lead to postponing necessary updates. To be proactive and keep systems up-to-date, we use Renovate. In this presentation, I will show how we use it company-wide.
JOCHEN KRESSIN & LEANNE LACEY-BYRNE
Experiments with OpenSearch and AI
At the intersection of search and AI, melding Large Language Models (LLMs) with OpenSearch opens transformative avenues. In this talk, we explore how LLMs can simplify the interaction between users and OpenSearch, converting natural language into OpenSearch queries. We will also leverage OpenSearch’s Vector Storage, enriching traditional term-based searches with semantic understanding. Dive into a future where search engines transcend being mere tools, becoming intuitive partners in knowledge discovery.
Replacing NSClient++ for Windows Monitoring
This talk will give a quick overview of NSClient++ alternatives and will introduce the new SNClient+ agent for Windows, Linux, OSX and BSD. This new agent is designed to replace NSClient++ without having to migrate configuration or scripts. Besides this compatibility mode, I will show what else can be done with SNClient+, e.g. fetching Prometheus metrics.
IGNITE: Serving Server-Side WASM with Web Awareness with NGINX Unit
Enter open source NGINX Unit. Unit is an application runtime for web apps and APIs. It handles the HTTP(S) front end, request routing and serving of assets, including the hand-off of dynamic requests. In short, Unit decouples the HTTP server from the application process. And it’s an excellent fit for Wasm’s sandboxed execution and linear memory byte streams.
IGNITE: A few cool new Icinga Notification scripts
Sol1 is releasing some new Icinga notification scripts, and this talk goes through some of their cooler features.
- Slack – customisable sizing, layout and includes
- Request Tracker – idempotent ticket creation
- Impact assessment with path – Check who cares about that thing
- Enhanced Email – embed Grafana graphs, HTML, and Netbox data in emails!
Now with all-new, shiny Director baskets. Integration with Netbox Contacts is also demoed, letting users self-serve which devices they get notifications for, and how and why, all from Netbox!
MATTHIAS GALLINGER & TOBIAS KEMPF
Monitoring at one of the largest retail groups in the world
This talk will focus on the monitoring of the Schwarz Group, one of the largest retailers in the world, with an extremely large and complex environment, using the OMD platform. OMD is an all-in-one monitoring solution that integrates Naemon/Thruk for host and service monitoring, Prometheus for metrics collection, and Grafana for data visualization. The monitoring extends from classic infrastructure, such as servers and networks, to cloud-based applications and even to the business operations in stores. This comprehensive monitoring approach provides necessary information and data for each client group, allowing the Schwarz Group to optimize its operations and enhance customer experience. We will discuss the benefits of using OMD and its tools in such a large environment, including its scalability, flexibility, and ease of use. We will also cover some of the challenges faced during the lifecycle of the monitoring solution. Overall, this talk will provide valuable insights into the power of OMD for monitoring large-scale and complex infrastructures, and how it can help organizations like the Schwarz Group improve their operations, maintain their systems in top condition, and enhance their customer experience.
openITCOCKPIT Community Edition – Simple Configuration, Modules, API and More
openITCOCKPIT is a highly adaptable IT monitoring framework based on Naemon and other technologies from the open source world. It stands out for its simple configuration and high flexibility, which lets anyone build complete infrastructure and application monitoring even without extensive Linux knowledge. The system's flexibility also offers experienced monitoring professionals numerous possibilities. In addition, the Community Edition has many other features such as event correlation, various reporting functions, integration of the Checkmk agent, configurable dashboards, an easy-to-configure Grafana integration, an open REST API and much more. This makes it a comprehensive monitoring framework that goes far beyond standard monitoring functions. This talk will show not only how easy it is to set up IT infrastructure and application monitoring, but also how the individual modules can be used effectively, for example to correlate information.
Built-in OpenTelemetry support in Elasticsearch clients
How we instrumented our libraries with OTel and what you can learn from it
At Elastic, we recently added OpenTelemetry support to most of our open source Elasticsearch clients. This talk will tell the story of how we got there and what we learned along the way.
Elasticsearch clients exist in multiple languages (Java, .NET, PHP, Ruby, etc.), therefore we also created Semantic Conventions to make sure all Elasticsearch client instrumentations behave in the same way.
Attendees will learn how to instrument existing libraries with OpenTelemetry, and how to interact with the community and collaborate on creating Semantic Conventions for specific technologies.
COSIMO RUSSO & FABIAN BINDER
Automatic System Updates as a Self-Service
With system updates, you have to contend with widely varying update schedules and responsibilities for systems.
comNET takes on this challenge and automates update processes in a controlled procedure, using Checkmk, NetBox and Ansible.
Linking host data from NetBox with monitoring data from Checkmk ensures that a system is ready for updates. Ansible does the rest.
This lets users easily update their systems themselves, with just one click!
Zabbix – Powerful enterprise grade monitoring driven by Open Source
When it comes to enterprise-level, open source network monitoring, one of the products that comes to mind is Zabbix. The tool, which has been continuously developed and improved for over 20 years, is a great choice for monitoring network devices, servers, applications, containers, and cloud environments. It combines powerful monitoring capabilities with easy-to-use configuration and visualization options. This presentation will give a brief overview of its design, capabilities and the key features that make it so special.
The Experiment: Using AI to Process REST APIs and Build Icinga Plugins for Them!
Have you ever been in a situation where you needed a check that does not exist yet?
That happened to me with Rancher, an on-prem Kubernetes solution.
Icinga is not that good at monitoring dynamic workloads, but there are also deployments that persist (production).
So how can I integrate these into my company-wide monitoring?
Using the REST API and ChatGPT, I tried to write an add-on tailored to my needs.
Elevating Open-Source Monitoring Ecosystems
Bridging the Gap from Alert Detection to Effective Incident Response
While open-source monitoring tools are indispensable in modern IT environments, organizations often grapple with the transition from anomaly detection to swift, informed action. This presentation delves deep into this challenge, offering a step-by-step guide to embedding robust incident response practices seamlessly for DevOps teams. We will dissect the incident response workflow into four tangible stages: Preparation, Response, Communication, and Learning. Attendees will leave with a practical roadmap to minimize user impact, reinforce a culture of continuous growth, and enhance their existing open-source monitoring frameworks.