mirror of
https://github.com/nubenetes/awesome-kubernetes.git
synced 2026-05-26 19:18:58 +00:00
Monitoring and Performance. Prometheus, Grafana, APMs and more
!!! info "Architectural Context"
    Detailed reference for Monitoring and Performance (Prometheus, Grafana, APMs and more) in the context of Architectural Foundations.
Standard Reference
- Monitoring Jenkins using Instana [COMMUNITY-TOOL]
- victoriametrics.com: Q2 2024 Round Up: VictoriaMetrics & VictoriaLogs Updates [COMMUNITY-TOOL]
- thenewstack.io: The Challenges of Monitoring Kubernetes and OpenShift [COMMUNITY-TOOL]
- sysdig.com: Kubernetes Monitoring with Prometheus, the ultimate guide 🌟 [COMMUNITY-TOOL]
- sysdig.com: How to monitor kube-proxy 🌟 [COMMUNITY-TOOL]
- getenroute.io: TSDB, Prometheus, Grafana In Kubernetes: Tracing A Variable Across The OSS Monitoring Stack [COMMUNITY-TOOL]
- opsdis.com: Building a custom monitoring solution with Grafana, Prometheus and Loki [COMMUNITY-TOOL]
- harness.io: Metrics to Improve Continuous Integration Performance [COMMUNITY-TOOL]
- skilledfield.com.au: Monitoring Kubernetes and Docker Container Logs [COMMUNITY-TOOL]
- forbes.com: Who Should Own The Job Of Observability In DevOps? [COMMUNITY-TOOL]
- infoworld.com: The RED method: A new strategy for monitoring microservices [COMMUNITY-TOOL]
- forbes.com: From Data Collection To Delivering KPIs: A Roadmap To A Mature Observability Strategy [COMMUNITY-TOOL]
    - The observability space is a multi-billion-dollar market, and for good reason: there are a lot of benefits when you implement a robust observability strategy. But extracting value is not as simple as adopting a tool, throwing your data into a black box and expecting it to spit out business-relevant, contextualized insights and helpful visualizations.
    - As they say, “Nothing good comes easy,” but when done right, a mature observability strategy will pay for itself over and over again.
- KPIs [COMMUNITY-TOOL]
- Promster: Use Prometheus in huge deployments with dynamic clustering and scrape sharding capabilities based on ETCD service registration ⭐ 31 [COMMUNITY-TOOL]
- timescale.com: Prometheus vs. OpenTelemetry Metrics: A Complete Guide [COMMUNITY-TOOL]
- acloudguru.com: Getting started with the Elastic Stack [COMMUNITY-TOOL]
- aws.amazon.com: Amazon Elasticsearch Service Is Now Amazon OpenSearch Service and Supports OpenSearch 1.0 [COMMUNITY-TOOL]
- Transitive blocks [COMMUNITY-TOOL] — - Unresponsive JVM
- Remote Debugging of Java Applications on OpenShift [COMMUNITY-TOOL]
- Dapper [COMMUNITY-TOOL]
- newrelic.com: OpenTracing, OpenCensus, OpenTelemetry, and New Relic (Best overview of OpenTelemetry) [COMMUNITY-TOOL]
- sentry.io [COMMUNITY-TOOL] — - APMs like Dynatrace, etc.
- Elastic APM [COMMUNITY-TOOL]
- Elastic APM Server [COMMUNITY-TOOL]
- adictosaltrabajo.com: Application performance monitoring and analysis with Dynatrace APM (in Spanish) [COMMUNITY-TOOL]
- dynatrace.com: openshift monitoring [COMMUNITY-TOOL]
- dynatrace.com: Deploy OneAgent on OpenShift Container Platform [COMMUNITY-TOOL]
- My Dynatrace proof of concept 🌟 ⭐ 661 [COMMUNITY-TOOL]
- dynatrace.com: Monitoring of Kubernetes Infrastructure for day 2 operations [COMMUNITY-TOOL]
- dynatrace.com: The Power of OpenShift, The Visibility of Dynatrace [COMMUNITY-TOOL]
- A Distributed Tracing Adventure in Apache Beam [COMMUNITY-TOOL]
- rancher.com: Monitor Etcd with Prometheus and Grafana using Rancher [COMMUNITY-TOOL]
- openshift.com: Monitoring Infrastructure Openshift 4.x Using Zabbix Operator [COMMUNITY-TOOL]
- openshift.com: How to Monitor Openshift 4.x with Zabbix using Prometheus - Part 2 [COMMUNITY-TOOL]
- replex.io [COMMUNITY-TOOL]
- Micrometer Collector [COMMUNITY-TOOL]
- en.wikipedia.org/wiki/List_of_performance_analysis_tools [COMMUNITY-TOOL]
- InspectIT [COMMUNITY-TOOL]
- VisualVM [COMMUNITY-TOOL]
- OverOps [COMMUNITY-TOOL]
- Dzone: 14 Best Performance Testing Tools and APM Solutions [COMMUNITY-TOOL]
- victoriametrics.com [COMMUNITY-TOOL]
- Monitor your Azure cloud estate - Cloud Adoption Framework [COMMUNITY-TOOL]
- Wikipedia: Application Performance Index [COMMUNITY-TOOL]
- Observability vs Monitoring [COMMUNITY-TOOL]
- dzone.com: Kubernetes Monitoring: Best Practices, Methods, and Existing Solutions [COMMUNITY-TOOL]
- CNCF End User Technology Radar: Observability, September 2020 🌟 [COMMUNITY-TOOL]
- magalix.com: Monitoring Kubernetes Clusters Through Prometheus & Grafana 🌟 [COMMUNITY-TOOL]
- learnsteps.com: Monitoring Infrastructure System Design [COMMUNITY-TOOL]
- bravenewgeek.com: The Observability Pipeline [COMMUNITY-TOOL]
- thenewstack.io: 3 Key Configuration Challenges for Kubernetes Monitoring with Prometheus [COMMUNITY-TOOL]
- thenewstack.io: Monitoring vs. Observability: What’s the Difference? [COMMUNITY-TOOL]
- dashbird.io: Monitoring vs Observability: Can you tell the difference? 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Monitoring as Code: What It Is and Why You Need It 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Observability Won’t Replace Monitoring (Because It Shouldn’t) 🌟 [COMMUNITY-TOOL]
- matiasmct.medium.com: Observability at Scale [COMMUNITY-TOOL]
- logz.io: Top 11 Open Source Monitoring Tools for Kubernetes 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Kubernetes Observability Challenges in Cloud Native Architecture 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Best Practices to Optimize Infrastructure Monitoring within DevOps Teams [COMMUNITY-TOOL]
- faun.pub: DevOps Meets Observability 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Growing Adoption of Observability Powers Business Transformation [COMMUNITY-TOOL]
- blog.thundra.io: What CI Observability Means for DevOps 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Monitoring API Latencies After Releases: 4 Mistakes to Avoid [COMMUNITY-TOOL]
- thenewstack.io: DevOps Observability from Code to Cloud [COMMUNITY-TOOL]
- thenewstack.io: CI Observability for Effective Change Management 🌟 [COMMUNITY-TOOL]
- thenewstack.io: Monitor Your Containers with Sysdig [COMMUNITY-TOOL]
- thenewstack.io: Applying Basic vs. Advanced Monitoring Techniques [COMMUNITY-TOOL]
- cloudforecast.io: cAdvisor and Kubernetes Monitoring Guide 🌟 [COMMUNITY-TOOL]
- hmh.engineering: Musings on microservice observability! [COMMUNITY-TOOL]
- stackoverflow.blog: Observability is key to the future of software (and your DevOps career) [COMMUNITY-TOOL]
- dynatrace.com: What is observability? Not just logs, metrics and traces [COMMUNITY-TOOL]
- thenewstack.io: Observability Is the New Kubernetes 🌟 [COMMUNITY-TOOL]
- learnsteps.com: Logging Infrastructure System Design [COMMUNITY-TOOL]
- medium.com: Monitoring Microservices - Part 1: Observability | Anderson Carvalho [COMMUNITY-TOOL]
- intellipaat.com: Top 10 DevOps Monitoring Tools [COMMUNITY-TOOL]
- cncf.io: How to add observability to your application pipeline [COMMUNITY-TOOL]
- storiesfromtheherd.com: Unpacking Observability [COMMUNITY-TOOL]
- logz.io: A Monitoring Reality Check: More of the Same Won’t Work [COMMUNITY-TOOL]
- medium.com/buildpiper: Observability for Monitoring Microservices — Top 5 Ways! [COMMUNITY-TOOL]
- medium.com/@cbkwgl: Continuous Monitoring in DevOps 🌟 [COMMUNITY-TOOL]
- logz.io: The Open Source Observability Adoption and Migration Curve [COMMUNITY-TOOL]
- devopscube.com: What Is Observability? Comprehensive Beginners Guide [COMMUNITY-TOOL]
- tiagodiasgeneroso.medium.com: Observability Concepts you should know [COMMUNITY-TOOL]
- faun.pub: Getting started with Observability [COMMUNITY-TOOL]
- medium.com/@badawekoo: Monitoring in DevOps lifecycle [COMMUNITY-TOOL]
- laduram.medium.com: The Future of Observability [COMMUNITY-TOOL]
- devops.com: Where Does Observability Stand Today, and Where is it Going Next? [COMMUNITY-TOOL]
- medium.com/kubeshop-i: Top 8 Open-Source Observability & Testing Tools [COMMUNITY-TOOL]
- dzone: 11 Observability Tools You Should Know 🌟 [COMMUNITY-TOOL]
- medium.com/devops-techable: Setup monitoring with Prometheus and Grafana in Kubernetes — Start monitoring your Kubernetes cluster resources [COMMUNITY-TOOL]
- thenewstack.io: What Is Container Monitoring? [COMMUNITY-TOOL]
- devops.com: Why Monitoring-as-Code Will be a Must for DevOps Teams [COMMUNITY-TOOL]
- medium.com/cloud-native-daily: Why You Shouldn’t Fear to Adopt OpenTelemetry for Observability [COMMUNITY-TOOL]
- medium.com/@bijit211987: Observability Driven Development (ODD)-Enhancing System Reliability [COMMUNITY-TOOL]
- medium.com/performance-engineering-for-the-ordinary-barbie: Why profiling should be part of regular software development workflow 🌟 [COMMUNITY-TOOL]
- github.com/prometheus-operator [COMMUNITY-TOOL]
- redhat.com: How to gather and display metrics in Red Hat OpenShift (Prometheus + Grafana) [COMMUNITY-TOOL]
- Generally Available today: Red Hat OpenShift Container Platform 3.11 is ready to power enterprise Kubernetes deployments 🌟 [COMMUNITY-TOOL]
- OCP 3.11 Metrics and Logging [COMMUNITY-TOOL]
- Prometheus Cluster Monitoring 🌟 [COMMUNITY-TOOL]
- Leveraging Kubernetes and OpenShift for automated performance tests (part 1) [COMMUNITY-TOOL]
- Building an observability stack for automated performance tests on Kubernetes and OpenShift (part 2) 🌟 [COMMUNITY-TOOL]
- developers.redhat.com: Monitoring .NET Core applications on Kubernetes [COMMUNITY-TOOL]
- Systems Monitoring with Prometheus and Grafana [COMMUNITY-TOOL]
- cncf.io: Monitoring micro-front ends on Kubernetes with NGINX 🌟 [COMMUNITY-TOOL]
- dzone: Getting Started With Kibana Advanced Searches [COMMUNITY-TOOL]
- dzone: Kibana Hacks: 5 Tips and Tricks [COMMUNITY-TOOL]
- juanonlab.com: Kibana Dashboards (in Spanish) [COMMUNITY-TOOL]
- dev.to: Beginner's guide to understanding the relevance of your search with Elasticsearch and Kibana [COMMUNITY-TOOL]
- devops.com: How Centralized Log Management Can Save Your Company [COMMUNITY-TOOL]
- betterprogramming.pub: The Art of Logging [COMMUNITY-TOOL]
- zdnet.com: AWS, as predicted, is forking Elasticsearch [COMMUNITY-TOOL]
- amazon.com: Stepping up for a truly open source Elasticsearch [COMMUNITY-TOOL]
- Store NGINX access logs in Elasticsearch with Logging operator 🌟 [COMMUNITY-TOOL]
- Rancher Logging Operator 🌟 [COMMUNITY-TOOL]
- blog.streammonkey.com: How We Serverlessly Migrated 1.58 Billion Elasticsearch Documents [COMMUNITY-TOOL]
- youtube: ELK for beginners - by XavkiEn 🌟 [COMMUNITY-TOOL]
- javatechonline.com: How To Monitor Spring Boot Microservices Using ELK Stack? [COMMUNITY-TOOL]
- dzone: Running Elasticsearch on Kubernetes [COMMUNITY-TOOL]
- medium: Which Elasticsearch Provider is Right For You? 🌟 [COMMUNITY-TOOL]
- jertel/elastalert2 ⭐ 1119 [COMMUNITY-TOOL]
- medium.com/hepsiburadatech: Hepsiburada Search Engine on Kubernetes [COMMUNITY-TOOL]
- dev.to/sagary2j: ELK Stack Deployment using MiniKube single node architecture [COMMUNITY-TOOL]
- search-guard.com/sgctl-elasticsearch: SGCTL - TAKE BACK CONTROL [COMMUNITY-TOOL]
- medium: A detailed guide to deploying Elasticsearch on Elastic Cloud on Kubernetes (ECK) [COMMUNITY-TOOL]
- opensearch.org 🌟 [COMMUNITY-TOOL]
- amazon.com: Introducing OpenSearch [COMMUNITY-TOOL]
- logz.io: Logz.io Announces Support for OpenSearch; A Community-driven Open Source Fork of Elasticsearch and Kibana [COMMUNITY-TOOL]
- techrepublic.com: OpenSearch: AWS rolls out its open source Elasticsearch fork [COMMUNITY-TOOL]
- thenewstack.io: This Week in Programming: AWS Completes Elasticsearch Fork with OpenSearch [COMMUNITY-TOOL]
- logz.io: OpenSearch Is Now Generally Available! [COMMUNITY-TOOL]
- thenewstack.io: This Week in Programming: The ElasticSearch Saga Continues [COMMUNITY-TOOL]
- aws.amazon.com: Keeping clients of OpenSearch and Elasticsearch compatible with open source [COMMUNITY-TOOL]
- medium: Logging with EFK - Pratyush Mathur [COMMUNITY-TOOL]
- medium.com/@CuriousLearner: Deploying EFK stack on Kubernetes [COMMUNITY-TOOL]
- medium.com/@tech_18484: Simplifying Kubernetes Logging with EFK Stack [COMMUNITY-TOOL]
- logz.io: A Beginner’s Guide to Logstash Grok [COMMUNITY-TOOL]
- logz.io: Grok Pattern Examples for Log Parsing [COMMUNITY-TOOL]
- devops.com: The Fallacy of Continuous Integration, Delivery and Testing [COMMUNITY-TOOL]
- dzone.com: The Keys to Performance Tuning and Testing [COMMUNITY-TOOL]
- dzone.com: Performance Patterns in Microservices-Based Integrations [COMMUNITY-TOOL]
- How to read a Thread Dump [COMMUNITY-TOOL]
- dzone: 8 Options for Capturing Thread Dumps [COMMUNITY-TOOL]
- blog.arkey.fr: Using JDK FlightRecorder and JDK Mission Control [COMMUNITY-TOOL]
- developers.redhat.com: Troubleshooting java applications on openshift [COMMUNITY-TOOL]
- VisualVM: JVisualVM to an Openshift pod [COMMUNITY-TOOL]
- redhat.com: How do I analyze a Java heap dump? [COMMUNITY-TOOL]
- Microservice Observability with Distributed Tracing: OpenTelemetry.io 🌟 [COMMUNITY-TOOL]
- zipkin.io [COMMUNITY-TOOL]
- OpenTracing.io [COMMUNITY-TOOL]
- awkwardferny.medium.com: Setting up Distributed Tracing in Kubernetes with OpenTracing, Jaeger, and Ingress-NGINX [COMMUNITY-TOOL]
- ploffay.medium.com: Five years evolution of open-source distributed tracing 🌟 [COMMUNITY-TOOL]
- signadot.com: Sandboxes in Kubernetes using OpenTelemetry [COMMUNITY-TOOL]
- Medium: Distributed Tracing and Monitoring using OpenCensus [COMMUNITY-TOOL]
- Dzone: Zipkin vs. Jaeger: Getting Started With Tracing [COMMUNITY-TOOL]
- opensource.com: Distributed tracing in a microservices world [COMMUNITY-TOOL]
- opensource.com: 3 open source distributed tracing tools [COMMUNITY-TOOL]
- thenewstack.io: Tracing: Why Logs Aren’t Enough to Debug Your Microservices 🌟 [COMMUNITY-TOOL]
- github.com/open-telemetry/opentelemetry-operator ⭐ 1696 [COMMUNITY-TOOL]
- medium.com/@magstherdev: OpenTelemetry Operator [COMMUNITY-TOOL]
- thenewstack.io: OpenTelemetry Gaining Traction from Companies and Vendors [COMMUNITY-TOOL]
- thenewstack.io: How OpenTelemetry Works with Kubernetes [COMMUNITY-TOOL]
- medium.com/@bijit211987: Grafana with OpenTelemetry, Vendor-neutral and open-source approach [COMMUNITY-TOOL]
- medium: Jaeger VS OpenTracing VS OpenTelemetry [COMMUNITY-TOOL]
- medium: Using Jaeger and OpenTelemetry SDKs in a mixed environment with W3C Trace-Context [COMMUNITY-TOOL]
- thenewstack.io: Jaeger vs. Zipkin: Battle of the Open Source Tracing Tools [COMMUNITY-TOOL]
- Grafana Tempo ⭐ 5276 [ENTERPRISE-STABLE]
- opensource.com: Get started with distributed tracing using Grafana Tempo [COMMUNITY-TOOL]
- Azure App Service Auto-Heal: Capturing Relevant Data During Performance Issues [COMMUNITY-TOOL]
- APM in wikipedia [COMMUNITY-TOOL]
- github.com/antonarhipov/awesome-apm: Awesome APM [COMMUNITY-TOOL]
- dzone.com: APM Tools Comparison [COMMUNITY-TOOL]
- dzone.com: Java Performance Monitoring: 5 Open Source Tools You Should Know [COMMUNITY-TOOL]
- datadoghq.com [COMMUNITY-TOOL]
- Minimum Elasticsearch requirement is 6.2.x or higher [COMMUNITY-TOOL]
- Elastic APM Server Docker image [COMMUNITY-TOOL]
- Monitoring Java applications with Elastic: Getting started with the Elastic APM Java Agent [COMMUNITY-TOOL]
- Jenkins pipeline shared library for the project Elastic APM 🌟 ⭐ 11 [COMMUNITY-TOOL]
- bqstack.com: Monitoring Application using Elastic APM [COMMUNITY-TOOL]
- Successful Kubernetes Monitoring – Three Pitfalls to Avoid [COMMUNITY-TOOL]
- Tutorial: Guide to automated SRE-driven performance engineering 🌟 [COMMUNITY-TOOL]
- dynatrace.com: 4 steps to modernize your IT service operations with Dynatrace [COMMUNITY-TOOL]
- dynatrace.com: Analyze all AWS data in minutes with Amazon CloudWatch Metric Streams available in Dynatrace [COMMUNITY-TOOL]
- dynatrace.com: New Dynatrace Operator elevates cloud-native observability for Kubernetes [COMMUNITY-TOOL]
- dynatrace.com: How to collect Prometheus metrics in Dynatrace [COMMUNITY-TOOL]
- dynatrace.com: Automatic connection of logs and traces accelerates AI-driven cloud analytics [COMMUNITY-TOOL]
- devops.com: Dynatrace Advances Application Environments as Code [COMMUNITY-TOOL]
- thenewstack.io: Serverless Needs More Observability Tools [COMMUNITY-TOOL]
- Apache Beam [COMMUNITY-TOOL]
- Krossboard [COMMUNITY-TOOL]
- Krossboard: A centralized usage analytics approach for multiple Kubernetes [COMMUNITY-TOOL]
- cloudbees.com: Automated Build and Deploy Feedback Using Jenkins and Instana 🌟 [COMMUNITY-TOOL]
- Netdata ⭐ 78906 [DE FACTO STANDARD]
- PM2 ⭐ 43173 [DE FACTO STANDARD]
- Huginn ⭐ 49312 [DE FACTO STANDARD]
- OS Query ⭐ 23262 [DE FACTO STANDARD]
- Glances ⭐ 32615 [DE FACTO STANDARD]
- TDengine ⭐ 24860 [DE FACTO STANDARD]
- stackpulse.com: Automated Kubernetes Pod Restarting Analysis with StackPulse [COMMUNITY-TOOL]
- Checkly [COMMUNITY-TOOL]
- hashicorp.com: Monitoring as Code with Terraform Cloud and Checkly [COMMUNITY-TOOL]
- network-king.net: IoT use in healthcare grows but has some pitfalls [COMMUNITY-TOOL]
- Zebrium [COMMUNITY-TOOL]
- louislam/uptime-kuma ⭐ 87087 [DE FACTO STANDARD]
- OpenTelemetry (OTel) vs Application Performance Monitoring (APM) [COMMUNITY-TOOL]
- OOMKilled in Kubernetes: Understanding and Preventing Hidden Memory Leaks [COMMUNITY-TOOL]
- Grafana [COMMUNITY-TOOL]
- SigNoz: Open source Application Performance Monitoring (APM) & Observability tool 🌟 ⭐ 26999 [DE FACTO STANDARD]
- Prometheus JMX Exporter 🌟 ⭐ 3305 [ENTERPRISE-STABLE]
- OpenTelemetry Collector ⭐ 7050 [ENTERPRISE-STABLE]
Cloud Infrastructure
Service Mesh
Istio Mesh
- Istio.io [EN CONTENT] [ADVANCED LEVEL] [DE FACTO STANDARD] — The premier open-source service mesh providing advanced traffic management, end-to-end security, and granular observability. Uses Envoy proxies (via sidecars or Ambient mode) to secure and manage microservice fabrics.
Cloud Native Infrastructure
Observability
Distributed Tracing
Jaeger Platform
- jaegertracing.io [DOCUMENTATION] [DE FACTO STANDARD] [ENTERPRISE-STABLE] — The official gateway for Jaeger, a CNCF-graduated distributed tracing platform. Essential for microservice architectures to monitor transactions, perform root-cause analysis, optimize performance bottlenecks, and visualize complex request propagation paths.
Log Analysis
Visualization Tools
- Kibana [DOCUMENTATION] [DE FACTO STANDARD] [ENTERPRISE-STABLE] — The foundational visualization and management interface for the Elastic Stack. Enables operators to search, index, analyze, and construct real-time security dashboards and log analysis patterns for high-throughput microservice applications.
Cloud Native Languages
Java
Performance Tuning
- tier1app.com [EN CONTENT] [ENTERPRISE-STABLE] — A dedicated APM tool for analyzing Java thread dumps and performance. Provides automated diagnostics for thread contention and deadlocks to optimize JVM application responsiveness.
- fastthread.io [EN CONTENT] [DE FACTO STANDARD] [ENTERPRISE-STABLE] — Industrial-grade online Java thread dump analyzer that uses AI diagnostics to identify CPU spikes, thread leaks, and deadlock patterns. Essential for post-mortem analysis of containerized JVM workloads.
- gceasy.io [EN CONTENT] [ADVANCED LEVEL] [DE FACTO STANDARD] [ENTERPRISE-STABLE] — Machine-learning powered JVM Garbage Collection log analyzer. Automates the detection of memory leaks, GC pauses, and heap sizing misconfigurations, offering actionable recommendations for optimization.
- heaphero.io [EN CONTENT] [ADVANCED LEVEL] [ENTERPRISE-STABLE] — An automated cloud-based JVM heap dump analyzer built to parse large memory dumps quickly. Detects memory leaks and optimizes data structure footprints to resolve OutOfMemoryError crashes.
Event-Driven Architecture
Apache Kafka
Tooling and UI
- Kafdrop – Kafka Web UI 🌟 ⭐ 6135 [DE FACTO STANDARD] [ENTERPRISE-STABLE] — A popular, lightweight web UI for monitoring and managing Apache Kafka. It displays cluster information, brokers, topics, partition offsets, and consumer group lag, and lets you inspect JSON and protobuf message payloads.
Infrastructure Operations
Sysadmin Toolsets
Resource Curation
Awesome Lists
- Awesome Sysadmin ⭐ 33981 [DE FACTO STANDARD] — An incredibly rich curation containing production-grade open source utilities, control planes, networking layers, and security mechanisms used daily by systems architects and site reliability engineers.
Observability (1)
Telemetry Standards
OpenTelemetry vs Prometheus
- Prometheus and OpenTelemetry Compatibility Issues [ADVANCED LEVEL] [COMMUNITY-TOOL] — An informative look at the historical data model incompatibilities between Prometheus and OpenTelemetry (OTel). It details the industry efforts to reconcile standard Prometheus structures with the broader OTel landscape.
Observability and Performance
Kubernetes Internals
Resource Management
- The Hidden CPU Throttling Crisis in Kubernetes Clusters [EN CONTENT] [ADVANCED LEVEL] [COMMUNITY-TOOL] — An in-depth analysis exposing the silent threat of CPU throttling inside Kubernetes clusters caused by rigid CFS quota management. Demonstrates how microservices suffer latency spikes even with low aggregate CPU consumption.
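The throttling effect described in the entry above can be illustrated with a small sketch. It is not taken from the linked article. It assumes the cgroup v2 `cpu.stat` counters `nr_periods` and `nr_throttled`, the common default CFS period of 100 ms, and a deliberately simplified model in which a whole burst of CPU demand lands inside one period. The function names and the sample numbers are hypothetical.

```python
# Simplified illustration of CFS quota throttling; sample values are made up.

SAMPLE_CPU_STAT = """usage_usec 1200000
user_usec 900000
system_usec 300000
nr_periods 1000
nr_throttled 250
throttled_usec 4800000
"""


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Fraction of enforcement periods in which the cgroup hit its quota."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


def burst_is_throttled(cpu_limit_cores: float, burst_cpu_ms: float,
                       period_ms: float = 100.0) -> bool:
    """Assumed model: quota per period = CPU limit * period length.

    A burst that needs more CPU time than the quota within one period is
    throttled, regardless of how idle the container is the rest of the time.
    """
    quota_ms = cpu_limit_cores * period_ms
    return burst_cpu_ms > quota_ms


if __name__ == "__main__":
    print(throttled_ratio(parse_cpu_stat(SAMPLE_CPU_STAT)))
    # Four threads each needing 20 ms of CPU at once = 80 ms of demand,
    # against a 0.5-core limit (50 ms of quota per 100 ms period).
    print(burst_is_throttled(cpu_limit_cores=0.5, burst_cpu_ms=80.0))
```

In this model, one such burst per second averages only 80 ms of CPU per 1000 ms, far below the 0.5-core limit, yet the burst is still throttled in its period. That is the pattern of latency spikes despite low aggregate CPU consumption.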
Performance Testing
HTTP Benchmarking
- blog.cloud-mercato.com: New HTTP benchmark tool pycurlb [EN CONTENT] [ADVANCED LEVEL] [COMMUNITY-TOOL] — A deep-dive introducing pycurlb, a fast performance tool wrapping libcurl for rapid HTTP request benchmarking in Python. Explores real-world performance results and technical comparisons.
Operations and Reliability
Observability and Monitoring
Foundations
- Monitoring Distributed Systems - Google SRE Book [ADVANCED LEVEL] [DOCUMENTATION] [DE FACTO STANDARD] — The industry-standard chapter from Google's SRE book detailing the implementation of distributed systems monitoring. It defines the 'Four Golden Signals'—latency, traffic, errors, and saturation—providing practical blueprints to prevent alert fatigue and build actionable dashboard designs.
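As a rough illustration of the four signals named above, the following sketch derives them from a window of request records. It is a minimal example under stated assumptions, not the SRE book's method: the `Request` type, the `golden_signals` function, the nearest-rank percentile, the choice of HTTP 5xx as "error", and CPU usage as the saturation measure are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(requests: list[Request], window_s: float,
                   cpu_used_cores: float, cpu_capacity_cores: float) -> dict:
    """Summarize latency, traffic, errors, and saturation for one window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = [r.duration_ms for r in requests]
    return {
        "latency_p99_ms": percentile(durations, 99) if durations else None,
        "traffic_rps": total / window_s,
        "error_rate": errors / total if total else 0.0,
        # Assumption: CPU is the constrained resource for this service.
        "saturation": cpu_used_cores / cpu_capacity_cores,
    }


if __name__ == "__main__":
    window = [Request(10, 200), Request(20, 200),
              Request(30, 500), Request(400, 200)]
    print(golden_signals(window, window_s=2.0,
                         cpu_used_cores=1.5, cpu_capacity_cores=2.0))
```

A high percentile is used for latency rather than the mean so that a single slow request, such as the 400 ms one in the sample window, stays visible instead of being averaged away.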