# Monitoring and Performance. Prometheus, Grafana, APMs and more !!! info "Architectural Context" Detailed reference for Monitoring and Performance. Prometheus, Grafana, APMs and more in the context of Architectural Foundations. ## Standard Reference - [Monitoring jenkins using instana](https://www.ibm.com/think) [COMMUNITY-TOOL] - [victoriametrics.com: Q2 2024 Round Up: VictoriaMetrics & VictoriaLogs Updates](https://victoriametrics.com/blog/q2-2024-round-up-victoriametrics-and-victorialogs-updates/index.html) [COMMUNITY-TOOL] - [thenewstack.io: The Challenges of Monitoring Kubernetes and OpenShift](https://thenewstack.io/kubernetes) [COMMUNITY-TOOL] - [sysdig.com: Kubernetes Monitoring with Prometheus, the ultimate guide ๐ŸŒŸ](https://www.sysdig.com/blog/kubernetes-monitoring-prometheus) [COMMUNITY-TOOL] - [sysdig.com: How to monitor kube-proxy ๐ŸŒŸ](https://www.sysdig.com/blog/monitor-kube-proxy) [COMMUNITY-TOOL] - [getenroute.io: TSDB, Prometheus, Grafana In Kubernetes: Tracing A Variable Across The OSS Monitoring Stack](https://www.saaras.io/blog/leverage-open-source-oss-derive-insights-grafana-prometheus) [COMMUNITY-TOOL] - [opsdis.com: Building a custom monitoring solution with Grafana, Prometheus and Loki](https://binero.com/observability) [COMMUNITY-TOOL] - [harness.io: Metrics to Improve Continuous Integration Performance](https://www.harness.io/blog/continuous-integration-performance-metrics) [COMMUNITY-TOOL] - [skilledfield.com.au: Monitoring Kubernetes and Docker Container Logs](https://skillfield.com.au/blog/monitoring-kubernetes-and-docker-container-logs) [COMMUNITY-TOOL] - [forbes.com: Who Should Own The Job Of Observability In DevOps?](https://www.forbes.com/councils/forbestechcouncil/2021/09/03/who-should-own-the-job-of-observability-in-devops/?streamIndex=0) [COMMUNITY-TOOL] - [infoworld.com: The RED method: A new strategy for monitoring microservices](https://www.infoworld.com/article/2270578/the-red-method-a-new-strategy-for-monitoring-microservices.html) [COMMUNITY-TOOL] - [forbes.com: From Data Collection To Delivering KPIs: A Roadmap To A Mature Observability Strategy](https://www.forbes.com/councils/forbestechcouncil/2024/03/08/from-data-collection-to-delivering-kpis-a-roadmap-to-a-mature-observability-strategy/?streamIndex=0) [COMMUNITY-TOOL] โ€” - The observability space is a multi-billion-dollar market and for good reasonโ€”there are a lot of benefits when you implement a robust observability strategy. But extracting value is not as simple as adopting a tool, throwing your data into a black box and expecting it to spit out business-relevant, contextualized insights and helpful visualizations. - As they say, โ€œNothing good comes easyโ€โ€”but when done right, a mature observability strategy will pay for itself over and over again. - [KPIs](https://www.kpi.org/KPI-Basics) [COMMUNITY-TOOL] - [Promster: Use Prometheus in huge deployments with dynamic clustering and scrape sharding capabilities based on ETCD service registration](http://github.com/flaviostutz/promster) โญ 31 [COMMUNITY-TOOL] - [timescale.com: Prometheus vs. OpenTelemetry Metrics: A Complete Guide](https://www.tigerdata.com/blog/prometheus-vs-opentelemetry-metrics-a-complete-guide) [COMMUNITY-TOOL] - [acloudguru.com: Getting started with the Elastic Stack](https://www.pluralsight.com/resources/blog/cloud/getting-started-with-the-elastic-stack) [COMMUNITY-TOOL] - [aws.amazon.com: Amazon Elasticsearch Service Is Now Amazon OpenSearch Service and Supports OpenSearch 1.0](https://aws.amazon.com/blogs/aws/announcing-amazon-opensearch-service-which-supports-opensearch-10) [COMMUNITY-TOOL] - [Transitive blocks](https://fastthread.io/ft-error.jsp&s=t) [COMMUNITY-TOOL] โ€” - [Unresponsive JVM](https://fastthread.io/ft-error.jsp&s=t) - [Sudden CPU spike](https://fastthread.io/ft-error.jsp&s=t) - [Thread Leaks](https://fastthread.io/ft-error.jsp&s=t) - [Remote Debugging of Java Applications on OpenShift](https://www.redhat.com/en/blog/remote-debugging-java-applications-openshift) [COMMUNITY-TOOL] - [Dapper](https://research.google/pubs/dapper-a-large-scale-distributed-systems-tracing-infrastructure) [COMMUNITY-TOOL] - [newrelic.com: OpenTracing, OpenCensus, OpenTelemetry, and New Relic (Best overview of OpenTelemetry)](https://newrelic.com/blog/dem/opentelemetry-opentracing-opencensus) [COMMUNITY-TOOL] - [sentry.io](https://sentry.io/welcome) [COMMUNITY-TOOL] โ€” - APMs like Dynatrace, etc. - [Elastic APM](https://www.elastic.co/observability/application-performance-monitoring) [COMMUNITY-TOOL] - [Elastic APM Server](https://www.elastic.co/docs/solutions/observability/apm) [COMMUNITY-TOOL] - [adictosaltrabajo.com: Monitorizaciรณn y anรกlisis de rendimiento de aplicaciones con Dynatrace APM](https://adictosaltrabajo.com/2016/10/26/monitorizacion-y-analisis-de-rendimiento-de-aplicaciones-con-dynatrace) [COMMUNITY-TOOL] - [dynatrace.com: openshift monitoring](https://www.dynatrace.com/hub/detail/red-hat-openshift) [COMMUNITY-TOOL] - [dynatrace.com: Deploy OneAgent on OpenShift Container Platform](https://docs.dynatrace.com/docs/ingest-from/setup-on-container-platforms/kubernetes/legacy/deploy-oneagent-operator-openshift-legacy) [COMMUNITY-TOOL] - [My Dynatrace proof of concept ๐ŸŒŸ](https://github.com/nubenetes/awesome-kubernetes/blob/master/pdf/dynatrace_demo.pdf) โญ 661 [COMMUNITY-TOOL] - [dynatrace.com: Monitoring of Kubernetes Infrastructure for day 2 operations](https://www.dynatrace.com/news/blog/what-is-kubernetes-2) [COMMUNITY-TOOL] - [dynatrace.com: The Power of OpenShift, The Visibility of Dynatrace](https://www.dynatrace.com/news/blog/what-is-openshift-2) [COMMUNITY-TOOL] - [A Distributed Tracing Adventure in Apache Beam](http://rion.io/2020/07/04/a-distributed-tracing-adventure-in-apache-beam) [COMMUNITY-TOOL] - [rancher.com: Monitor Etcd with Prometheus and Grafana using Rancher](https://www.suse.com/c/rancher_blog/monitor-etcd-with-prometheus-and-grafana-using-rancher) [COMMUNITY-TOOL] - [openshift.com: Monitoring Infrastructure Openshift 4.x Using Zabbix Operator](https://www.redhat.com/en/blog/monitoring-infrastructure-openshift-4.x-using-zabbix-operator) [COMMUNITY-TOOL] - [openshift.com: How to Monitor Openshift 4.x with Zabbix using Prometheus - Part 2](https://www.redhat.com/en/blog/how-to-monitoring-openshift-4.x-with-zabbix-using-prometheus-part-2) [COMMUNITY-TOOL] - [replex.io](https://www.splunk.com/en_us/appdynamics-joins-splunk.html?301=appdynamics) [COMMUNITY-TOOL] - [**Micrometer** Collector](http://micrometer.io) [COMMUNITY-TOOL] - [en.wikipedia.org/wiki/List_of_performance_analysis_tools](https://en.wikipedia.org/wiki/List_of_performance_analysis_tools) [COMMUNITY-TOOL] - [InspectIT](https://en.wikipedia.org/wiki/InspectIT) [COMMUNITY-TOOL] - [VisualVM](https://en.wikipedia.org/wiki/VisualVM) [COMMUNITY-TOOL] - [OverOps](https://en.wikipedia.org/wiki/OverOps) [COMMUNITY-TOOL] - [Dzone: 14 Best Performance Testing Tools and APM Solutions](https://dzone.com/articles/14-best-performance-testing-tools-and-apm-solution) [COMMUNITY-TOOL] - [victoriametrics.com](https://victoriametrics.com) [COMMUNITY-TOOL] - [Monitor your Azure cloud estate - Cloud Adoption Framework](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/manage/monitor#reference-for-monitoring-azure-services) [COMMUNITY-TOOL] - [Wikipedia: Application Performance Index](https://en.wikipedia.org/wiki/Apdex) [COMMUNITY-TOOL] - [Observability vs Monitoring](https://middleware.io/blog/observability-vs-monitoring) [COMMUNITY-TOOL] - [dzone.com: Kubernetes Monitoring: Best Practices, Methods, and Existing' Solutions](https://dzone.com/articles/kubernetes-monitoring-best-practices-methods-and-e) [COMMUNITY-TOOL] - [CNCF End User Technology Radar: Observability, September 2020 ๐ŸŒŸ](https://www.cncf.io/blog/2020/09/11/cncf-end-user-technology-radar-observability-september-2020) [COMMUNITY-TOOL] - [magalix.com: Monitoring Kubernetes Clusters Through Prometheus & Grafana' ๐ŸŒŸ](https://www.magalix.com/blog/monitoring-of-kubernetes-cluster-through-prometheus-and-grafana) [COMMUNITY-TOOL] - [learnsteps.com: Monitoring Infrastructure System Design](https://www.learnsteps.com/monitoring-infrastructure-system-design) [COMMUNITY-TOOL] - [bravenewgeek.com: The Observability Pipeline](https://bravenewgeek.com/the-observability-pipeline) [COMMUNITY-TOOL] - [thenewstack.io: 3 Key Configuration Challenges for Kubernetes Monitoring' with Prometheus](https://thenewstack.io/3-key-configuration-challenges-for-kubernetes-monitoring-with-prometheus) [COMMUNITY-TOOL] - [thenewstack.io: Monitoring vs. Observability: Whatโ€™s the Difference?](https://thenewstack.io/monitoring-vs-observability-whats-the-difference) [COMMUNITY-TOOL] - [dashbird.io: Monitoring vs Observability: Can you tell the difference? ๐ŸŒŸ](https://dashbird.io/blog/monitoring-vs-observability) [COMMUNITY-TOOL] - [thenewstack.io: Monitoring as Code: What It Is and Why You Need It ๐ŸŒŸ](https://thenewstack.io/monitoring-as-code-what-it-is-and-why-you-need-it) [COMMUNITY-TOOL] - [thenewstack.io: Observability Wonโ€™t Replace Monitoring (Because It Shouldnโ€™t)' ๐ŸŒŸ](https://thenewstack.io/observability-wont-replace-monitoring-because-it-shouldnt) [COMMUNITY-TOOL] - [matiasmct.medium.com: Observability at Scale](https://matiasmct.medium.com/observability-at-scale-52d0d9a5fb9b) [COMMUNITY-TOOL] - [logz.io: Top 11 Open Source Monitoring Tools for Kubernetes ๐ŸŒŸ](https://logz.io/blog/open-source-monitoring-tools-for-kubernetes) [COMMUNITY-TOOL] - [thenewstack.io: Kubernetes Observability Challenges in Cloud Native Architecture' ๐ŸŒŸ](https://thenewstack.io/kubernetes-observability-challenges-in-cloud-native-architecture) [COMMUNITY-TOOL] - [thenewstack.io: Best Practices to Optimize Infrastructure Monitoring within' DevOps Teams](https://thenewstack.io/best-practices-to-optimize-infrastructure-monitoring-within-devops-teams) [COMMUNITY-TOOL] - [faun.pub: DevOps Meets Observability ๐ŸŒŸ](https://faun.pub/devops-meets-observability-78775c021b0e) [COMMUNITY-TOOL] - [thenewstack.io: Growing Adoption of Observability Powers Business Transformation](https://thenewstack.io/growing-adoption-of-observability-powers-business-transformation) [COMMUNITY-TOOL] - [blog.thundra.io: What CI Observability Means for DevOps ๐ŸŒŸ](https://blog.thundra.io/what-ci-observability-means-for-devops) [COMMUNITY-TOOL] - [thenewstack.io: Monitoring API Latencies After Releases: 4 Mistakes to Avoid](https://thenewstack.io/monitoring-api-latencies-after-releases-4-mistakes-to-avoid) [COMMUNITY-TOOL] - [thenewstack.io: DevOps Observability from Code to Cloud](https://thenewstack.io/devops-observability-from-code-to-cloud) [COMMUNITY-TOOL] - [thenewstack.io: CI Observability for Effective Change Management ๐ŸŒŸ](https://thenewstack.io/ci-observability-for-effective-change-management) [COMMUNITY-TOOL] - [thenewstack.io: Monitor Your Containers with Sysdig](https://thenewstack.io/monitor-your-containers-with-sysdig) [COMMUNITY-TOOL] - [thenewstack.io: Applying Basic vs. Advanced Monitoring Techniques](https://thenewstack.io/applying-basic-vs-advanced-monitoring-techniques) [COMMUNITY-TOOL] - [cloudforecast.io: cAdvisor and Kubernetes Monitoring Guide ๐ŸŒŸ](https://cloudforecast.io/blog/cadvisor-and-kubernetes-monitoring-guide) [COMMUNITY-TOOL] - [hmh.engineering: Musings on microservice observability!](https://hmh.engineering/musings-on-microservice-observability-f7052ac42f04) [COMMUNITY-TOOL] - [stackoverflow.blog: Observability is key to the future of software (and' your DevOps career)](https://stackoverflow.blog/2021/09/08/observability-is-key-to-the-future-of-software-and-your-devops-career) [COMMUNITY-TOOL] - [dynatrace.com: What is observability? Not just logs, metrics and traces](https://www.dynatrace.com/news/blog/what-is-observability-2) [COMMUNITY-TOOL] - [thenewstack.io: Observability Is the New Kubernetes ๐ŸŒŸ](https://thenewstack.io/observability-is-the-new-kubernetes) [COMMUNITY-TOOL] - [learnsteps.com: Logging Infrastructure System Design](https://www.learnsteps.com/logging-infrastructure-system-design) [COMMUNITY-TOOL] - [medium.com: Monitoring Microservices - Part 1: Observability | Anderson' Carvalho](https://medium.com/geekculture/monitoring-microservices-part-1-observability-b2b44fa3e67e) [COMMUNITY-TOOL] - [intellipaat.com: Top 10 DevOps Monitoring Tools](https://intellipaat.com/blog/devops-monitoring-tools) [COMMUNITY-TOOL] - [cncf.io: How to add observability to your application pipeline](https://www.cncf.io/blog/2021/11/23/how-to-add-observability-to-your-application-pipeline) [COMMUNITY-TOOL] - [storiesfromtheherd.com: Unpacking Observability](https://storiesfromtheherd.com/unpacking-observability-a-beginners-guide-833258a0591f) [COMMUNITY-TOOL] - [logz.io: A Monitoring Reality Check: More of the Same Wonโ€™t Work](https://logz.io/blog/monitoring-reality-check) [COMMUNITY-TOOL] - [medium.com/buildpiper: Observability for Monitoring Microservices โ€” Top' 5 Ways!](https://medium.com/buildpiper/observability-for-monitoring-microservices-top-5-ways-587871e726d0) [COMMUNITY-TOOL] - [medium.com/@cbkwgl: Continuous Monitoring in DevOps ๐ŸŒŸ](https://medium.com/@cbkwgl/continuous-monitoring-in-devops-8d4db48a0e24) [COMMUNITY-TOOL] - [logz.io: The Open Source Observability Adoption and Migration Curve](https://logz.io/blog/open-source-observability-adoption-migration-curve) [COMMUNITY-TOOL] - [devopscube.com: What Is Observability? Comprehensive Beginners Guide](https://devopscube.com/what-is-observability) [COMMUNITY-TOOL] - [tiagodiasgeneroso.medium.com: Observability Concepts you should know](https://tiagodiasgeneroso.medium.com/observability-concepts-you-should-know-943fc057b208) [COMMUNITY-TOOL] - [faun.pub: Getting started with Observability](https://faun.pub/getting-started-with-observability-657d57aab1c7) [COMMUNITY-TOOL] - [medium.com/@badawekoo: Monitoring in DevOps lifecycle](https://medium.com/@badawekoo/monitoring-in-devops-lifecycle-4d9a2f277eb0) [COMMUNITY-TOOL] - [laduram.medium.com: The Future of Observability](https://laduram.medium.com/the-future-of-observability-c33cd7ff644a) [COMMUNITY-TOOL] - [devops.com: Where Does Observability Stand Today, and Where is it Going' Next?](https://devops.com/where-does-observability-stand-today-and-where-is-it-going-next) [COMMUNITY-TOOL] - [medium.com/kubeshop-i: Top 8 Open-Source Observability & Testing Tools](https://medium.com/kubeshop-i/top-8-open-source-observability-testing-tools-9341a361a634) [COMMUNITY-TOOL] - [dzone: 11 Observability Tools You Should Know ๐ŸŒŸ](https://dzone.com/articles/11-observability-tools-you-should-know-in-2023) [COMMUNITY-TOOL] - [medium.com/devops-techable: Setup monitoring with Prometheus and Grafana' in Kubernetes โ€” Start monitoring your Kubernetes cluster resources](https://medium.com/devops-techable/setup-monitoring-with-prometheus-and-grafana-in-kubernetes-start-monitoring-your-kubernetes-a3071f083fa6) [COMMUNITY-TOOL] - [thenewstack.io: What Is Container Monitoring?](https://thenewstack.io/what-is-container-monitoring) [COMMUNITY-TOOL] - [devops.com: Why Monitoring-as-Code Will be a Must for DevOps Teams](https://devops.com/why-monitoring-as-code-will-be-a-must-for-devops-teams) [COMMUNITY-TOOL] - [medium.com/cloud-native-daily: Why You Shouldnโ€™t Fear to Adopt OpenTelemetry' for Observability](https://medium.com/cloud-native-daily/why-you-shouldnt-fear-to-adopt-opentelemetry-for-observability-fcb6371ea8fe) [COMMUNITY-TOOL] - [medium.com/@bijit211987: Observability Driven Development (ODD)-Enhancing' System Reliability](https://medium.com/@bijit211987/observability-driven-development-2bc2cdde8661) [COMMUNITY-TOOL] - [medium.com/performance-engineering-for-the-ordinary-barbie: Why profiling' should be part of regular software development workflow ๐ŸŒŸ](https://medium.com/performance-engineering-for-the-ordinary-barbie/why-profiling-should-be-part-of-regular-software-development-workflow-8b19b7f52b38) [COMMUNITY-TOOL] - [github.com/prometheus-operator](https://github.com/prometheus-operator) [COMMUNITY-TOOL] - [redhat.com: **How to gather and display metrics in Red Hat OpenShift** (Prometheus' + Grafana)](https://www.redhat.com/en/blog/how-gather-and-display-metrics-red-hat-openshift) [COMMUNITY-TOOL] - [Generally Available today: Red Hat OpenShift Container Platform 3.11 is' ready to power enterprise Kubernetes deployments ๐ŸŒŸ](https://www.redhat.com/en/blog/generally-available-today-red-hat-openshift-container-platform-311-ready-power-enterprise-kubernetes-deployments) [COMMUNITY-TOOL] - [OCP 3.11 Metrics and Logging](https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html#ocp-311-metrics-and-logging) [COMMUNITY-TOOL] - [Prometheus Cluster Monitoring ๐ŸŒŸ](https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html) [COMMUNITY-TOOL] - [Leveraging Kubernetes and OpenShift for automated performance tests (part 1)](https://developers.redhat.com/blog/2018/11/22/automated-performance-testing-kubernetes-openshift) [COMMUNITY-TOOL] - [Building an observability stack for automated performance tests on Kubernetes and OpenShift (part 2) ๐ŸŒŸ](https://developers.redhat.com/blog/2019/01/03/leveraging-openshift-or-kubernetes-for-automated-performance-tests-part-2) [COMMUNITY-TOOL] - [developers.redhat.com: Monitoring .NET Core applications on Kubernetes](https://developers.redhat.com/blog/2020/08/05/monitoring-net-core-applications-on-kubernetes) [COMMUNITY-TOOL] - [Systems Monitoring with Prometheus and Grafana](https://flightaware.engineering/systems-monitoring-with-prometheus-grafana) [COMMUNITY-TOOL] - [cncf.io: Monitoring micro-front ends on Kubernetes with NGINX ๐ŸŒŸ](https://www.cncf.io/blog/2023/02/01/monitoring-micro-front-ends-on-kubernetes-with-nginx) [COMMUNITY-TOOL] - [dzone: Getting Started With Kibana Advanced Searches](https://dzone.com/articles/getting-started-with-kibana-advanced-searches) [COMMUNITY-TOOL] - [dzone: Kibana Hacks: 5 Tips and Tricks](https://dzone.com/articles/kibana-hacks-5-tips-and-tricks) [COMMUNITY-TOOL] - [juanonlab.com: Dashboards de Kibana](https://www.juanonlab.com/blog/es/dashboards-de-kibana) [COMMUNITY-TOOL] - [dev.to: Beginner's guide to understanding the relevance of your search' with Elasticsearch and Kibana](https://dev.to/lisahjung/beginner-s-guide-to-understanding-the-relevance-of-your-search-with-elasticsearch-and-kibana-29n6) [COMMUNITY-TOOL] - [devops.com: How Centralized Log Management Can Save Your Company](https://devops.com/how-centralized-log-management-can-save-your-company) [COMMUNITY-TOOL] - [betterprogramming.pub: The Art of Logging](https://betterprogramming.pub/creating-a-human-and-machine-freindly-logging-format-bb6d4bb01dca) [COMMUNITY-TOOL] - [zdnet.com: AWS, as predicted, is forking Elasticsearch](https://www.zdnet.com/article/aws-as-predicted-is-forking-elasticsearch) [COMMUNITY-TOOL] - [amazon.com: Stepping up for a truly open source Elasticsearch](https://aws.amazon.com/blogs/opensource/stepping-up-for-a-truly-open-source-elasticsearch) [COMMUNITY-TOOL] - [Store NGINX access logs in Elasticsearch with Logging operator ๐ŸŒŸ](https://banzaicloud.com/docs/one-eye/logging-operator/quickstarts/es-nginx) [COMMUNITY-TOOL] - [Rancher Logging Operator ๐ŸŒŸ](https://rancher.com/docs/rancher/v2.x/en/logging/v2.5) [COMMUNITY-TOOL] - [blog.streammonkey.com: How We Serverlessly Migrated 1.58 Billion Elasticsearch' Documents](https://blog.streammonkey.com/how-we-serverlessly-migrated-1-58-billion-elasticsearch-documents-33ad3d0d7c4f) [COMMUNITY-TOOL] - [youtube: ELK for beginners - by XavkiEn ๐ŸŒŸ](https://www.youtube.com/playlist?list=PLWZKNB9waqIX00uj5q4nX_TOFiX3if1z3) [COMMUNITY-TOOL] - [javatechonline.com: How To Monitor Spring Boot Microservices Using ELK Stack?](https://javatechonline.com/how-to-monitor-spring-boot-microservices-using-elk-stack) [COMMUNITY-TOOL] - [dzone: Running Elasticsearch on Kubernetes](https://dzone.com/articles/running-elasticsearch-on-kubernetes) [COMMUNITY-TOOL] - [medium: Which Elasticsearch Provider is Right For You? ๐ŸŒŸ](https://medium.com/gigasearch/which-elasticsearch-provider-is-right-for-you-3d596a65e704) [COMMUNITY-TOOL] - [jertel/elastalert2](https://github.com/jertel/elastalert2) โญ 1119 [COMMUNITY-TOOL] - [medium.com/hepsiburadatech: Hepsiburada Search Engine on Kubernetes](https://medium.com/hepsiburadatech/hepsiburada-search-engine-on-kubernetes-1fe03a3e71a3) [COMMUNITY-TOOL] - [dev.to/sagary2j: ELK Stack Deployment using MiniKube single node architecture](https://dev.to/sagary2j/elk-stack-deployment-using-minikube-single-node-architecture-16cl) [COMMUNITY-TOOL] - [search-guard.com/sgctl-elasticsearch: SGCTL - TAKE BACK CONTROL](https://search-guard.com/sgctl-elasticsearch) [COMMUNITY-TOOL] - [medium: A detailed guide to deploying Elasticsearch on Elastic Cloud on' Kubernetes (ECK)](https://medium.com/99dotco/a-detail-guide-to-deploying-elasticsearch-on-elastic-cloud-on-kubernetes-eck-31808ac60466) [COMMUNITY-TOOL] - [opensearch.org ๐ŸŒŸ](https://opensearch.org) [COMMUNITY-TOOL] - [amazon.com: Introducing OpenSearch](https://aws.amazon.com/blogs/opensource/introducing-opensearch) [COMMUNITY-TOOL] - [logz.io: Logz.io Announces Support for OpenSearch; A Community-driven Open' Source Fork of Elasticsearch and Kibana](https://logz.io/news-posts/logz-io-announces-support-for-opensearch-a-community-driven-open-source-fork-of-elasticsearch-and-kibana) [COMMUNITY-TOOL] - [techrepublic.com: OpenSearch: AWS rolls out its open source Elasticsearch' fork](https://www.techrepublic.com/article/opensearch-aws-rolls-out-its-open-source-elasticsearch-fork) [COMMUNITY-TOOL] - [thenewstack.io: This Week in Programming: AWS Completes Elasticsearch Fork' with OpenSearch](https://thenewstack.io/this-week-in-programming-aws-completes-elasticsearch-fork-with-opensearch) [COMMUNITY-TOOL] - [logz.io: OpenSearch Is Now Generally Available!](https://logz.io/blog/opensearch-1-0-ga-generally-available-elasticsearch-kibana-fork) [COMMUNITY-TOOL] - [thenewstack.io: This Week in Programming: The ElasticSearch Saga Continues](https://thenewstack.io/this-week-in-programming-the-elasticsearch-saga-continues) [COMMUNITY-TOOL] - [aws.amazon.com: Keeping clients of OpenSearch and Elasticsearch compatible' with open source](https://aws.amazon.com/blogs/opensource/keeping-clients-of-opensearch-and-elasticsearch-compatible-with-open-source) [COMMUNITY-TOOL] - [medium: Logging with EFK - Pratyush Mathur](https://medium.com/@pratyush.mathur/logging-with-efk-1c2e131496d) [COMMUNITY-TOOL] - [medium.com/@CuriousLearner: Deploying EFK stack on Kubernetes](https://medium.com/@CuriousLearner/deploying-efk-stack-on-kubernetes-c25ba2682c99) [COMMUNITY-TOOL] - [medium.com/@tech_18484: Simplifying Kubernetes Logging with EFK Stack](https://medium.com/@tech_18484/simplifying-kubernetes-logging-with-efk-stack-158da47ce982) [COMMUNITY-TOOL] - [logz.io: A Beginnerโ€™s Guide to Logstash Grok](https://logz.io/blog/logstash-grok) [COMMUNITY-TOOL] - [logz.io: Grok Pattern Examples for Log Parsing](https://logz.io/blog/grok-pattern-examples-for-log-parsing) [COMMUNITY-TOOL] - [devops.com: The Fallacy of Continuous Integration, Delivery and Testing](https://devops.com/the-fallacy-of-continuous-integration-delivery-and-testing) [COMMUNITY-TOOL] - [dzone.com: The Keys to Performance Tuning and Testing](https://dzone.com/articles/the-keys-to-performance-tuning-and-testing) [COMMUNITY-TOOL] - [dzone.com: Performance Patterns in Microservices-Based Integrations](https://dzone.com/articles/performance-patterns-in-microservices-based-integr-1) [COMMUNITY-TOOL] - [How to read a Thread Dump](https://dzone.com/articles/how-to-read-a-thread-dump) [COMMUNITY-TOOL] - [dzone: 8 Options for Capturing Thread Dumps](https://dzone.com/articles/how-to-take-thread-dumps-7-options) [COMMUNITY-TOOL] - [blog.arkey.fr: Using JDK FlightRecorder and JDK Mission Control](https://blog.arkey.fr/2020/06/28/using-jdk-flight-recorder-and-jdk-mission-control) [COMMUNITY-TOOL] - [developers.redhat.com: Troubleshooting java applications on openshift](https://developers.redhat.com/blog/2017/08/16/troubleshooting-java-applications-on-openshift) [COMMUNITY-TOOL] - [VisualVM: JVisualVM to an Openshift pod](https://fedidat.com/250-jvisualvm-openshift-pod) [COMMUNITY-TOOL] - [redhat.com: How do I analyze a Java heap dump?](https://access.redhat.com/solutions/18301) [COMMUNITY-TOOL] - [Microservice Observability with Distributed Tracing: **OpenTelemetry.io**' ๐ŸŒŸ](https://opentelemetry.io) [COMMUNITY-TOOL] - [**zipkin.io**](https://zipkin.io) [COMMUNITY-TOOL] - [**OpenTracing.io**](https://opentracing.io) [COMMUNITY-TOOL] - [awkwardferny.medium.com: Setting up Distributed Tracing in Kubernetes with' OpenTracing, Jaeger, and Ingress-NGINX](https://awkwardferny.medium.com/setting-up-distributed-tracing-with-opentelemetry-jaeger-in-kubernetes-ingress-nginx-cfdda7d9441d) [COMMUNITY-TOOL] - [ploffay.medium.com: Five years evolution of open-source distributed tracing' ๐ŸŒŸ](https://ploffay.medium.com/five-years-evolution-of-open-source-distributed-tracing-ec1c5a5dd1ac) [COMMUNITY-TOOL] - [signadot.com: Sandboxes in Kubernetes using OpenTelemetry](https://www.signadot.com/blog/sandboxes-in-kubernetes-using-opentelemetry) [COMMUNITY-TOOL] - [Medium: Distributed Tracing and Monitoring using OpenCensus](https://medium.com/@rghetia/distributed-tracing-and-monitoring-using-opencensus-fe5f6e9479fb) [COMMUNITY-TOOL] - [Dzone: Zipkin vs. Jaeger: Getting Started With Tracing](https://dzone.com/articles/zipkin-vs-jaeger-getting-started-with-tracing) [COMMUNITY-TOOL] - [opensource.com: Distributed tracing in a microservices world](https://opensource.com/article/18/9/distributed-tracing-microservices-world) [COMMUNITY-TOOL] - [opensource.com: 3 open source distributed tracing tools](https://opensource.com/article/18/9/distributed-tracing-tools) [COMMUNITY-TOOL] - [thenewstack.io: Tracing: Why Logs Arenโ€™t Enough to Debug Your Microservices' ๐ŸŒŸ](https://thenewstack.io/tracing-why-logs-arent-enough-to-debug-your-microservices) [COMMUNITY-TOOL] - [github.com/open-telemetry/opentelemetry-operator](https://github.com/open-telemetry/opentelemetry-operator) โญ 1696 [COMMUNITY-TOOL] - [medium.com/@magstherdev: OpenTelemetry Operator](https://medium.com/@magstherdev/opentelemetry-operator-d3d407354cbf) [COMMUNITY-TOOL] - [thenewstack.io: OpenTelemetry Gaining Traction from Companies and Vendors](https://thenewstack.io/opentelemetry-gaining-traction-from-companies-and-vendors) [COMMUNITY-TOOL] - [thenewstack.io: How OpenTelemetry Works with Kubernetes](https://thenewstack.io/how-opentelemetry-works-with-kubernetes) [COMMUNITY-TOOL] - [medium.com/@bijit211987: Grafana with OpenTelemetry, Vendor-neutral and' open-source approach](https://medium.com/@bijit211987/grafana-with-opentelemetry-vendor-neutral-and-open-source-approach-ab4bc08f67e9) [COMMUNITY-TOOL] - [medium: Jaeger VS OpenTracing VS OpenTelemetry](https://medium.com/jaegertracing/jaeger-and-opentelemetry-1846f701d9f2) [COMMUNITY-TOOL] - [medium: Using Jaeger and OpenTelemetry SDKs in a mixed environment with' W3C Trace-Context](https://medium.com/jaegertracing/jaeger-clients-and-w3c-trace-context-c2ce1b9dc390) [COMMUNITY-TOOL] - [thenewstack.io: Jaeger vs. Zipkin: Battle of the Open Source Tracing Tools](https://thenewstack.io/jaeger-vs-zipkin-battle-of-the-open-source-tracing-tools) [COMMUNITY-TOOL] - [Grafana Tempo](https://github.com/grafana/tempo) โญ 5276 [ENTERPRISE-STABLE] - [opensource.com: Get started with distributed tracing using Grafana Tempo](https://opensource.com/article/21/2/tempo-distributed-tracing) [COMMUNITY-TOOL] - [Azure App Service Auto-Heal: Capturing Relevant Data During Performance' Issues](https://techcommunity.microsoft.com/blog/appsonazureblog/azure-app-service-auto-heal-capturing-relevant-data-during-performance-issues/4390351) [COMMUNITY-TOOL] - [APM in wikipedia](https://en.wikipedia.org/wiki/Application_performance_management) [COMMUNITY-TOOL] - [github.com/antonarhipov/awesome-apm: Awesome APM](https://github.com/antonarhipov/awesome-apm) [COMMUNITY-TOOL] - [dzone.com: APM Tools Comparison](https://dzone.com/articles/apm-tools-comparison-which-one-should-you-choose) [COMMUNITY-TOOL] - [dzone.com: Java Performance Monitoring: 5 Open Source Tools You Should Know](https://dzone.com/articles/java-performance-monitoring-5-open-source-tools-you-should-know) [COMMUNITY-TOOL] - [datadoghq.com](https://www.datadoghq.com) [COMMUNITY-TOOL] - [Mininimum elasticsearch requirement is 6.2.x or higher](https://www.elastic.co/support/matrix#matrix_compatibility) [COMMUNITY-TOOL] - [Elastic APM Server Docker image](https://github.com/sls-dev1/openshift-elastic-apm-server) [COMMUNITY-TOOL] - [Monitoring Java applications with Elastic: Getting started with the Elastic' APM Java Agent](https://www.elastic.co/blog/monitoring-java-applications-and-getting-started-with-the-elastic-apm-java-agent) [COMMUNITY-TOOL] - [Jenkins pipeline shared library for the project Elastic APM ๐ŸŒŸ](https://github.com/elastic/apm-pipeline-library) โญ 11 [COMMUNITY-TOOL] - [bqstack.com: Monitoring Application using Elastic APM](https://bqstack.com/b/detail/109) [COMMUNITY-TOOL] - [Successful Kubernetes Monitoring โ€“ Three Pitfalls to Avoid](https://www.dynatrace.com/news/blog/successful-kubernetes-monitoring-3-pitfalls-to-avoid) [COMMUNITY-TOOL] - [Tutorial: Guide to automated SRE-driven performance engineering ๐ŸŒŸ](https://www.dynatrace.com/news/blog/guide-to-automated-sre-driven-performance-engineering-analysis) [COMMUNITY-TOOL] - [dynatrace.com: 4 steps to modernize your IT service operations with Dynatrace](https://www.dynatrace.com/news/blog/4-steps-to-modernize-your-it-service-operations-with-dynatrace) [COMMUNITY-TOOL] - [dynatrace.com: Analyze all AWS data in minutes with Amazon CloudWatch Metric' Streams available in Dynatrace](https://www.dynatrace.com/news/blog/amazon-cloudwatch-metric-streams-launch-partnership) [COMMUNITY-TOOL] - [dynatrace.com: New Dynatrace Operator elevates cloud-native observability' for Kubernetes](https://www.dynatrace.com/news/blog/new-dynatrace-operator-elevates-cloud-native-observability-for-kubernetes) [COMMUNITY-TOOL] - [dynatrace.com: How to collect Prometheus metrics in Dynatrace](https://www.dynatrace.com/news/blog/how-to-collect-prometheus-metrics-in-dynatrace) [COMMUNITY-TOOL] - [dynatrace.com: Automatic connection of logs and traces accelerates AI-driven' cloud analytics](https://www.dynatrace.com/news/blog/automatic-connection-of-logs-and-traces-accelerates-ai-driven-cloud-analytics) [COMMUNITY-TOOL] - [devops.com: Dynatrace Advances Application Environments as Code](https://devops.com/dynatrace-advances-application-environments-as-code) [COMMUNITY-TOOL] - [thenewstack.io: Serverless Needs More Observability Tools](https://thenewstack.io/serverless-needs-more-observability-tools) [COMMUNITY-TOOL] - [Apache Beam](https://beam.apache.org) [COMMUNITY-TOOL] - [Krossboard](https://krossboard.app) [COMMUNITY-TOOL] - [Krossboard: A centralized usage analytics approach for multiple Kubernetes](https://itnext.io/in-search-of-converged-usage-analytics-for-multiple-managed-kubernetes-c5108cb7f0e1) [COMMUNITY-TOOL] - [cloudbees.com: Automated Build and Deploy Feedback Using Jenkins and Instana' ๐ŸŒŸ](https://www.cloudbees.com/blog/automated-build-deploy-feedback-using-instana) [COMMUNITY-TOOL] - [Netdata](https://github.com/netdata/netdata) โญ 78906 [DE FACTO STANDARD] - [PM2](https://github.com/Unitech/pm2) โญ 43173 [DE FACTO STANDARD] - [Huginn](https://github.com/huginn/huginn) โญ 49312 [DE FACTO STANDARD] - [OS Query](https://github.com/osquery/osquery) โญ 23262 [DE FACTO STANDARD] - [Glances](https://github.com/nicolargo/glances) โญ 32615 [DE FACTO STANDARD] - [TDengine](https://github.com/taosdata/TDengine) โญ 24860 [DE FACTO STANDARD] - [stackpulse.com: Automated Kubernetes Pod Restarting Analysis with StackPulse](https://stackpulse.com/blog/automated-kubernetes-pod-restarting-analysis-with-stackpulse) [COMMUNITY-TOOL] - [Checkly](https://www.checklyhq.com) [COMMUNITY-TOOL] - [hashicorp.com: Monitoring as Code with Terraform Cloud and Checkly](https://www.hashicorp.com/blog/monitoring-as-code-with-terraform-cloud-and-checkly) [COMMUNITY-TOOL] - [network-king.net: IoT use in healthcare grows but has some pitfalls](https://network-king.net/iot-use-in-healthcare-grows-but-has-its-pitfalls) [COMMUNITY-TOOL] - [Zebrium](https://www.zebrium.com) [COMMUNITY-TOOL] - [louislam/uptime-kuma](https://github.com/louislam/uptime-kuma) โญ 87087 [DE FACTO STANDARD] - [OpenTelemetry (OTel) vs Application Performance Monitoring (APM)](https://medium.com/@rahul.fiem/opentelemetry-otel-vs-application-performance-monitoring-apm-86ae829877cf) [COMMUNITY-TOOL] - [OOMKilled in Kubernetes: Understanding and Preventing Hidden Memory Leaks](https://unixarena.com/2025/04/oomkilled-in-kubernetes-the-hidden-memory-leaks-youre-missing.html) [COMMUNITY-TOOL] - [Grafana](https://grafana.com) [COMMUNITY-TOOL] - [SigNoz: Open source Application Performance Monitoring (APM) & Observability' tool ๐ŸŒŸ](https://github.com/SigNoz/signoz) โญ 26999 [DE FACTO STANDARD] - [Prometheus JMX Exporter ๐ŸŒŸ](https://github.com/prometheus/jmx_exporter) โญ 3305 [ENTERPRISE-STABLE] - [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) โญ 7050 [ENTERPRISE-STABLE] ## Cloud Infrastructure ### Service Mesh #### Istio Mesh - [Istio.io](https://istio.io) [EN CONTENT] [ADVANCED LEVEL] [DE FACTO STANDARD] โ€” The premier open-source service mesh providing advanced traffic management, end-to-end security, and granular observability. Uses Envoy proxies (via sidecars or Ambient mode) to secure and manage microservice fabrics. ## Cloud Native Infrastructure ### Observability #### Distributed Tracing ##### Jaeger Platform - [jaegertracing.io](https://www.jaegertracing.io) [DOCUMENTATION] [DE FACTO STANDARD] [ENTERPRISE-STABLE] โ€” The official gateway for Jaeger, a CNCF-graduated distributed tracing platform. Essential for microservice architectures to monitor transactions, perform root-cause analysis, optimize performance bottlenecks, and visualize complex request propagation paths. #### Log Analysis ##### Visualization Tools - [Kibana](https://www.elastic.co/kibana) [DOCUMENTATION] [DE FACTO STANDARD] [ENTERPRISE-STABLE] โ€” The foundational visualization and management interface for the Elastic Stack. Enables operators to search, index, analyze, and construct real-time security dashboards and log analysis patterns for high-throughput microservice applications. ## Cloud Native Languages ### Java #### Performance Tuning - [tier1app.com](https://tier1app.com) [EN CONTENT] [ENTERPRISE-STABLE] โ€” A dedicated APM tool for analyzing Java thread dumps and performance. Provides automated diagnostics for thread contention and deadlocks to optimize JVM application responsiveness. - [fastthread.io](https://fastthread.io) [EN CONTENT] [DE FACTO STANDARD] [ENTERPRISE-STABLE] โ€” Industrial-grade online Java thread dump analyzer that uses AI diagnostics to identify CPU spikes, thread leaks, and deadlock patterns. Essential for post-mortem analysis of containerized JVM workloads. - [gceasy.io](https://gceasy.io) [EN CONTENT] [ADVANCED LEVEL] [DE FACTO STANDARD] [ENTERPRISE-STABLE] โ€” Machine-learning powered JVM Garbage Collection log analyzer. Automates the detection of memory leaks, GC pauses, and heap sizing misconfigurations, offering actionable recommendations for optimization. - [heaphero.io](https://heaphero.io) [EN CONTENT] [ADVANCED LEVEL] [ENTERPRISE-STABLE] โ€” An automated cloud-based JVM heap dump analyzer built to parse large memory dumps quickly. Detects memory leaks and optimizes data structure footprints to resolve OutOfMemoryError crashes. ## Event-Driven Architecture ### Apache Kafka #### Tooling and UI - [Kafdrop โ€“ Kafka Web UI ๐ŸŒŸ](https://github.com/obsidiandynamics/kafdrop) โญ 6135 [DE FACTO STANDARD] [ENTERPRISE-STABLE] โ€” Curator Insight: Highly popular, lightweight web UI for monitoring and managing Apache Kafka. Live Grounding: Renders cluster info, brokers, topics, partition offsets, consumer group lag, and allows active JSON/protobuf message payload inspection. ## Infrastructure Operations ### Sysadmin Toolsets #### Resource Curation ##### Awesome Lists - [Awesome Sysadmin](https://github.com/awesome-foss/awesome-sysadmin) โญ 33981 [DE FACTO STANDARD] โ€” An incredibly rich curation containing production-grade open source utilities, control planes, networking layers, and security mechanisms used daily by systems architects and site reliability engineers. ## Observability (1) ### Telemetry Standards #### OpenTelemetry vs Prometheus - [Prometheus and OpenTelemetry Compatibility Issues](https://thenewstack.io/prometheus-and-opentelemetry-just-couldnt-get-along) [ADVANCED LEVEL] [COMMUNITY-TOOL] โ€” An informative look at the historical data model incompatibilities between Prometheus and OpenTelemetry (OTel). It details the industry efforts to reconcile standard Prometheus structures with the broader OTel landscape. ## Observability and Performance ### Kubernetes Internals #### Resource Management - [The Hidden CPU Throttling Crisis in Kubernetes Clusters](https://www.kubenatives.com/p/the-hidden-cpu-throttling-crisis) [EN CONTENT] [ADVANCED LEVEL] [COMMUNITY-TOOL] โ€” An in-depth analysis exposing the silent threat of CPU throttling inside Kubernetes clusters caused by rigid CFS quota management. Demonstrates how microservices suffer latency spikes even with low aggregate CPU consumption. ### Performance Testing #### HTTP Benchmarking - [blog.cloud-mercato.com: New HTTP benchmark tool **pycurlb**](https://blog.cloud-mercato.com/new-http-benchmark-tool-pycurlb) [EN CONTENT] [ADVANCED LEVEL] [COMMUNITY-TOOL] โ€” A deep-dive introducing `pycurlb`, a fast performance tool wrapping libcurl for rapid HTTP request benchmarking in Python. Explores real-world performance results and technical comparisons. ## Operations and Reliability ### Observability and Monitoring #### Foundations - [Monitoring Distributed Systems - Google SRE Book](https://sre.google/sre-book/monitoring-distributed-systems) [ADVANCED LEVEL] [DOCUMENTATION] [DE FACTO STANDARD] โ€” The industry-standard chapter from Google's SRE book detailing the implementation of distributed systems monitoring. It defines the 'Four Golden Signals'โ€”latency, traffic, errors, and saturationโ€”providing practical blueprints to prevent alert fatigue and build actionable dashboard designs. --- ๐Ÿ’ก **Explore Related:** [Mkdocs](./mkdocs.md) | [Cheatsheets](./cheatsheets.md) | [Git](./git.md)