Site Reliability Engineering (SRE)
SRE
-
==sre.google: What is Site Reliability Engineering (SRE)?== 🌟
-
cloud.google.com: SRE vs. DevOps: competing standards or close friends?
-
overops.com: DevOps vs. SRE: What’s the Difference Between Them, and Which One Are You?
-
devops.com: SRE vs. DevOps vs. Cloud Native: The Server Cage Match
-
devops.com: Site Reliability Engineering 101: DevOps Versus SRE
-
dzone: SRE vs. DevOps: SRE Is to DevOps What Scrum Is to Agile
-
Google: What is Site Reliability Engineering (SRE)? SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services — Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few — with an ever-watchful eye on their availability, latency, performance, and capacity.
-
opensource.com: What is an SRE and how does it relate to DevOps? The SRE role is common in large enterprises, but smaller businesses need it, too.
-
thenewstack.io: Where Site Reliability Engineering Overlaps with DevOps
-
openshift.com: From Ops to SRE - Evolution of the OpenShift Dedicated Team
-
kelda.io: Why SREs Should be Responsible for Development Environments
-
youtube: Platform9’s Madhura Maskasky says observability is essential for diagnosing and debugging, so that SREs can "get to the root cause quickly enough so that you can feed that back to the development teams." 🌟 Debugging remains complex: debugging in "a world of microservices is a very difficult task," requiring the identification of which specific part in a microservices deployment must be fixed.
- "What's happening to the system administrator or is the system administrator becoming an SRE? Are they going into different roles? Are they taking multiple roles? How do they play a part in ensuring that reliability in these new roles?"
- "The way Google defines SRE is that an SRE by nature needs to be someone who develops or writes code 50% of the time and only remaining 50% of the time they do the traditional ops/operations and this is because they want to do more through automation as part of the role of the requirements of the SRE himself, so that you can run apps that can serve billions of requests but that are still handled by a few dozens of SREs."
- "Suddenly the role for SRE gets democratized and distributed among different roles (developers included)".
-
hernan-david-hd.medium.com: Breaking down SRE/DevOps into 5 key areas
-
itprotoday.com: Why Site Reliability Engineering Is Key to Modern DevOps Among the hottest areas of growth in DevOps is the emerging field of site reliability engineering as organizations look to bake reliability into the earliest stages of the software development cycle.
-
stackpulse.com: Managing Reliability for Monoliths vs. Microservices: The Challenges for SREs
-
stackpulse.com: Managing Reliability for Monoliths vs. Microservices: Best Practices for SREs
-
cloud.google.com: SRE at Google: Our complete list of CRE life lessons 🌟
-
circonus.com: Monitoring for Success: What All SREs Need to Know
-
infracloud.io: Site Reliability Engineering (SRE) Best Practices
-
stackpulse.com: No, SRE Is Not the New DevOps – Unless It Is
-
youtube: Viktor Farcic - What is the difference between SRE and DevOps?
-
dzone: Remote server management - Common architectural elements
-
dzone: Upcoming Trends in DevOps and SRE in 2021 🌟 DevOps and SRE are domains with rapid growth and frequent innovations. With this blog you can explore the latest trends in DevOps and SRE and stay ahead of the curve. The following trends are most likely to have a lasting impact in the field of DevOps and SRE:
- AIOps and Self-Healing Platforms
- Service Meshes
- Low-code DevOps
- GitOps
- DevSecOps
-
dzone: SRE vs. DevOps: What are the Differences? SRE and DevOps are closely related concepts with some important distinctions between them, and many businesses can benefit from embracing both.
-
thenewstack.io: How the SRE Experience Is Changing with Cloud Native 🌟 From Firefighting to Prevention for SREs. Empower Developers with Self-Service. Facilitate Developer Autonomy

| Site Reliability Engineer (SRE) team | Developers | Operations team |
| --- | --- | --- |
| Provide and teach effective use of platform tooling to empower developers to be self-sufficient | Treat SREs as application operation partners, not only as first responders to incidents | Provide self-service platform deployment and observability, and enable visibility into ramifications of actions |
| Document clear escalation paths for developers struggling in production | Turn to ops teams for the “paved path” or centralized developer control plane | Provide opinionated “paved path” platform or developer control plane (DCP), but allow developers to swap platform components if they also want to be accountable |

-
infoq.com: Observing and Understanding Failures: SRE Apprentices
-
thenewstack.io: Google SRE: Site Reliability Engineering at a Global Scale
-
==sre.google: sre-book - The Evolving SRE Engagement Model==
-
blogs.letusdevops.com: How much programming should I know for DevOps/SRE domain. And YES, you need to learn programming.
SRE Tools
- thenewstack.io: The Site Reliability Engineering Tool Stack
- getcortexapp.com: A guide to the best SRE tools
- thenewstack.io: The Best Site Reliability Engineering Tools in 2021
Service Level Objectives (SLO)
- SLOconf The first SLO Conference for Site Reliability Engineers
- thenewstack.io: Automate User Satisfaction with This GitOps-Friendly Spec for Service Level Objectives Organizations looking to tighten up their ops with some site reliability engineering (SRE) should take a look at the recently-released OpenSLO specification, a GitOps-friendly template for establishing Service Level Objectives (SLO) to specify and even enforce the range of reliability required (and afforded) for a system.
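
To make the idea of a "range of reliability required (and afforded)" concrete, here is a minimal Python sketch of the arithmetic behind an availability SLO: the target implies a number of failures the service can tolerate (commonly called the error budget), and observed failures spend that budget. The `SLO` class, its field names, and the `checkout-availability` example are hypothetical illustrations, not part of OpenSLO or any tool listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical availability SLO measured over a window of requests."""

    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed

    def allowed_failures(self, total_requests: int) -> float:
        """Failed requests the target tolerates in the window (the error budget)."""
        return (1.0 - self.target) * total_requests

    def budget_remaining(self, total_requests: int, failed_requests: int) -> float:
        """Fraction of the error budget still unspent; negative means the SLO is breached."""
        budget = self.allowed_failures(total_requests)
        if budget == 0:
            # A 100% target (or an empty window) leaves no room for any failure.
            return 0.0 if failed_requests == 0 else float("-inf")
        return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Assumed example numbers: one million requests, 400 of them failed.
    slo = SLO(name="checkout-availability", target=0.999)
    total, failed = 1_000_000, 400
    print(f"allowed failures: {slo.allowed_failures(total):.0f}")
    print(f"budget remaining: {slo.budget_remaining(total, failed):.1%}")
```

A team could use a remaining-budget figure like this to decide whether to keep shipping changes or to pause and invest in reliability; the thresholds for that decision are a policy choice and are not shown here.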
OpenSLO
- OpenSLO specification 🌟 The goal of this project is to provide an open specification for defining and interfacing with SLOs to allow for a common approach, giving a set vendor-agnostic solution to defining and tracking SLOs. Platform specific implementation details are purposefully excluded from the scope of this specification.
Validate Service-Level Objectives of REST APIs Using Iter8
Images
??? note "Click to expand!"
<center>
[](https://devops.com/sre-devops-cloud-native-server-cage-match/)
[](https://devops.com/site-reliability-engineering-101-devops-versus-sre/)
[](https://medium.com/@ta.abhisingh/agile-vs-devops-vs-sre-its-not-or-it-s-and-aa312904e577)
</center>
Tweets
Is it hard to find SREs? Dell: Developers do a good job as SREs because they know what exactly is happening. At the same time, we are also thinking about how we can have a developer rotation model too; essentially a rotation policy which is a learning process for us.
— The New Stack (@thenewstack) May 7, 2021