Shaun Abram
Java and Technology weblog
Book chapter summary: Postmortem Culture, from the SRE Book
I’m really enjoying the excellent “SRE Book”. Chapter 15 in particular, “Postmortem Culture: Learning from Failure”, struck a chord with me. The following is a slightly summarized version of it.
TLDR: Failures are inevitable, especially in distributed systems. To learn from them, document them in postmortems, avoid blame, and share what you learn across your org.
Tags: blameless, postmortems, RCA, rootcauseanalysis, sitereliabilityengineering, sre, summary, thesrebook
Talk summary: SRE principles by Tori Wieldt @ AWS re:Invent 2018
I caught a talk by Tori Wieldt at the New Relic booth at AWS re:Invent on “SRE principles”. Even though it was a short talk in the expo hall, rather than a formal scheduled one, it had a ton of good SRE material.
Tags: aws, newrelic, reinvent, reinvent2018, sitereliabilityengineering, sre, summary, Testing