Blog post summary: Automating safe, hands-off deployments at AWS

AWS’s Clare Liguori wrote an excellent blog post on Automating safe, hands-off deployments.

This is a summary (1,700 words, vs 5,300 in the original) and mostly just a copy & paste of highlights. I have also skipped some of the sections that are at scales larger than most folks deal with (e.g. global releases across 26 regions!).

Blog post summary: Domain-Oriented Microservice Architecture at Uber

“Domain-Oriented Microservice Architecture at Uber” is a blog post on the Uber engineering blog. There were some comments about the post not giving credit to prior art, which I think is fair, but it is a useful post nonetheless. Uber provides an interesting approach to classifying and organizing their (2,200!) microservices, using the concepts of Domains, Layers, Gateways and Extensions.

This is a shortened version (1,200 words, vs 3,800 in the original), since I tend to learn by creating summaries. It is mostly just a copy & paste, so check out the original, with diagrams etc., if you’re really interested.

eBook Summary: What Is SRE?

“What Is SRE? An Introduction to Site Reliability Engineering” (free, but registration required) is an ebook by Kurt Andersen & Craig Sebenik, published by O’Reilly. The following is a summary (abridged copy and paste) of the parts I found most useful, with a few of my own notes. The original is about 9,000 words; this is about 2,000.

Report Summary: Accelerate State of DevOps 2019

This is an abridged version of The Accelerate State of DevOps Report 2019, essentially a cut and paste of the most salient parts. The original is about 18,000 words; this is about 2,500 words.

I highly recommend reading the original in its entirety, if you have time, and I’m a big fan of the Accelerate book too. As with all the other summaries I create, this is just a way to help me digest and understand an excellent article.

Blog post summary: Blameless PostMortems post by John Allspaw

The following is a slightly summarized version of a blog post by John Allspaw that I really like: “Blameless PostMortems and a Just Culture”.

Book chapter summary: Postmortem Culture, from the SRE Book

I’m really enjoying reading the excellent “SRE Book”. Chapter 15, “Postmortem Culture: Learning from Failure”, in particular really struck a chord with me. The following is a slightly summarized version of it.

TLDR: Failures are inevitable, especially in distributed systems. To learn from them, document them in postmortems, avoid blame, and share the lessons learned across your org.

Talk summary: Realizing the Microservices Vision with Service Mesh by Arijit Mukherji

Some notes on the talk “Fully Realizing the Microservices Vision with Service Mesh” by Arijit Mukherji of SignalFx at AWS re:Invent 2018 (DEV312).

Find the video at https://www.youtube.com/watch?v=eTHhsbKfpWg

Talk summary: SRE principles by Tori Wieldt @ AWS re:Invent 2018

I caught a talk by Tori Wieldt at the New Relic booth at AWS re:Invent on “SRE principles”. Even though it was a short talk in the expo hall, rather than a formal scheduled one, it had a ton of good SRE material.

Talk summary: Reactive DDD by Vaughn Vernon @ QCon2018

Some notes on the “Reactive DDD – When Concurrent Waxes Fluent” talk by Vaughn Vernon (author of Implementing Domain-Driven Design) at QCon 2018. (Currently I think you need to be logged in as a ticket holder to see the talk; I will post a link if it becomes public.)

Talk summary: Chaos Engineering by Adrian Cockcroft @ ChaosConf18

Title: Chaos Engineering – What is it, where did it come from, and where might it be going?

Speaker: Adrian Cockcroft (AWS VP Cloud Architecture Strategy)

Conference: Chaos Conference 2018 (http://chaosconf.io/)

Video: https://www.youtube.com/watch?v=cefJd2v037U

The following are some brief notes and slide summaries from Adrian’s keynote at ChaosConf 2018…

Book summary: Distributed Systems Observability

“Distributed Systems Observability” is a book from Cindy Sridharan (find her on Twitter and Medium), available as a free download here (registration required). At a little over 30 pages and 8,000 words, it is not a difficult read, and I definitely recommend it.

Book summary: Chaos Engineering

“Chaos Engineering” is a book from O’Reilly (free download), written by folks from the “Chaos team” at Netflix. It is a GREAT read for anyone interested in resilience engineering. This post is one of my summaries, essentially a cut and paste of the most salient parts (the original is about 16,000 words; this is about 3,000), with some paraphrasing and merging/rewriting of sections for brevity.

Talk summary: How Complex Systems Fail by Dr Richard Cook @ Velocity 2012

“How Complex Systems Fail” is a talk by Dr Richard Cook at Velocity 2012.

I’ve included a link to the video on YouTube below, and some of my key takeaway points.

Talk summary: Using Chaos to Build Resilient Systems by Tammy Butow @ QCon2018

“Using Chaos to Build Resilient Systems” was a talk given by Tammy Butow of Gremlin at QCon New York 2018. I really enjoyed the talk, so I wanted to summarize some of the key points of interest to me.

AWS Best Practices Architecting for the Cloud – Concise Summary

The following is a concise summary of Architecting for the Cloud: AWS Best Practices. The original is about 13,000 words; there is an abridged version of about 4,000 words, and this is an even more concise version, at about 1,500 words.
