SRE Resources

The following is a list of SRE resources I’m finding useful. I will update it as I find more. The good news is that most of the books (including all 3 of the Google SRE books) are available for free download at https://landing.google.com/sre/books.

 

The SRE Book

The book’s formal title is “Site Reliability Engineering: How Google Runs Production Systems”, but it is more commonly referred to as simply “The SRE book”. It is in many ways the SRE bible and I highly recommend it. That said, it is very Google-specific (as the title calls out), and not everyone is dealing with Google-scale issues. It also spends a lot of time on tools developed in house at Google and mentions few open source alternatives. There is a lot of repetition between some chapters, and parts of it can frankly be a very dry read. Still, it is definitely worth reading. My suggestion is to skim and read the parts that are most relevant to you. You can find some of the highlights from the book here. I particularly liked

 

The SRE Workbook

This is a companion to the aforementioned SRE book. Specifically, it aims to “add more implementation detail to the principles outlined in” the SRE book. It also has more material that is not Google-specific, including, for example, use cases from the New York Times and Spotify.

 

Seeking SRE

This book completes the trilogy of SRE books from O’Reilly. The chapters are contributed by engineers from different companies who apply SRE principles to real-world problems. We covered this book in a book club where I work, and some of the chapters were very useful.

 

Chaos Engineering

This Chaos Engineering book (available for free download) was written by “The Chaos team” at Netflix, and it is one that I really enjoyed. For me, one of the most important concepts covered was “steady state”, i.e. knowing what “normal” looks like in your apps. It is an incredibly basic but important concept. The book also covers how to set up and run experiments in production and, critically, control the blast radius. I summarized the book here.

 

What Is SRE?

“What Is SRE? An Introduction to Site Reliability Engineering” is a free ebook from O’Reilly. It is not as in depth as any of the resources above, but as a result makes for a gentler introduction to SRE. I summarized it here.

 

Finally, if you are interested in starting an SRE team, I wrote about some resources specific to building an SRE team here: http://www.shaunabram.com/creating-an-sre-team/
