Creating an SRE team

If you wanted to build an SRE team at your company, how would you go about it? How would you structure it?

(more…)


SRE Resources

The following is a list of SRE resources I’m finding useful. I will update it as I find more. The good news is that most of the books (including all 3 of the Google SRE books) are available for free download at https://landing.google.com/sre/books.

(more…)


eBook Summary: What Is SRE?

“What Is SRE? An Introduction to Site Reliability Engineering” (free, but registration required) is an ebook by Kurt Andersen & Craig Sebenik, published by O’Reilly. The following is a summary (abridged copy and paste) of the parts I found most useful, with a few of my own notes. The original is about 9,000 words; this is about 2,000.

 

(more…)


So you want to be a Manager?

Despite how satisfying and fun designing and writing software can be, building high-performing teams can be even more so. The highs are higher and the lows are lower, but overall it can be an incredibly rewarding career, and developing future leaders on your team is a key responsibility.

So, as a manager, how do you handle an individual contributor (IC) engineer on your team expressing interest in becoming a manager?

(more…)


Measuring Developer Productivity

Most metrics for measuring developer productivity, such as lines of code or issues closed, are notoriously ineffective. But the research in the excellent State of DevOps report shows that, rather than focusing on local metrics and individual developer performance, it is better to look at overall development and delivery practices. Specifically, there are metrics that predict and reflect a team’s ability to successfully deliver working software into production, including deployment frequency and the mean time to restore service after an incident. This article discusses why some metrics are useless, and takes a closer look at the recommendations in the 2019 State of DevOps report.
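As a rough illustration of the two team-level metrics named above, here is a minimal sketch that computes deployment frequency and mean time to restore. The timestamps, the one-week window, and the function names are all made up for the example; they are not taken from the report.

```python
from datetime import datetime, timedelta

# Hypothetical data: production deployments during one week.
deployments = [
    datetime(2019, 9, 2, 10, 0),
    datetime(2019, 9, 3, 15, 30),
    datetime(2019, 9, 5, 9, 15),
    datetime(2019, 9, 6, 11, 45),
]

# Hypothetical data: (incident started, service restored) pairs.
incidents = [
    (datetime(2019, 9, 3, 16, 0), datetime(2019, 9, 3, 16, 45)),
    (datetime(2019, 9, 5, 10, 0), datetime(2019, 9, 5, 11, 15)),
]


def deployment_frequency_per_day(deploys, period_days):
    """Average number of deployments per day over the observation period."""
    return len(deploys) / period_days


def mean_time_to_restore(incident_windows):
    """Average duration between an incident starting and service being restored."""
    total = sum(
        (restored - started for started, restored in incident_windows),
        timedelta(),
    )
    return total / len(incident_windows)


print(f"Deployments per day: {deployment_frequency_per_day(deployments, 7):.2f}")
print(f"Mean time to restore: {mean_time_to_restore(incidents)}")
```

Both numbers describe the team's delivery pipeline as a whole rather than any individual developer's output.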

(more…)


Report Summary: Accelerate State of DevOps 2019

This is an abridged version of The Accelerate State of DevOps Report 2019; essentially a cut and paste of the most salient parts. The original is about 18,000 words; this is about 2,500 words.

I highly recommend reading the original in its entirety, if you have time, and I’m a big fan of the Accelerate book too. As with all the other summaries I create, this is just a way to help me digest and understand an excellent article.

(more…)


File, Block and Object Storage

I was in an AWS class today, where they were talking about S3, and how it is “object storage”. But what does that mean? One way to explain it is to contrast it with the other types of storage, namely file and block storage.
(more…)


Is Apdex useful?

I’ve been trying to figure out what SLOs to define for some services recently, and wondering if Apdex is a useful metric. (See my previous post on the difference between SLIs, SLOs and SLAs.)
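For context, the standard Apdex score counts responses at or under a threshold T as "satisfied", those between T and 4T as "tolerating" (worth half), and anything slower as "frustrated". A minimal sketch, with made-up response times and a hypothetical 0.5 second threshold:

```python
def apdex(response_times_s, threshold_s):
    """Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not response_times_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


# Hypothetical response times in seconds, with T = 0.5s.
samples = [0.1, 0.3, 0.4, 0.9, 1.5, 2.5]
print(f"Apdex: {apdex(samples, 0.5):.2f}")
```

Here three samples are satisfied, two are tolerating and one is frustrated, so the score is (3 + 2/2) / 6.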

(more…)


SLI, SLO and SLA

What are SLIs, SLOs and SLAs? 

Service Level Indicators (SLIs) are metrics that you choose to measure the health and performance of your services. Service Level Objectives (SLOs) are the desired targets for those indicators. Service Level Agreements (SLAs) build on this and include the consequences of not meeting those targets. All are fundamental to Site Reliability Engineering.

In this post, I’ll try to explain each in more detail, how they relate to each other, and some examples of each.
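To make the relationship concrete, here is a minimal sketch of one possible SLI (the fraction of successful requests) checked against an SLO target. The request counts, the 99.9% target, and the names are hypothetical, chosen only for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


SLO_TARGET = 0.999  # hypothetical objective: 99.9% of requests succeed

sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
meets_slo = sli >= SLO_TARGET

print(f"SLI = {sli:.4%}, SLO target = {SLO_TARGET:.1%}, met = {meets_slo}")
```

An SLA would sit on top of this: the same kind of target, plus an agreed consequence (such as a service credit) if it is missed.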

(more…)


SRE vs DevOps

I’m really enjoying the Seeking SRE book. Chapter 12 covers SRE vs DevOps: a community-sourced compare-and-contrast discussion.

My favorite description is from Thomas Limoncelli, who suggested that:

DevOps engineers focus on the SDLC pipeline with occasional responsibilities for production operations. SREs focus on production operations with occasional responsibilities for the SDLC pipeline.

(more…)


Blog post summary: Blameless PostMortems post by John Allspaw

The following is a slightly summarized version of this blog post from John Allspaw that I really like: Blameless PostMortems and a Just Culture 

(more…)


Book chapter summary: Postmortem Culture, from the SRE Book

I’m really enjoying reading the excellent “SRE Book”. Chapter 15, “Postmortem Culture: Learning from Failure”, in particular really struck a chord with me. The following is a slightly summarized version of it.

TLDR: Failures are inevitable, especially in distributed systems. To learn from them, document them in postmortems, avoid blame, and share what you learn across your org.

(more…)


How long to transfer a file of size X over a Y Mbps line?

How long does it take to transfer a file of size X over a Y Mbps line?

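A 1 MB file over a 1 Mbps line takes 8 seconds, not 1 second, because file sizes are measured in megabytes while line speeds are measured in megabits, and there are 8 bits in a byte.

1 GB over 1 Mbps = 8,192 secs (8 * 1,024; about 2.28 hours)

1 TB over 1 Mbps = 8,388,608 secs (about 2,330 hours = 97 days)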

So a good rule of thumb to remember is:

1 TB over 1 Mbps takes ~100 days (roughly 8 * 1,000,000 secs)
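The arithmetic above can be sketched as a small helper. It follows the same convention as the figures in this post (1 GB = 1,024 MB, 1 TB = 1,024 GB) and ignores protocol overhead; the function name is just for illustration.

```python
MB_PER_GB = 1024
MB_PER_TB = 1024 * 1024


def transfer_seconds(size_megabytes: float, line_mbps: float) -> float:
    """Seconds to send size_megabytes over a line_mbps link (8 bits per byte)."""
    return size_megabytes * 8 / line_mbps


one_gb = transfer_seconds(MB_PER_GB, 1)
one_tb = transfer_seconds(MB_PER_TB, 1)

print(f"1 MB over 1 Mbps: {transfer_seconds(1, 1):.0f} secs")
print(f"1 GB over 1 Mbps: {one_gb:.0f} secs ({one_gb / 3600:.2f} hours)")
print(f"1 TB over 1 Mbps: {one_tb:.0f} secs ({one_tb / 86400:.0f} days)")
```

For a faster line, divide accordingly: the same 1 TB over a 100 Mbps line is `transfer_seconds(MB_PER_TB, 100)`, a little under a day.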

There are also good online calculators for this. For example:

Convert Megabits Per Second to Terabytes Per Month

 

 


Git revert a merged branch

This article discusses how to revert changes that have already been pushed to your remote git branch, particularly changes that came from a branch merge.

(more…)


Don’t use “kill -9”

In the past, any time I wanted to stop an errant process on Unix, I just used “kill -9”. By default. Without thinking about it much.

Then a colleague commented to me that you should never use kill -9. It terminates the process with no chance to shut down in an orderly manner, and so can leave things in a bad state, such as corrupting files. “But what else am I supposed to do!?” I naively asked.

There are of course many other options for the kill command (see links below), but here are some alternatives you can try, in the order you may want to try them.
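As one illustration of the general idea (not necessarily the exact sequence this post goes on to list), here is a minimal Python sketch that asks a child process to stop with SIGTERM, gives it time to clean up, and only falls back to SIGKILL (the equivalent of kill -9) if it does not exit. The function name and the 5 second timeout are assumptions made for the example, and the behaviour described is for Unix.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout_s: float = 5.0) -> str:
    """Send SIGTERM first; use SIGKILL only if the process ignores it."""
    proc.terminate()  # SIGTERM on Unix: the process may clean up and exit
    try:
        proc.wait(timeout=timeout_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on Unix: cannot be caught, no cleanup possible
        proc.wait()
        return "killed"


# Demo: start a child that would otherwise sleep for a minute, then stop it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_gracefully(child))
```

The point of the ordering is that SIGTERM can be handled by the process, so it gets a chance to flush files and release resources, whereas SIGKILL cannot.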

(more…)
