Java and Technology weblog
sed (stream editor) is a simple but incredibly versatile command-line tool that parses and transforms text. It is line-oriented: it reads the text line by line, transforms each line, and outputs the result.
For example, this sed command would replace all occurrences of the text “white” with “black”:
sed 's/white/black/g'
In this case, the s prefix means substitute, and the g suffix means global: replace every match on the line rather than just the first. Other example usages include:
|sed 's/.*TRIMTOHERE//'||Trim all text up to and including TRIMTOHERE|
|sed 's/\(TRIMFROMHERE\).*/\1/'||Trim all text after TRIMFROMHERE|
|sed '/^[ \t]*$/d'||Remove all blank lines (see notes on \t below)|
|sed -e 's/^/PREFIX/' -e 's/$/POSTFIX/' or sed 's/^/PREFIX/; s/$/POSTFIX/'||Add a prefix and postfix to each line|
|sed 's/abc/ABC/g'||Replace all occurrences of abc with ABC|
|sed 's/[ \t]*$/POSTFIX/'||Remove trailing whitespace (see notes on \t below)|
Note that I leave the POSTFIX part in the last example just to visually confirm that all the whitespace really is gone. Remove it once you are confident, leaving:
sed 's/[ \t]*$//' file.txt
Note that the \t is just a placeholder for a tab character. GNU sed understands \t in a regular expression, but other versions of sed may not; there I had to type a literal tab with Ctrl-V then Tab.
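A few of the one-liners above can be tried on a small made-up sample. This is only a sketch: the sample text, the MARK word and the variable names are assumptions for illustration, and [[:space:]] stands in for the space-or-tab bracket so the snippet can be pasted without typing a literal tab.

```shell
# Made-up sample input: trailing spaces on two lines and one blank line.
sample() { printf 'one MARK two  \n\nthree \n'; }

# Remove everything up to and including "MARK ", strip trailing
# whitespace, then delete the lines that are now empty.
trimmed=$(sample | sed -e 's/.*MARK //' -e 's/[[:space:]]*$//' -e '/^$/d')

# Strip trailing whitespace, then add a visible prefix and postfix.
wrapped=$(sample | sed -e 's/[[:space:]]*$//' -e 's/^/[/' -e 's/$/]/')

printf '%s\n' "$trimmed" "$wrapped"
```

Wrapping each line in brackets plays the same role as the POSTFIX trick: it makes it obvious whether any trailing whitespace survived.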
The basic syntax is:
sed [OPTIONS…] [SCRIPT] [INPUTFILE…]
You can think of a sed script as the body of a loop. We loop through each line in the input text, executing the sed script each time.
sed scripts can be very terse because both the loop itself and the loop variable (the index, or current line number) are implicit. For example, this sed script would loop through the first 3 lines of the input, print them, and quit:
sed 3q file.txt
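To see the implicit loop at work without needing a file, the same script can be fed five numbered lines on standard input. The input and variable names here are just stand-ins.

```shell
# Five numbered lines as stand-in input; sed quits after printing line 3.
first_three=$(printf '%s\n' 1 2 3 4 5 | sed 3q)

# The same lines selected by deleting everything from line 4 onwards.
same=$(printf '%s\n' 1 2 3 4 5 | sed '4,$d')

printf '%s\n' "$first_three"
```

The q version stops reading as soon as it has what it needs, whereas the d version still reads the input to the end, which can matter on large files.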
Note that the options and the input file are optional, but sed does need a script, given either as the first non-option argument or via -e or -f. Run with no arguments at all, sed just prints a usage message. The simplest valid script is the empty one: sed '' reads from the standard input and simply prints each line unchanged. Not very useful, but valid.
OPTIONS are, obviously, optional but some useful options include:
-e script or --expression=script
-e means execute the script given on the command line. It is optional when there is only one script, because the first non-option argument is treated as the script by default. For example, these 2 commands are equivalent and both replace the text brown with white:
sed s/brown/white/g example.txt
sed -e s/brown/white/g example.txt
So why would you use -e if it is the default? The answer is that you need it if you want to execute multiple command line scripts. For example, if you wanted to delete the first, last and all blank lines:
sed -e '1d' -e '$d' -e '/^$/d' file.txt
The same effect can be achieved using a semicolon separator:
sed '1d;$d;/^$/d' file.txt
There are some circumstances when you may be forced to use the -e approach instead of ; (for example after commands such as a, i, c, r or w, which treat the rest of the line as text or as a file name).
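A quick way to convince yourself that the two forms behave the same is to run both over the same input and compare. The input below is invented for the purpose.

```shell
# Made-up input: a header, a blank line, two data lines and a footer.
input() { printf 'header\n\nalpha\nbeta\nfooter\n'; }

# Delete the first line, the last line and any blank lines, two ways.
with_e=$(input | sed -e '1d' -e '$d' -e '/^$/d')
with_semicolons=$(input | sed '1d;$d;/^$/d')

printf '%s\n' "$with_e"
```

Single quotes matter here: inside double quotes the shell could try to expand the $ before sed ever sees it.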
-f script-file or --file=script-file
Read the sed script from a file instead of the command line. This is basically the alternative to -e.
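As a rough sketch of how -f might be used, the snippet below writes a two-command script to a temporary file and runs it. The file location and the sample text are arbitrary choices, not anything sed requires.

```shell
# Write a small script file to a temporary location (the path is arbitrary).
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
# Comment lines are allowed in a script file.
1d
s/brown/white/g
EOF

# Line 1 is deleted; "brown" is replaced on the remaining line.
from_file=$(printf 'title\nbrown fox, brown dog\n' | sed -f "$script_file")

rm -f "$script_file"
printf '%s\n' "$from_file"
```

Keeping one command per line in a file avoids shell quoting altogether, which helps once a script grows beyond a handful of commands.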
-i or --in-place
Modifies the file in place, rather than writing the results to standard output. It can be given a suffix, which is used to create a backup copy of the original. GNU sed treats the suffix as optional; some other versions, such as BSD sed on macOS, expect an argument after -i. For example:
sed -i.bak '4d' file.txt
This would leave you with file.txt with its 4th line deleted, and file.txt.bak, an unmodified copy of the original.
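The in-place behaviour is easy to check safely in a throwaway directory. The directory and file contents below are made up for the test; nothing outside the temporary directory is touched.

```shell
# Work in a throwaway directory so no real file is modified.
workdir=$(mktemp -d)
printf '%s\n' one two three four five > "$workdir/file.txt"

# Delete line 4 in place, keeping the original as file.txt.bak.
sed -i.bak '4d' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")

rm -rf "$workdir"
printf '%s\n' "$edited"
```

Writing the suffix directly against -i, with no space, is the form I would expect to be most portable between sed versions.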
A sed script is usually made up of a combination of line selectors and commands.
Line selectors can be expressed in various formats, including a line number, a regular expression between slashes that the line must match (e.g. /white/), $ (the last line), or a range made of two selectors separated by a comma (e.g. 5,10).
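Each kind of line selector can be tried out by pairing it with d (delete) and looking at which lines survive. The list of colours is an invented sample.

```shell
# Made-up five-line input.
colours() { printf '%s\n' red white green white blue; }

by_number=$(colours | sed '2d')        # delete line 2 only
by_regex=$(colours | sed '/white/d')   # delete lines matching a regex
last_line=$(colours | sed '$d')        # delete the last line
by_range=$(colours | sed '2,3d')       # delete lines 2 through 3

printf '%s\n' "$by_regex"
```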
s (substitute)
The substitute command is probably the one you will use most. Its syntax is:
s/regexp/replacement/flags
d (delete)
Delete the pattern space and start the next cycle (or loop iteration).
q (quit)
Exit sed without processing any more commands or input.
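Substitute, delete and quit can be compared side by side on a couple of tiny inputs. The sample strings are assumptions chosen to make the difference between the commands visible.

```shell
line='brown fox, brown dog'

# Without a flag, s replaces only the first match on each line.
first_only=$(printf '%s\n' "$line" | sed 's/brown/white/')
# With the g flag, every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/brown/white/g')

# d drops line 2 and carries on; q prints line 2 and then stops.
without_second=$(printf '%s\n' a b c | sed '2d')
up_to_second=$(printf '%s\n' a b c | sed '2q')

printf '%s\n' "$first_only" "$every_match"
```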
p (print) and -n (no print)
p = print out the pattern space (to the standard output). Printing the pattern space at the end of each cycle is also what sed does by default.
-n = no print: a command-line option that suppresses the default output.
Note that, somewhat confusingly, -n and p are almost always used together: -n to stop everything being printed, and p to print only the lines you are interested in. For example:
sed -n '5,10p' # print only lines 5 through 10
sed -n '5p;10p' # print only lines 5 and 10
sed -n '$p' # print the last line of the file (although tail -n 1 is probably a better approach!)
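The reason the two are paired becomes clearer when p is used without the no-print option: every line then appears twice, once from p and once from the default output. The numbered input below is just a stand-in.

```shell
# Twelve numbered lines as stand-in input.
numbers() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12; }

range=$(numbers | sed -n '5,10p')       # lines 5 through 10
two_lines=$(numbers | sed -n '5p;10p')  # lines 5 and 10 only

# Without -n, p prints each line in addition to the default output.
doubled=$(printf '%s\n' a b | sed 'p')

printf '%s\n' "$doubled"
```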
Pattern and hold space
Finally, you may hear the term pattern space (or pattern buffer) when reading about sed. Pattern space sounds complicated but is simply where sed reads a line into before running commands against it; the contents of the pattern space are then usually printed.
There is also a hold space (or hold buffer), which refers to another chunk of memory used to hold data over multiple iterations. I haven't used this much yet.
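For a feel of what the hold space makes possible, here is a sketch of a well-known one-liner that prints its input in reverse line order. It relies on two commands not covered above: G appends the hold space to the pattern space, and h copies the pattern space into the hold space. The three-line input is made up.

```shell
# On every line except the first, append what has been held so far (G),
# then save the growing result back to the hold space (h).
# On the last line, print the accumulated, reversed text ($p).
reversed=$(printf '%s\n' first second third | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Because the whole input ends up in the hold space, this approach is only sensible for inputs that comfortably fit in memory.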
INPUTFILE is optional; if omitted, sed will read from the standard input. You can specify more than one input file.
Most of the examples here omit the input file for brevity.
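When several input files are given, sed by default treats them as one continuous stream, so a selector such as $ refers to the last line of the last file. The sketch below checks this with two throwaway files whose names and contents are invented.

```shell
# Two small files in a throwaway directory.
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.txt"
printf 'b1\nb2\n' > "$workdir/b.txt"

# Reading from standard input: the last line of a.txt.
from_stdin=$(sed -n '$p' < "$workdir/a.txt")
# Two files on the command line: the last line of the combined stream.
from_files=$(sed -n '$p' "$workdir/a.txt" "$workdir/b.txt")

rm -rf "$workdir"
printf '%s\n' "$from_stdin" "$from_files"
```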
Note that for simple transformations, the unix tool tr can be useful too.
For example, to replace all commas with newlines: tr , '\n' < file
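As a small comparison, tr works character by character, so a single-character swap can be written with either tool. The comma-separated input is a made-up example.

```shell
# Turn each comma into a newline with tr.
as_lines=$(printf 'a,b,c\n' | tr , '\n')

# A one-character swap done with tr and with sed gives the same result.
tr_semis=$(printf 'a,b,c\n' | tr , ';')
sed_semis=$(printf 'a,b,c\n' | sed 's/,/;/g')

printf '%s\n' "$as_lines"
```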
- GNU sed user’s manual
- Useful one-line scripts for sed
- World’s best introduction to sed (allegedly)
- Chart of similar operations with sed and awk