Shaun Abram
Technology and Leadership Blog
awk
I think of awk as a tool for searching, manipulating and reporting on text files, but it is in fact an entire programming language. Its basic function is to search files for lines that match certain patterns and to perform specified actions on those lines.
The name awk comes simply from the initials of its designers Aho, Weinberger and Kernighan.
The basic format of an awk command is:
awk 'pattern { action }' file
Every line in ‘file’ matching the ‘pattern’ will have the ‘action’ performed. Either the pattern or the action may be omitted, but not both.
No pattern means every line is actioned.
No action defaults to print.
For example, given this file called example.txt:
the quick brown
brown fox jumped
jumped over these lazy
lazy dog
We could run:
awk '/brown/ {print}' example.txt
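To see the pattern and action defaults side by side, here is a minimal sketch that recreates example.txt and runs the same search with and without an explicit action. The shell variable names are only for illustration.

```shell
# Recreate the sample file from above
printf '%s\n' 'the quick brown' 'brown fox jumped' 'jumped over these lazy' 'lazy dog' > example.txt

# Pattern and action: print lines containing "brown"
with_action=$(awk '/brown/ {print}' example.txt)

# Pattern only: the action defaults to print, so the result is the same
pattern_only=$(awk '/brown/' example.txt)

# Action only: with no pattern, every line is printed
action_only=$(awk '{print}' example.txt)

echo "$with_action"
```

The first two commands print the two lines containing brown; the third prints all four lines.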
Pattern
Patterns control the execution of actions; a pattern together with its action is known as a rule. An action is only executed when its pattern matches the input.
Patterns can be simple strings (as in the above example), or regular expressions (regex). For example:
awk '/fox$/' example.txt # prints only lines ending in fox (there are none in example.txt)
Boolean expressions
awk '/fox|dog/' example.txt # prints all lines containing fox OR dog
awk '/fox/ || /dog/' example.txt # ditto
awk '/brown/ && /fox/' example.txt # prints only lines containing brown AND fox
awk '! /dog/' example.txt # prints only lines that do NOT contain dog
Default: If the pattern is omitted, every line is actioned. For example, this simply prints every line:
awk '{print}' example.txt
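As a quick check of the pattern forms above, this sketch runs each one against example.txt and keeps the results in shell variables (the variable names are my own). The anchored pattern uses dog$ rather than fox$ because no line in the sample file ends in fox.

```shell
# Recreate the sample file from above
printf '%s\n' 'the quick brown' 'brown fox jumped' 'jumped over these lazy' 'lazy dog' > example.txt

either=$(awk '/fox|dog/' example.txt)       # lines containing fox OR dog
both=$(awk '/brown/ && /fox/' example.txt)  # lines containing brown AND fox
not_dog=$(awk '! /dog/' example.txt)        # lines that do NOT contain dog
ends_dog=$(awk '/dog$/' example.txt)        # lines ending in dog

printf 'OR:\n%s\n\nAND:\n%s\n\nNOT:\n%s\n\nanchored:\n%s\n' \
    "$either" "$both" "$not_dog" "$ends_dog"
```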
Actions
An action is simply one or more awk statements, and must be wrapped in curly brackets: {}
Those statements are where the full power of awk takes effect. They can include simple statements such as print and delete, control statements such as if, for, while and do, as well as variables and function calls.
Default: print is the default action if none is specified
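The sketch below shows a slightly larger action that combines variables, an if statement and a for loop. It relies on NF, awk's built-in count of fields on the current line (fields are covered in the next section), and on an END block, which runs once after the last line. The variable names long_lines and words are my own.

```shell
# Recreate the sample file from above
printf '%s\n' 'the quick brown' 'brown fox jumped' 'jumped over these lazy' 'lazy dog' > example.txt

summary=$(awk '
{
    # Count lines that have more than two fields
    if (NF > 2) {
        long_lines++
    }
    # Walk over the fields of this line, counting each one
    for (i = 1; i <= NF; i++) {
        words++
    }
}
END {
    print long_lines " lines with more than two words, " words " words in total"
}' example.txt)

echo "$summary"
```

For the sample file this reports 3 lines with more than two words and 12 words in total.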
Field references
When awk reads a line (sometimes called a record), it splits it into fields. By default, fields are separated by whitespace (one or more spaces or tabs). You can refer to each field using $1 for the first, $2 for the second, and so on; $0 refers to the whole line.
You can use these field references in both the pattern and action sections. For example, this command would print the first ‘field’ of any line containing the string “the”:
awk '$0 ~ /the/ { print $1 }' example.txt
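Here is a short sketch of field references in both the pattern and the action, using the same example.txt. Note that /the/ is a substring match, so it also matches "these"; comparing $1 against a string is one way to be stricter. The -F option, which sets the field separator, is a standard awk option not covered above, and the variable names are illustrative.

```shell
# Recreate the sample file from above
printf '%s\n' 'the quick brown' 'brown fox jumped' 'jumped over these lazy' 'lazy dog' > example.txt

# Regex match on the whole line: "these" also contains "the"
loose=$(awk '$0 ~ /the/ { print $1 }' example.txt)

# Exact comparison on the first field only, printing the second field
strict=$(awk '$1 == "the" { print $2 }' example.txt)

# -F changes the field separator, here to a comma
second=$(printf 'a,b,c\n' | awk -F, '{ print $2 }')

printf '%s\n' "$loose" "$strict" "$second"
```

The loose match prints "the" and "jumped", the strict match prints only "quick", and the comma-separated input yields "b".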
Examples
1) Surround each line with a single quote and a trailing comma:
Using single quotes in awk can be tricky, so we can use \x27, which is the hexadecimal escape for the ASCII single quote:
awk '{print "\x27" $0 "\x27,"}' example.txt
The same effect can also be achieved by using a variable to define a single quote:
awk -v q="'" '{print q $0 q","}' example.txt
In both cases, the result should be:
'the quick brown',
'brown fox jumped',
'jumped over these lazy',
'lazy dog',
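Another option, sketched here as an addition to the two approaches above, is the octal escape \047, which is also the ASCII single quote. The comparison at the end simply confirms that it produces the same output as the -v variant.

```shell
# Recreate the sample file from above
printf '%s\n' 'the quick brown' 'brown fox jumped' 'jumped over these lazy' 'lazy dog' > example.txt

# Single quote passed in as an awk variable
with_var=$(awk -v q="'" '{print q $0 q","}' example.txt)

# Single quote written as an octal escape inside the awk string
with_octal=$(awk '{print "\047" $0 "\047,"}' example.txt)

[ "$with_var" = "$with_octal" ] && echo "same output"
echo "$with_octal"
```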
2) Replace newlines with spaces
This one is useful when you copy from a PDF and it puts every word on a new line!
awk '{printf "%s ",$0} END {print ""}'
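Since the command above has no file argument, it reads from standard input. The sketch below feeds it some made-up input with printf and also shows one possible variant that avoids the trailing space by printing the separator before every line except the first. That variant uses NR, awk's built-in line counter, which is not covered above.

```shell
# The command from above: every line is followed by a space
joined=$(printf 'copied\nfrom\na\npdf\n' | awk '{printf "%s ", $0} END {print ""}')

# Variant: put the space before each line except the first (NR is the line number)
trimmed=$(printf 'copied\nfrom\na\npdf\n' |
    awk '{printf "%s%s", (NR > 1 ? " " : ""), $0} END {print ""}')

# Brackets make the trailing space visible
printf '[%s]\n[%s]\n' "$joined" "$trimmed"
```

The first result ends with a space after "pdf"; the second does not.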