
awk

I think of awk as a tool for searching, manipulating and reporting on text files, but it is in fact an entire programming language. Its basic function is to search files for lines that contain certain patterns, and to perform specified actions on those lines.

The name awk comes simply from the initials of its designers Aho, Weinberger and Kernighan.

The basic format of an awk command is:

awk 'pattern { action }' file

The program is normally wrapped in single quotes so that the shell passes it to awk unchanged.

Every line in ‘file’ matching the ‘pattern’ has the ‘action’ performed on it. Either the pattern or the action may be omitted, but not both:
No pattern means every line is actioned.
No action defaults to print.

For example, given this file called example.txt:

the quick brown
brown fox jumped
jumped over these lazy
lazy dog

We could print every line containing brown with:

awk '/brown/ {print}' example.txt
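To try this yourself, the sketch below recreates example.txt with printf (one argument per line) and runs the same command, with the expected output shown as comments. It assumes a POSIX-style shell such as sh or bash.

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# Print every line that contains "brown"
awk '/brown/ {print}' example.txt
# the quick brown
# brown fox jumped
```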

Pattern

Patterns control when actions run: a pattern together with its action is known as a rule, and the action is only executed for input lines that the pattern matches.

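Patterns are usually regular expressions (regex) written between slashes. A plain string such as /brown/ in the above example is simply the simplest kind of regex, but the full regex syntax is available. For example: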

awk '/fox$/' example.txt   # prints only lines ending in fox (none do in example.txt, so this prints nothing)
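The $ in that pattern anchors the match to the end of the line; its counterpart ^ anchors it to the start. A short sketch contrasting the two on example.txt:

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# ^ anchors the match to the start of the line
awk '/^brown/' example.txt   # brown fox jumped

# $ anchors the match to the end of the line
awk '/dog$/' example.txt     # lazy dog
```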

Boolean expressions

awk '/fox|dog/' example.txt   # prints all lines containing fox OR dog

awk '/fox/ || /dog/' example.txt   # ditto

awk '/brown/ && /fox/' example.txt   # prints only lines containing brown AND fox

awk '!/dog/' example.txt   # prints only lines that do NOT contain dog
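The sketch below runs these boolean patterns against example.txt with the expected output as comments, and adds one combination of && and ! that is not in the list above. Single quotes are used throughout because, in an interactive bash shell, a ! inside double quotes can trigger history expansion.

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# Regex alternation and || select the same lines:
awk '/fox|dog/' example.txt        # brown fox jumped, lazy dog
awk '/fox/ || /dog/' example.txt   # brown fox jumped, lazy dog

# Both patterns must match on the same line
awk '/brown/ && /fox/' example.txt # brown fox jumped

# Negation: the three lines without "dog"
awk '!/dog/' example.txt

# Combining operators: contains "lazy" but not "dog"
awk '/lazy/ && !/dog/' example.txt # jumped over these lazy
```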

Default: If the pattern is omitted, every line is actioned. For example, this simply prints every line:

awk '{print}' example.txt

Actions

An action is one or more awk statements, and must be wrapped in curly braces: {}

These statements are where the full power of awk comes into play. They can include statements such as print and delete; control statements such as if, for, while and do; and you can also use variables and function calls.
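As a sketch of what an action can contain, the first program below counts lines using an if/else statement and two variables (longer and shorter are illustrative names), which awk creates on first use with an empty value that acts as 0 in arithmetic. It also relies on two things not covered above: the built-in variable NF (the number of fields on the current line) and the special END pattern, whose action runs once after the last line has been read. The second program uses a for loop to print each line's words in reverse order.

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# Count lines with more than two fields versus the rest
awk '{ if (NF > 2) longer++; else shorter++ }
     END { print longer, shorter }' example.txt   # 3 1

# Print the fields of each line from last to first
awk '{
    for (i = NF; i >= 1; i--) {
        sep = (i > 1) ? " " : "\n"   # space between words, newline at the end
        printf "%s%s", $i, sep
    }
}' example.txt
# brown quick the
# jumped fox brown
# lazy these over jumped
# dog lazy
```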

Default: if no action is specified, awk prints the matching line (equivalent to { print $0 }).
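To check both defaults on example.txt, the sketch below compares a pattern with no action against the explicit {print} form, and an action with no pattern against the file itself. The last command, awk 1, is a common shorthand not mentioned in the post: 1 is a pattern that is always true, so with the default action every line is printed.

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# Pattern only: the action defaults to print
awk '/brown/' example.txt          # same output as awk '/brown/ {print}'

# Action only: the pattern defaults to "every line"
awk '{print}' example.txt          # same output as cat example.txt

# Always-true pattern, default action: also prints every line
awk 1 example.txt
```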

Field references

When awk reads a line (sometimes called a record), it splits it into fields. By default, fields are separated by whitespace, and a run of several spaces or tabs counts as a single separator. You can refer to each field using $1 for the first, $2 for the second, and so on; $0 refers to the whole line.
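The sketch below prints the first and last field of each line, using the built-in variable NF (the number of fields on the line, not covered above) so that $NF is the last field. It also demonstrates default splitting on a line with extra blanks, and the -F option, which sets a different field separator (a comma here).

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# First and last field of each line
awk '{ print $1, $NF }' example.txt
# the brown
# brown jumped
# jumped lazy
# lazy dog

# Leading blanks are skipped and runs of spaces/tabs act as one separator
printf '  one   two\tthree\n' | awk '{ print $2 }'   # two

# -F sets a different field separator
printf 'a,b,c\n' | awk -F, '{ print $3 }'           # c
```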

You can use these field references in both the pattern and action sections. For example, this command would print the first ‘field’ of any line containing the string “the”:

awk '$0 ~ /the/ { print $1 }' example.txt
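With example.txt, that command prints the and jumped; the second comes from the line jumped over these lazy, because /the/ also matches inside these. The pattern $0 ~ /the/ can also be shortened to just /the/. The sketch below contrasts matching the whole line with matching a single field, and adds an exact string comparison with == (an illustration beyond the original examples).

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# Whole line contains "the" (this also matches "these")
awk '$0 ~ /the/ { print $1 }' example.txt      # the, jumped

# Only the first field is tested, and must be exactly "the"
awk '$1 ~ /^the$/ { print $1 }' example.txt    # the

# Exact string comparison on a field
awk '$1 == "brown" { print $2 }' example.txt   # fox
```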

Examples

1) Surround each line with single quotes and add a trailing comma:

Single quotes are tricky in awk because the whole program is usually wrapped in shell single quotes, so instead we can use \x27, the hexadecimal escape for the ASCII single quote:

awk '{print "\x27" $0 "\x27,"}' example.txt

The same effect can also be achieved by passing a single quote in as a variable with -v:

awk -v q="'" '{print q $0 q ","}' example.txt

In both cases, the result should be:

'the quick brown',
'brown fox jumped',
'jumped over these lazy',
'lazy dog',
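The \x27 hexadecimal escape is accepted by gawk and most modern awks, but POSIX only guarantees octal escapes in awk strings. A slightly more portable sketch of the same command uses \047, the octal code for a single quote (ASCII 39):

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# \047 is the octal escape for a single quote
awk '{ print "\047" $0 "\047," }' example.txt
# prints the same four lines, starting with: 'the quick brown',
```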

2) Replace newlines with spaces

This one is useful when you copy text from a PDF and every word ends up on its own line!

awk '{printf "%s ", $0} END {print ""}' example.txt
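Run on example.txt, the command above joins everything onto one line but leaves a trailing space before the final newline. A small variation (a sketch, not from the original post) prints the separator before every line except the first, so no trailing space is left:

```shell
# Recreate the post's example.txt
printf '%s\n' 'the quick brown' 'brown fox jumped' \
    'jumped over these lazy' 'lazy dog' > example.txt

# sep is empty on the first line and a single space afterwards;
# END prints the one newline at the very end
awk '{ printf "%s%s", sep, $0; sep = " " } END { print "" }' example.txt
# the quick brown brown fox jumped jumped over these lazy lazy dog
```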
