Gawk for dummies - Part III
Series table of contents:
- Gawk for dummies - Part I
- Gawk for dummies - Part II
- Gawk for dummies - Part III
Deep in the bowels of most UNIX-based systems lies “gawk”, a little-known command line application that can make your dealings with the ever-pervasive text files much easier. We saw in Part I that Gawk looks at each file as if it were a flat database, divided into several records, each subdivided into fields, and we covered the basic structure of a gawk script. In Part II we looked at how records and fields are defined and how for loops and if statements work. In this third and final post of the series I explain how gawk can be used to interact with the underlying operating system to simplify many tasks.
String manipulation and command execution
In Gawk you can easily direct formatted output to a string instead of the standard output by replacing “printf” with “sprintf” and assigning the returned value to a variable, which can be used later. For example:
    echo a b 5 | awk '{line=sprintf("%s-%02u",$1,$3); printf("\"%s\"\n",line);}'
would simply print "a-05", double quotes included. One obvious and useful application of this is to craft different shell commands based on the input the script receives. These commands can then be printed out to a file or executed directly using the “system(command)” function that gawk provides. The next example illustrates this by moving each file to a different directory based on its file extension. If the file extension is unknown, it does nothing.
    #!/sw/bin/awk -f
    BEGIN{
        #split the record on dots and directory slashes
        FS="[./]+";

        #List of known file types.
        dir["mp3"]="Music";
        dir["pdf"]="papers";
        dir["tex"]="docs";
    }
    {
        #ignore filetypes we don't know about
        if(dir[$NF]!="")
        {
            # write the move command
            command=sprintf("mv \"%s\" \"%s/\"",$0,dir[$NF]);
            # print it out
            print command;
            # execute it
            system(command);
        }
    }
If this script were saved as “sort.awk” we could do:
    ls | ./sort.awk
to sort all the known files into separate directories. It’s easy to see that this simple example can be extended to solve much more useful and complex problems.
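Before letting a script like this loose on real files, it can help to look at the commands it would run. Here is a minimal dry-run sketch of the same idea: it only prints the mv commands and never calls system(). The function name plan_moves and the sample file names are made up for illustration, and the test ($NF in dir) is swapped in for the comparison against an empty string so that looking up an unknown extension does not create an empty array entry.

```shell
# Hypothetical dry run: print the mv commands instead of executing them.
plan_moves() {
    awk '
    BEGIN {
        FS = "[./]+"            # same field separator as the script above
        dir["mp3"] = "Music"
        dir["pdf"] = "papers"
        dir["tex"] = "docs"
    }
    # "in" tests for a key without creating an empty array entry
    ($NF in dir) {
        printf("mv \"%s\" \"%s/\"\n", $0, dir[$NF])
    }'
}

# Made-up file names, fed in place of the output of ls
printf '%s\n' song.mp3 thesis.pdf notes.txt | plan_moves
```

Once the printed commands look right, the printf can be turned back into the sprintf plus system() pair from the original script.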
Gawk will also let you split any string into pieces with the split function, much as each record is split into fields. The function takes three arguments: the string to split, the name of the array that receives the results and, optionally, the separator to use (it defaults to FS). It returns the number of pieces the string was split into (the equivalent of NF). For example, using
    echo "a-05" | awk '{nf=split($0,fields,"-"); print fields[1],fields[2];}'
we could split the result of our earlier example into the parts that were used to produce it.
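The value returned by split is handy when you do not know in advance how many pieces to expect. The sketch below loops from 1 to that count, much as you would loop from 1 to NF over the fields of a record. The helper name count_parts and the sample strings are assumptions of mine, and the separator is passed in with awk's -v option as a single character.

```shell
# Hypothetical helper: split a string and list its pieces with their index.
count_parts() {
    # $1: string to split, $2: separator (a single character here)
    printf '%s\n' "$1" | awk -v sep="$2" '{
        n = split($0, parts, sep)     # n plays the role NF plays for a record
        for (i = 1; i <= n; i++)
            printf("%d/%d: %s\n", i, n, parts[i])
    }'
}

count_parts "a-05" "-"
count_parts "2024:01:15" ":"
```

Looping with a counter rather than with for(i in parts) keeps the pieces in their original order, which the "in" form does not guarantee.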
Writing to pipes
Sometimes it is more useful to pipe our output through an external filter instead of simply issuing commands to the underlying shell with system(). Gawk allows for this with pipes, whose syntax is similar to the bash pipes you are probably accustomed to: just add a “|” followed by a string containing the command after any output statement (print or printf, for instance) and gawk will send the output through that command and show the final result. As a simple example:
    #!/sw/bin/awk -f
    {
        #calculate and store the sine of the input
        value[$1]=sin($1);
    }
    END{
        # instruct sort to use numeric order on the second column.
        command = "sort -n -k 2";

        # print and sort
        for(i in value)
            print i,value[i] | command;
    }
would output x, sin(x) pairs in order of increasing sin(x). This feature, in conjunction with what I described in the previous section, allows you to use gawk to glue together different command line tools and achieve very complex results with a relatively simple syntax.
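The same pattern works with any per-line value. As a small sketch under my own assumptions (the function name sorted_lengths and the sample words are invented), the following stores the length of each input line and pipes the pairs through sort. It also calls close(command), which makes awk wait for sort to finish; the script above gets away without it because awk closes its pipes when it exits.

```shell
# Hypothetical example: list input lines by increasing length.
sorted_lengths() {
    awk '
    { len[$0] = length($0) }          # remember the length of every input line
    END {
        command = "sort -k 2 -n"      # numeric sort on the second column
        for (line in len)
            print line, len[line] | command
        close(command)                # wait for sort to finish and flush its output
    }'
}

printf '%s\n' pear fig banana | sorted_lengths
```

Note that the command string used with close() must match the one used after the “|” exactly, which is one reason to keep it in a variable.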
The mini tutorial that this post concludes barely scratches the surface of the power that Gawk places at your fingertips. It simply explains the features I use every day to simplify my life. For complete information on all of gawk’s possibilities, I refer you to the official gawk user’s guide.
If anybody has any questions, comments or examples that they would like to share, please leave a message in the comments section. I would love to hear it.

