Gawk for dummies - Part III

Filed under Gawk, Programming.

Series table of contents:

  1. Gawk for dummies - Part I
  2. Gawk for dummies - Part II
  3. Gawk for dummies - Part III

Deep in the bowels of most UNIX-based systems lies “gawk”, a little-known command line application that can make your dealings with the ever-pervasive text files much easier. We saw in Part I that Gawk looks at each file as if it were a flat database, divided into several records, each subdivided into fields, and we covered the basic structure of a gawk script. In Part II we looked at how records and fields are defined and how for loops and if statements work. In this third and final post of the series I explain how gawk can interact with the underlying operating system to simplify many tasks.


String manipulation and command execution

In Gawk you can easily direct formatted output to a string instead of standard output by replacing “printf” with “sprintf” and assigning the returned value to a variable, which can be used later. For example:

echo a b 5 | awk '{line=sprintf("%s-%02u",$1,$3); printf("\"%s\"\n",line);}'

would simply print “a-05”, quotation marks included. One obvious and useful application of this is to craft different shell commands based on the input the script receives. These commands can then be printed out to a file or executed directly using the “system(command)” function that gawk provides. The next example illustrates this by moving each file to a different directory based on its file extension. Files with an unknown extension are left untouched.

#!/sw/bin/awk -f
 
BEGIN{
#split the record on dots and directory slashes
  FS="[./]+";
 
#List of known file types.
  dir["mp3"]="Music";
  dir["pdf"]="papers";
  dir["tex"]="docs"; 
}
{
  #ignore filetypes we don't know about ("in" tests for a key
  #without creating an empty entry in the array)
  if($NF in dir)
    {  
      # write the move command
      command=sprintf("mv \"%s\" \"%s/\"",$0,dir[$NF]);
 
      # print it out
      print command;
 
      # execute it
      system(command);
    }
}

If this script were saved as “sort.awk” and made executable, we could run:

ls | ./sort.awk

to sort all the known files into separate directories. The target directories must already exist, since the script does not create them. It’s easy to see that this simple example can be extended to solve much more useful and complex problems.
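Before letting a script like this loose on real files, it can help to print the commands without executing them. The following is a minimal sketch of that idea, assuming the same extension table as above; the function name plan_moves and the sample file names are made up for illustration.

```shell
# Dry run: build the same mv commands as sort.awk but only print them.
# plan_moves is a hypothetical helper name, not part of gawk.
plan_moves() {
    awk 'BEGIN {
             FS = "[./]+";              # split on dots and slashes
             dir["mp3"] = "Music";
             dir["pdf"] = "papers";
             dir["tex"] = "docs";
         }
         ($NF in dir) {
             # same command string as before, printed instead of executed
             printf("mv \"%s\" \"%s/\"\n", $0, dir[$NF]);
         }'
}

# Made-up sample input; notes.txt has no known extension and is skipped.
printf '%s\n' song.mp3 thesis.pdf notes.txt | plan_moves
```

Once the printed commands look right, the output could be saved to a file and reviewed, or the printf could be swapped back for system().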

Gawk will also let you split any string into pieces with the split function, much as each record is split into fields. This function takes three arguments: the string to split, the name of the array that receives the pieces, and, optionally, the separator to use (it defaults to FS). It returns the number of pieces the string was split into (the equivalent of NF). For example, using

echo "a-05" | awk '{nf=split($0,fields,"-"); print fields[1],fields[2];}'

we could split the result of our earlier example into the parts that were used to produce it.
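The value returned by split is handy as a loop bound when the number of pieces is not known in advance. Here is a small sketch under that assumption; the function name show_parts and the colon-separated sample string are invented for the example.

```shell
# Walk over every piece produced by split(), using its return value
# as the loop bound. show_parts is a hypothetical helper name.
show_parts() {
    awk '{
        n = split($0, parts, ":");     # n is the number of pieces
        for (i = 1; i <= n; i++)
            printf("%d of %d: %s\n", i, n, parts[i]);
    }'
}

# Made-up sample input.
echo "usr:local:bin" | show_parts
```

Note that the array filled by split is indexed from 1, just like the fields of a record.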


Writing to pipes

Sometimes it is more useful to pipe our output through an external filter than to issue commands to the underlying shell with system(). Gawk allows this through pipes, whose syntax is similar to the shell pipes you are probably accustomed to: just add a “|” followed by a string containing the command after any output statement (print or printf) and gawk will send the output through that command, which then writes the final result. As a simple example:

#!/sw/bin/awk -f
 
{
     #calculate and store the sine of the input
     value[$1]=sin($1);
}
END{
       # instruct sort to use numeric order on the second column.
       command = "sort -n -k 2";
 
       # print and sort
       for(i in value)
                print i,value[i] | command;

       # close the pipe explicitly; gawk then waits for sort to finish
       close(command);
}

would output x, sin(x) pairs in order of increasing sin(x). This feature, in conjunction with what I described in the previous section, allows you to use gawk to glue together different command line tools and achieve very complex results with a relatively simple syntax.
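The same pattern works for any data collected in an array. As a rough sketch, the following counts how often each word appears in its input and lets sort order the result; the function name word_counts and the sample input are assumptions made for the example, and the sort keys are just one reasonable choice.

```shell
# Count words in gawk, then pipe the totals through sort.
# word_counts is a hypothetical helper name.
word_counts() {
    awk '{
        # tally every field of every record
        for (i = 1; i <= NF; i++)
            count[$i]++;
    }
    END {
        # highest count first; ties broken alphabetically by word
        command = "sort -k2,2nr -k1,1";
        for (w in count)
            print w, count[w] | command;
        # close the pipe so gawk waits for sort to finish
        close(command);
    }'
}

# Made-up sample input.
echo "b a b c b a" | word_counts
```

Keeping the command in a variable, as in the example above, makes it easy to pass exactly the same string to close().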

The mini tutorial that this post concludes barely scratches the surface of the power that Gawk places at your fingertips. It simply explains the features I use every day to simplify my life. For complete information on all of gawk’s possibilities, I refer you to the official gawk user’s guide.

If anybody has any questions, comments or examples that they would like to share, please leave a message in the comments section. I would love to hear them.


© Copyright 2004 Bruno Goncalves - All rights reserved
