Gentoo's Bugzilla – Attachment 71786 Details for
Bug 110994
updated en/articles/l-awk2.xml
proposed fix
l-awk1.xml (text/plain), 11.68 KB, created by
Luca Martini
on 2005-10-31 03:56:45 UTC
><?xml version='1.0' encoding="UTF-8"?> ><!-- $Header: /var/www/www.gentoo.org/raw_cvs/gentoo/xml/htdocs/doc/en/articles/l-awk1.xml,v 1.6 2005/10/09 17:13:23 rane Exp $ --> ><!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> > ><guide link="/doc/en/articles/l-awk1.xml" disclaimer="articles"> ><title>Awk by example, Part 1</title> > ><author title="Author"> > <mail link="drobbins@gentoo.org">Daniel Robbins</mail> ></author> > ><abstract> >Awk is a very nice language with a very strange name. In this first article of a >three-part series, Daniel Robbins will quickly get your awk programming skills >up to speed. As the series progresses, more advanced topics will be covered, >culminating with an advanced real-world awk application demo. ></abstract> > ><!-- The original version of this article was published on IBM developerWorks, >and is property of Westtech Information Services. This document is an updated >version of the original article, and contains various improvements made by the >Gentoo Linux Documentation team --> > ><version>1.3</version> ><date>2005-10-09</date> > ><chapter> ><title>An intro to the great language with the strange name</title> ><section> ><title>In defense of awk</title> ><body> > ><p> >In this series of articles, I'm going to turn you into a proficient awk coder. >I'll admit, awk doesn't have a very pretty or particularly "hip" name, and the >GNU version of awk, called gawk, sounds downright weird. Those unfamiliar with >the language may hear "awk" and think of a mess of code so backwards and >antiquated that it's capable of driving even the most knowledgeable UNIX guru to >the brink of insanity (causing him to repeatedly yelp "kill -9!" as he runs for >the coffee machine). ></p> > ><p> >Sure, awk doesn't have a great name. But it is a great language. Awk is geared >toward text processing and report generation, yet offers many well-designed >features that allow for serious programming. 
And, unlike some languages, awk's >syntax is familiar, and borrows some of the best parts of languages like C, >Python, and bash (although, technically, awk was created before both Python and >bash). Awk is one of those languages that, once learned, will become a key part >of your strategic coding arsenal. ></p> > ></body> ></section> ><section> ><title>The first awk</title> ><body> > ><p> >You should see the contents of your <path>/etc/passwd</path> file appear before >your eyes. Now, for an explanation of what awk did. When we called awk, we >specified <path>/etc/passwd</path> as our input file. When we executed awk, it >evaluated the print command for each line in <path>/etc/passwd</path>, in >order. All output is sent to stdout, and we get a result identical to catting ><path>/etc/passwd</path>. ></p> > ><p> >Now, for an explanation of the { print } code block. In awk, curly braces are >used to group blocks of code together, similar to C. Inside our block of code, >we have a single print command. In awk, when a print command appears by itself, >the full contents of the current line are printed. ></p> > ><pre caption="Printing the current line"> >$ <i>awk '{ print $0 }' /etc/passwd</i> >$ <i>awk '{ print "" }' /etc/passwd</i> ></pre> > ><p> >In awk, the $0 variable represents the entire current line, so print and print >$0 do exactly the same thing. 
></p> > ><pre caption="Filling the screen with some text"> >$ <i>awk '{ print "hiya" }' /etc/passwd</i> ></pre> > ></body> ></section> ><section> ><title>Multiple fields</title> ><body> > ><pre caption="print $1"> >$ <i>awk -F":" '{ print $1 $3 }' /etc/passwd</i> >halt7 >operator11 >root0 >shutdown6 >sync5 >bin1 ><comment>....etc.</comment> ></pre> > ><pre caption="print $1 $3"> >$ <i>awk -F":" '{ print $1 " " $3 }' /etc/passwd</i> ></pre> > ><pre caption="$1$3"> >$ <i>awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd</i> >username: halt uid:7 >username: operator uid:11 >username: root uid:0 >username: shutdown uid:6 >username: sync uid:5 >username: bin uid:1 ><comment>....etc.</comment> ></pre> > ></body> ></section> ><section> ><title>External scripts</title> ><body> > ><pre caption="Sample script"> >BEGIN { FS=":" } >{ print $1 } ></pre> > ><p> >The difference between these two methods has to do with how we set the field >separator. In this script, the field separator is specified within the code >itself (by setting the FS variable), while our previous example set FS by >passing the -F":" option to awk on the command line. It's generally best to set >the field separator inside the script itself, simply because it means you have >one less command line argument to remember to type. We'll cover the FS variable >in more detail later in this article. ></p> > ></body> ></section> ><section> ><title>The BEGIN and END blocks</title> ><body> > ><p> >Normally, awk executes each block of your script's code once for each input >line. However, there are many programming situations where you may need to >execute initialization code before awk begins processing the text from the input >file. For such situations, awk allows you to define a BEGIN block. We used a >BEGIN block in the previous example. 
Because the BEGIN block is evaluated before >awk starts processing the input file, it's an excellent place to initialize the >FS (field separator) variable, print a heading, or initialize other global >variables that you'll reference later in the program. ></p> > ><p> >Awk also provides another special block, called the END block. Awk executes this >block after all lines in the input file have been processed. Typically, the END >block is used to perform final calculations or print summaries that should >appear at the end of the output stream. ></p> > ></body> ></section> ><section> ><title>Regular expressions and blocks</title> ><body> > ><pre caption="Regular expressions and blocks"> >/foo/ { print } >/[0-9]+\.[0-9]*/ { print } ></pre> > ></body> ></section> ><section> ><title>Expressions and blocks</title> ><body> > ><pre caption="fredprint"> >$1 == "fred" { print $3 } ></pre> > ><pre caption="root"> >$5 ~ /root/ { print $3 } ></pre> > ></body> ></section> ><section> ><title>Conditional statements</title> ><body> > ><pre caption="if"> >{ > if ( $5 ~ /root/ ) { > print $3 > } >} ></pre> > ><p> >Both scripts function identically. In the first example, the boolean expression >is placed outside the block, while in the second example, the block is executed >for every input line, and we selectively perform the print command by using an >if statement. Both methods are available, and you can choose the one that best >meshes with the other parts of your script. ></p> > ><pre caption="if if"> >{ > if ( $1 == "foo" ) { > if ( $2 == "foo" ) { > print "uno" > } else { > print "one" > } > } else if ($1 == "bar" ) { > print "two" > } else { > print "three" > } >} ></pre> > ><pre caption="if"> >! /matchme/ { print $1 $3 $4 } ></pre> > ><pre caption="if"> >{ > if ( $0 !~ /matchme/ ) { > print $1 $3 $4 > } >} ></pre> > ><p> >Both scripts will output only those lines that don't contain a matchme >character sequence. Again, you can choose the method that works best for your >code. 
They both do the same thing. ></p> > ><pre caption="Printing the fields equal to foo and bar"> >( $1 == "foo" ) &amp;&amp; ( $2 == "bar" ) { print } ></pre> > ><p> >This example will print only those lines where field one equals foo and field >two equals bar. ></p> > ></body> ></section> ><section> ><title>Numeric variables!</title> ><body> > ><p> >In the BEGIN block, we initialize our integer variable x to zero. Then, each >time awk encounters a blank line, awk will execute the x=x+1 statement, >incrementing x. After all the lines have been processed, the END block will >execute, and awk will print out a final summary, specifying the number of blank >lines it found. ></p> > ></body> ></section> ><section> ><title>Stringy variables</title> ><body> > ><pre caption="Sample field"> >2.01 ></pre> > ><pre caption="1.01x$( )1.01"> >{ print ($1^2)+1 } ></pre> > ><p> >If you do a little experimenting, you'll find that if a particular variable >doesn't contain a valid number, awk will treat that variable as a numerical zero >when it evaluates your mathematical expression. ></p> > ></body> ></section> ><section> ><title>Lots of operators</title> ><body> > ><p> >Another nice thing about awk is its full complement of mathematical operators. >In addition to standard addition, subtraction, multiplication, and division, awk >allows us to use the previously demonstrated exponent operator "^", the modulo >(remainder) operator "%", and a bunch of other handy assignment operators >borrowed from C. ></p> > ><p> >These include pre- and post-increment/decrement ( i++, --foo ), add/sub/mult/div >assign operators ( a+=3, b*=2, c/=2.2, d-=6.2 ). But that's not all -- we also >get handy modulo/exponent assign ops as well ( a^=2, b%=4 ). ></p> > ></body> ></section> ><section> ><title>Field separators</title> ><body> > ><p> >Awk has its own complement of special variables. Some of them allow you to >fine-tune how awk functions, while others can be read to glean valuable >information about the input. 
We've already touched on one of these special >variables, FS. As mentioned earlier, this variable allows you to set the >character sequence that awk expects to find between fields. When we were using ><path>/etc/passwd</path> as input, FS was set to ":". While this did the trick, >FS allows us even more flexibility. ></p> > ><pre caption="Another field separator"> >FS="\t+" ></pre> > ><p> >Above, we use the special "+" regular expression character, which means "one or >more of the previous character". ></p> > ><pre caption="Setting FS to space"> >FS="[[:space:]]+" ></pre> > ><p> >While this assignment will do the trick, it's not necessary. Why? Because by >default, FS is set to a single space character, which awk interprets to mean >"one or more spaces or tabs." In this particular example, the default FS setting >was exactly what you wanted in the first place! ></p> > ><pre caption="Field separator sample"> >FS="foo[0-9][0-9][0-9]" ></pre> > ></body> ></section> ><section> ><title>Number of fields</title> ><body> > ><pre caption="Number of fields"> >{ > if ( NF > 2 ) { > print $1 " " $2 ":" $3 > } >} ></pre> > ></body> ></section> ><section> ><title>Record number</title> ><body> > ><pre caption="Record number"> >{ > <comment>#skip header</comment> > if ( NR > 10 ) { > print "ok, now for the real information!" > } >} ></pre> > ><p> >Awk provides additional variables that can be used for a variety of purposes. >We'll cover more of these variables in later articles. ></p> > ><p> >We've come to the end of our initial exploration of awk. As the series >continues, I'll demonstrate more advanced awk functionality, and we'll end the >series with a real-world awk application. In the meantime, if you're eager to >learn more, check out the resources listed below. 
></p> > ></body> ></section> ></chapter> > ><chapter> ><title>Resources</title> ><section> ><title>Useful links</title> ><body> > ><ul> > <li> > Read Daniel's other awk articles on developerWorks: Common threads: Awk by > example, <uri link="l-awk2.xml">Part 2</uri> and <uri > link="l-awk3.xml">Part 3</uri>. > </li> > <li> > If you'd like a good old-fashioned book, O'Reilly's <uri > link="http://www.oreilly.com/catalog/sed2/">sed &amp; awk, 2nd Edition</uri> > is a wonderful choice. > </li> > <li> > Be sure to check out the <uri > link="http://www.faqs.org/faqs/computer-lang/awk/faq/">comp.lang.awk > FAQ</uri>. It also contains lots of > additional awk links. > </li> > <li> > Patrick Hartigan's <uri link="http://sparky.rice.edu/~hartigan/awk.html">awk > tutorial</uri> is packed with handy awk scripts. > </li> > <li> > <uri link="http://www.tasoft.com/tawk.html">Thompson's TAWK Compiler</uri> > compiles awk scripts into fast binary executables. Versions are available > for Windows, OS/2, DOS, and UNIX. > </li> > <li> > <uri link="http://www.gnu.org/software/gawk/manual/gawk.html">The GNU Awk > User's Guide</uri> is available for online reference. > </li> ></ul> > ></body> ></section> ></chapter> > ></guide> >
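The "Numeric variables!" section of the attachment describes a blank-line counter (initialise x in BEGIN, run x=x+1 on each blank line, print a summary in END), but the listing itself is not in the attached file. What follows is a minimal sketch of that description, not the article's own listing. The wrapper name count_blank_lines, the sample input, and the wording of the summary line are assumptions made for illustration; only the BEGIN/pattern/END structure and the x=x+1 statement come from the text.

```shell
# count_blank_lines is a hypothetical wrapper name, not taken from the article.
count_blank_lines() {
    awk '
        BEGIN { x = 0 }                    # runs once, before any input is read
        /^$/  { x = x + 1 }                # runs for every completely empty line
        END   { print "blank lines: " x }  # runs once, after the last line
    ' "$@"
}

# Assumed sample input: six lines, three of them empty.
printf 'one\n\ntwo\n\n\nthree\n' | count_blank_lines
```

Because the arguments are handed straight to awk, the same function should also accept a file name in place of piped input, for example count_blank_lines /etc/passwd.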