When we run certain commands in Linux to read or edit text from a string or file, we often try to filter the output to a specific section of interest. This is where using regular expressions comes in handy.
A regular expression is a string that describes a set of character sequences. Regular expressions let you filter the output of a command or the contents of a file, edit a section of a text or configuration file, and so on.
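Regular expressions are made of:
Ordinary characters such as spaces, underscores (_), A-Z, a-z, and 0-9.
Metacharacters, which have a special meaning:
(.) matches any single character.
(*) matches zero or more occurrences of the character immediately preceding it.
[character(s)] matches any one of the characters listed; a hyphen (-) gives a range, such as [a-f] or [1-5].
^ matches the beginning of a line.
$ matches the end of a line.
\ is the escape character.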
In order to filter text, one has to use a text filtering tool such as awk. You can think of awk as a programming language in its own right, but in this guide we shall treat it as a simple command-line filtering tool.
The general syntax of awk is:
awk 'script' filename
Where 'script' is a set of commands that awk understands and executes on the file filename.
It works by reading a given line in the file, making a copy of the line, and then executing the script on the line. This is repeated on all the lines in the file.
The 'script' is in the form '/pattern/ action', where the pattern is a regular expression and the action is what awk does when it finds the pattern in a line. If the action is omitted, awk prints the matching line.
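As a rough illustration of the '/pattern/ action' form, the sketch below runs awk on a few made-up lines (the sample data is an assumption, not the contents of a real file). The first command gives only a pattern, so awk falls back to its default action of printing the matching line; the second adds an explicit action.

```shell
# Made-up sample lines, piped to awk instead of reading a real file.
sample='127.0.0.1 localhost
192.168.0.10 fileserver
::1 ip6-localhost'

# Pattern only: awk prints each line that matches /localhost/.
pattern_only=$(printf '%s\n' "$sample" | awk '/localhost/')

# Pattern plus action: print a label followed by the matching line ($0).
with_action=$(printf '%s\n' "$sample" | awk '/localhost/ { print "match:", $0 }')

printf '%s\n' "$pattern_only"
printf '%s\n' "$with_action"
```

Both commands select the same two lines; only what is printed for each of them differs.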
In the following examples, we shall focus on the meta characters that we discussed above under the features of awk.
The example below prints all the lines in the file /etc/hosts, since the empty pattern matches every line.
awk '//' /etc/hosts
In the example below, a pattern localhost has been given, so awk will match the line having localhost in the /etc/hosts file.
awk '/localhost/' /etc/hosts
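The (.) matches any single character, so the pattern in the example below matches lines containing strings such as loc, as found in localhost and localnet.
That is to say: l, then any single character, then c, anywhere in the line.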
awk '/l.c/' /etc/hosts
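To see which lines /l.c/ does and does not pick up, here is a small sketch on invented host names (the sample lines are assumptions chosen for illustration). Note that lcd-panel is skipped: the l is followed directly by c, with no character in between.

```shell
# Invented sample lines; "lcd-panel" has "lc" but no "l<any char>c".
sample='127.0.0.1 localhost
10.0.0.5 lcd-panel
10.0.0.6 lilac'

# l, exactly one arbitrary character, then c.
dot_out=$(printf '%s\n' "$sample" | awk '/l.c/')

printf '%s\n' "$dot_out"
```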
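The (*) matches zero or more occurrences of the character before it, so /l*c/ means a c preceded by zero or more l characters. In practice it matches every line that contains a c, such as lines containing localhost, localnet, or capable, as in the example below:
awk '/l*c/' /etc/hosts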
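Because (*) means "zero or more of the preceding character", /l*c/ is satisfied by any line that contains a c. The sketch below, run on a made-up word list, contrasts it with /ll*c/, which requires at least one l immediately before the c.

```shell
# Made-up word list, one word per line.
words='localhost
capable
lines
llc'

# Zero or more "l" followed by "c": any line containing a "c" matches.
zero_or_more=$(printf '%s\n' "$words" | awk '/l*c/')

# One "l", then zero or more "l", then "c": the "l" must sit right before the "c".
one_or_more=$(printf '%s\n' "$words" | awk '/ll*c/')

printf '%s\n' "$zero_or_more"
printf '%s\n' "$one_or_more"
```

Only "lines" is dropped by the first pattern, since it has no c at all; the second pattern keeps only "llc".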
You will also realize that (*) tries to get you the longest match possible it can detect.
Let's look at a case that demonstrates this. Take the regular expression t.*t, which matches a string that starts with the letter t, continues with any characters, and ends with t, in the line below:
this is tecmint, where you get the best good tutorials, how to's, guides, tecmint.
Starting from the first t, the pattern /t.*t/ could match any of the following, among others:
this is t
this is tecmint
this is tecmint, where you get t
this is tecmint, where you get the best good t
this is tecmint, where you get the best good tutorials, how t
this is tecmint, where you get the best good tutorials, how to's, guides, t
this is tecmint, where you get the best good tutorials, how to's, guides, tecmint
The (*) in /t.*t/ makes awk choose the last, longest option:
this is tecmint, where you get the best good tutorials, how to's, guides, tecmint
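One way to observe the longest-match behaviour directly is awk's built-in match() function, which sets RSTART and RLENGTH to the position and length of the match. The sketch below is only an illustration: it uses /t.*t/ (a t, any characters, then a t) for the greedy case and, for contrast, /t*t/, which only means "zero or more t characters followed by a t".

```shell
line="this is tecmint, where you get the best good tutorials, how to's, guides, tecmint."

# /t.*t/: the match starts at the first "t" and runs to the last "t" in the line.
greedy=$(printf '%s\n' "$line" | awk 'match($0, /t.*t/) { print substr($0, RSTART, RLENGTH) }')

# /t*t/: no "." is present, so only a run of "t" characters can match.
literal=$(printf '%s\n' "$line" | awk 'match($0, /t*t/) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$greedy"
printf '%s\n' "$literal"
```

The first result is the whole line up to the final t (the trailing period is left out); the second is just the single leading "t".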
Take for example the set [al1]: here awk matches every line in the file /etc/hosts that contains the character a, l, or 1.
awk '/[al1]/' /etc/hosts
The next example matches strings starting with either K or k followed by T:
awk '/[Kk]T/' /etc/hosts
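The two bracket examples above can be tried on a handful of invented names (the names are assumptions made up for this sketch). A set matches exactly one character, and it is case-sensitive.

```shell
# Invented names, one per line.
names='KT-router
kT-switch
kt-lower
Kernel'

# Either "K" or "k", immediately followed by an uppercase "T".
either_case=$(printf '%s\n' "$names" | awk '/[Kk]T/')

# Any line containing at least one of the characters a, l, or 1.
any_of_set=$(printf '%s\n' "$names" | awk '/[al1]/')

printf '%s\n' "$either_case"
printf '%s\n' "$any_of_set"
```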
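Understanding character ranges in awk:
[0-9] matches a single digit.
[a-z] matches a single lowercase letter.
[A-Z] matches a single uppercase letter.
[a-zA-Z] matches a single letter.
[a-zA-Z0-9] matches a single letter or digit.
Let's look at an example below: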
awk '/[0-9]/' /etc/hosts
In the above example, awk prints every line of /etc/hosts that contains at least one digit [0-9].
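As a small sketch of ranges in use, the commands below filter a few made-up lines (assumed sample data) by digit and by uppercase letter.

```shell
# Made-up sample lines.
sample='127.0.0.1 localhost
# Gateway entries
ip6-allnodes'

# Lines containing at least one digit.
with_digit=$(printf '%s\n' "$sample" | awk '/[0-9]/')

# Lines containing at least one uppercase letter.
with_upper=$(printf '%s\n' "$sample" | awk '/[A-Z]/')

printf '%s\n' "$with_digit"
printf '%s\n' "$with_upper"
```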
The (^) matches all the lines that start with the pattern provided, as in the examples below:
awk '/^fe/' /etc/hosts
awk '/^ff/' /etc/hosts
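The effect of the (^) anchor is easiest to see next to the same pattern without it. This sketch uses invented lines (assumptions, loosely modelled on IPv6 entries).

```shell
# Invented sample lines; the third contains "fe" but does not start with it.
sample='fe00::0 ip6-localnet
ff02::1 ip6-allnodes
# fe00 is the local prefix'

# Anchored: "fe" must be the first two characters of the line.
anchored=$(printf '%s\n' "$sample" | awk '/^fe/')

# Unanchored: "fe" may appear anywhere in the line.
unanchored=$(printf '%s\n' "$sample" | awk '/fe/')

printf '%s\n' "$anchored"
printf '%s\n' "$unanchored"
```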
The ($) matches all the lines that end with the pattern provided:
awk '/ab$/' /etc/hosts
awk '/ost$/' /etc/hosts
awk '/rs$/' /etc/hosts
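A comparable sketch for the ($) anchor, again on invented lines (assumed sample data). Keep in mind that trailing whitespace counts as part of the line, so a line ending in a space would not match /ost$/.

```shell
# Invented sample lines with no trailing whitespace.
sample='192.168.0.2 host
ff02::2 ip6-allrouters
hosts file entry'

# "ost" must be the last three characters of the line.
ends_ost=$(printf '%s\n' "$sample" | awk '/ost$/')

# "rs" must be the last two characters of the line.
ends_rs=$(printf '%s\n' "$sample" | awk '/rs$/')

# Without the anchor, "ost" may appear anywhere.
any_ost=$(printf '%s\n' "$sample" | awk '/ost/')

printf '%s\n' "$ends_ost"
printf '%s\n' "$ends_rs"
printf '%s\n' "$any_ost"
```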
The escape character (\) allows you to take the character following it as a literal, that is to say, to consider it just as it is.
In the example below, the first command prints out all lines in the file. The second command prints out nothing: it is meant to match a line that has $25.00, but no escape character is used, so $ is read as the end-of-line anchor.
The third command is correct since an escape character has been used to read $ as it is.
awk '//' deals.txt
awk '/$25.00/' deals.txt
awk '/\$25.00/' deals.txt
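The sketch below tries the escaped pattern on made-up price lines (the data is an assumption; the real contents of deals.txt are not shown in this article). It also illustrates a related detail: an unescaped (.) still matches any character, so escaping it as well gives a stricter match.

```shell
# Made-up price lines; single quotes keep the shell from expanding "$".
deals='Shoes $25.00
Socks 25.00
Hat $25900'

# "$" and "." both escaped: only a literal "$25.00" matches.
exact=$(printf '%s\n' "$deals" | awk '/\$25\.00/')

# Only "$" escaped: the "." matches any character, so "$25900" also matches.
loose_dot=$(printf '%s\n' "$deals" | awk '/\$25.00/')

printf '%s\n' "$exact"
printf '%s\n' "$loose_dot"
```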
That is not all there is to the awk command-line filtering tool; the examples above are the basic operations of awk. In the next parts, we shall advance to more complex features of awk.
For those seeking a comprehensive resource, we’ve compiled all the Awk series articles into a book, that includes 13 chapters and spans 41 pages, covering both basic and advanced Awk usage with practical examples.
Thanks for reading through and for any additions or clarifications, post a comment in the comments section.
Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.
I’ve come across many awk tutorials, but this one stands out as the best by far. It’s helped me grasp awk in a way I never thought possible. The breakdown of the awk syntax, particularly the explanation of “awk pattern action file,” is incredibly clear and concise. I’ve yet to find such clarity elsewhere. Thank you immensely for this invaluable resource.
RF Engineer
This is a wonderful tutorial and very well illustrated. There is a minor error in the explanation of what the asterisk (*) means in regular expressions: it means 'match the previous character zero or more times'. For example, 'p*' will match the letter 'p' zero or more times, thus this expression will match anything and everything, because it will be looking for the letter 'p' to be contained zero or more times, and absolutely any text contains the letter 'p' either zero or more times. For this reason the asterisk is never used with just a single symbol before it; it must be used in an expression with more symbols inside it, like /A-*B/, which will match "A" followed by zero or more hyphens "-" and then followed by "B". Thus the following strings will produce a match: "AB" (this has zero occurrences of '-'), "A-B", "A--B", "A---B" and so on. Note that the text presented in this tutorial erroneously suggests that /A-*B/ will not match "AB", when a simple check in AWK shows that it does match it (in fact a test in any REGEXP application will show the same result, e.g. egrep). For this reason this tutorial interprets somewhat erroneously the result of:
# awk '/l*c/' /etc/localhost
The above will match all lines that contain the letter 'c', regardless of whether they contain the letter 'l' or not. This is because /l*/ means 'match letter l zero or more times', so /l*c/ means 'match letter c preceded by zero or more occurrences of letter l', but any line that contains the letter c in it also contains zero or more letters l in front of it. The key to understanding this is "ZERO or more times". CONCLUSION: In REGEXP the asterisk symbol (*) does not mean the same thing as in Microsoft Windows and DOS/CMD file-name matching. It does not match any character (as this tutorial erroneously suggests); it matches the preceding character ZERO or more times.
Daniel P Fruzzetti
I’m struggling to make my script work. I have a huge file and each line needs to be searched for the string “WAP” and then, when found, the character appearing two characters BEFORE the string needs to be returned. Can you help me simplify this?
ROBSON MASSAKI KOBAYASHI
“Using Awk with (*) Character in a Pattern: It will match strings containing localhost, localnet, lines, capable, as in the example below:” I think capable does not match, but the whole line.
Aaron Kili
@ROBSON Yes, this is true, it matches the whole line.
Erick Manuel Bazán
Amazing tutorial, thanks a lot. I have a question. I have a file (testFile) with the following content:
one:two:three
three:two:one
I’m running this command as a test where, after looking for the pattern, I want a message telling me whether or not it found matches.
awk ‘/^z/ else >’ testFile
Since there are no lines matching the pattern, I’m expecting the “No matches found” message, but it shows “There are matches“.
Hi, can you please provide the script details further, if possible? Thanks & Regards,
I like this tutorial, but the animated gifs are so annoying that I gave up after the l.c and will look elsewhere.
Aaron Kili
@John What is wrong with the gifs? Tell us so that we can correct them in future articles, to make them easy for our followers and readers like you to understand.
The problem with your gif is that when we are in the middle of reading it, it starts to load again. Either slow it down or load it only once. It's really annoying.