grep, sed, awk: the text-processing trio worth mastering
grep, sed, and awk for command-line text processing, explained: when to use each tool, real recipes, regex fundamentals, and how pipelines compose all three.
Every time a server log needs scanning, a CSV column needs summing, or a config file needs a bulk rename, the same three tools show up. They are old — older than most of the developers using them — but they remain the lingua franca of text manipulation on any Unix-like system. grep finds lines. sed rewrites a stream. awk understands columns. Knowing exactly where one ends and the next begins is the difference between a five-second one-liner and a three-minute fumble.
This post is a practical field guide. We will cover each tool individually — its mental model, its most useful flags, and a handful of real recipes — then look at how they combine in pipelines and share a common regex foundation.
The decision in thirty seconds
Before the deep dive, here is the heuristic you can internalize right now:
- Do you need to find or count lines that match a pattern? Reach for grep.
- Do you need to substitute, delete, or rearrange text in a file or stream? Reach for sed.
- Do you need to work with columns, do arithmetic, or aggregate rows? Reach for awk.
The diagram below maps this out visually.
grep: find what you are looking for
grep does one thing: it reads lines and prints the ones that match a pattern. Its name comes from the ed editor command g/re/p — globally match a regular expression and print. Everything it does flows from that single purpose.
The flags you will actually use
| Flag | Effect |
|---|---|
| -i | Case-insensitive match |
| -v | Invert: print lines that do NOT match |
| -E | Extended regex (alternation, +, ? without escaping) |
| -F | Fixed-string mode: no regex, just a literal substring |
| -r | Recurse into directories |
| -l | Print only filenames, not matching lines |
| -n | Prefix each line with its line number |
| -c | Print a count of matching lines per file |
| -o | Print only the matched portion, not the whole line |
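To see a few of these flags side by side, here is a small sketch that runs them against made-up input. The sample lines are invented for illustration and do not come from a real log.

```shell
# Hypothetical sample input (contents are made up for this example).
sample='ERROR disk full on 10.0.0.5
info: retry scheduled
error: timeout from 10.0.0.7
# a comment line'

# -c counts matching lines (case-sensitive by default).
count_exact=$(printf '%s\n' "$sample" | grep -c 'ERROR')
# -i makes the match case-insensitive.
count_any=$(printf '%s\n' "$sample" | grep -ci 'error')
# -E enables extended regex, -o prints only the matched text.
ips=$(printf '%s\n' "$sample" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+')
# -v inverts the match: count lines that do NOT start with #.
no_comments=$(printf '%s\n' "$sample" | grep -vc '^#')

echo "exact=$count_exact any=$count_any non_comment=$no_comments"
echo "$ips"
```

Note that -c counts lines, not occurrences: a line with two matches still counts once.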
Real recipes
Count how many lines in today’s log contain ERROR (-c counts matching lines, not individual occurrences):
grep -c "ERROR" /var/log/app/2026-06-07.log
Find all IP addresses in a file (using extended regex):
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
List every .conf file under /etc that mentions a pattern, case-insensitively:
grep -ril --include="*.conf" "maxconn" /etc
Print lines that do not start with # (skip comments):
grep -v "^#" config.ini
grep wins when the question is “which lines match?” It does not rewrite anything — the moment you need to change text, hand off to sed.
sed: edit the stream
sed is a stream editor. It reads input line by line, applies a script of editing commands, and writes the result. It never sees the whole file at once — it processes one line (record) at a time, which makes it memory-efficient and composable.
The workhorse command is substitution: s/pattern/replacement/flags. The g flag at the end means “replace all occurrences on the line, not just the first.”
Core commands
sed 's/foo/bar/' # replace first foo with bar on each line
sed 's/foo/bar/g' # replace all foo with bar on each line
sed '/^#/d' # delete lines starting with #
sed -n '5,10p' # print only lines 5 through 10 (-n suppresses default output)
sed 's/  */ /g' # collapse runs of spaces into one (the pattern is two spaces, then *)
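The commands above are easier to compare when they all run on the same input. The following sketch uses a made-up three-line string; the variable names are only for illustration.

```shell
# Hypothetical input: a comment line plus two lines with repeated words and spaces.
input='# header
foo foo  baz
foo    qux'

first_only=$(printf '%s\n' "$input" | sed 's/foo/bar/')    # first foo per line
all=$(printf '%s\n' "$input" | sed 's/foo/bar/g')          # every foo per line
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')        # drop lines starting with #
line_two=$(printf '%s\n' "$input" | sed -n '2p')           # print only line 2
# Two spaces followed by *: one space, then zero or more further spaces.
squeezed=$(printf '%s\n' "$input" | sed 's/  */ /g')

printf '%s\n---\n' "$first_only" "$all" "$no_comments" "$line_two" "$squeezed"
```

The squeeze pattern needs at least one literal space before the star. A single space followed by a star would also match the empty string between every pair of characters.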
Find-and-replace across files with -i
sed’s -i flag edits files in place — no redirect needed. This is where a portability trap lives.
To rename a function across every Python file in a directory on Linux:
sed -i 's/get_user_data/fetch_user_record/g' src/*.py
On macOS, the equivalent is:
sed -i '' 's/get_user_data/fetch_user_record/g' src/*.py
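If a script has to run on both platforms, one option is a small wrapper. The sketch below is hypothetical: sed_inplace is a name invented here, the demo file is a throwaway, and the detection assumes that GNU sed accepts --version while BSD sed rejects it.

```shell
# Hypothetical helper: in-place edit that works with GNU or BSD sed.
# Assumption: only GNU-compatible sed exits successfully on --version.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"       # GNU sed: backup suffix is optional
  else
    sed -i '' "$@"    # BSD sed: suffix argument is required; empty means no backup
  fi
}

# Throwaway demo file with made-up contents.
demo=$(mktemp)
printf 'def get_user_data():\n    return get_user_data\n' > "$demo"

sed_inplace 's/get_user_data/fetch_user_record/g' "$demo"
result=$(cat "$demo")
echo "$result"
rm -f "$demo"
```

Treat the version probe as a heuristic rather than a guarantee, and test it on the systems you actually target.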
When sed is the right choice
sed wins when you need to transform text structurally — swap a value, strip a prefix, remove blank lines, print a range of lines. It is not the right tool when you need to work with individual columns or do arithmetic. That is awk’s territory.
awk: the column language
awk is not just a filter or a substitutor. It is a small, complete programming language oriented around records and fields. Each line is a record. Each whitespace-separated token on that line is a field, referenced as $1, $2, …, $NF (last field). NR is the current record (line) number. NF is the number of fields on the current line.
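A quick way to get a feel for these variables is to print them. This sketch feeds two made-up lines to awk and shows the record number, the field count, the first field, and the last field for each.

```shell
# Two hypothetical records with different numbers of fields.
fields=$(printf 'alice eng 90000\nbob ops 70000 contractor\n' |
  awk '{ print NR, NF, $1, $NF }')

echo "$fields"
# 1 3 alice 90000
# 2 4 bob contractor
```

Because $NF always points at the last field, it keeps working when lines have different widths.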
The diagram below shows this model concretely.
Program structure
An awk program consists of pattern { action } blocks. If you omit the pattern, the action runs on every line. BEGIN and END are special patterns that run once before and after all input respectively.
awk 'BEGIN { print "start" } { print $1 } END { print "done" }' file.txt
Real recipes
Extract just the second column of a space-separated file:
awk '{ print $2 }' employees.txt
Sum the third column (salaries):
awk '{ sum += $3 } END { print sum }' employees.txt
Compute an average:
awk '{ sum += $3; count++ } END { if (count) print sum / count }' employees.txt # the guard avoids dividing by zero on empty input
Use a comma as the field separator (-F sets FS):
awk -F',' '{ print $1, $3 }' data.csv
Group-by and count — count employees per department:
awk '{ dept[$2]++ } END { for (d in dept) print d, dept[d] }' employees.txt
Print lines where salary exceeds 80000:
awk '$3 > 80000 { print $1, $3 }' employees.txt
awk wins whenever you need to do anything that involves treating the line as structured columns — extract, filter by value, count, sum, average, or group. It handles tasks that would require a spreadsheet import in other contexts.
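To try these recipes without a real data set, the sketch below builds a tiny employees file and runs several of them against it. The names, departments, and salaries are invented, and the layout (name, department, salary) is an assumption about what employees.txt looks like.

```shell
# Hypothetical data: name, department, salary.
employees=$(mktemp)
printf '%s\n' \
  'alice eng 90000' \
  'bob ops 70000' \
  'carol eng 86000' > "$employees"

total=$(awk '{ sum += $3 } END { print sum }' "$employees")
average=$(awk '{ sum += $3; n++ } END { if (n) print sum / n }' "$employees")
# for (d in dept) iterates in no guaranteed order, so sort for stable output.
per_dept=$(awk '{ dept[$2]++ } END { for (d in dept) print d, dept[d] }' "$employees" | sort)
high=$(awk '$3 > 80000 { print $1, $3 }' "$employees")

echo "total=$total average=$average"
echo "$per_dept"
echo "$high"
rm -f "$employees"
```

The sort after the group-by matters in scripts: awk makes no promise about the order in which array keys come back.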
The shared regex foundation
All three tools understand regular expressions. There are two dialects to know:
- Basic regex (BRE): grep and sed default to this. Unescaped +, ?, and | are literal characters, and grouping is written \(...\). GNU grep and sed accept \+, \?, and \| as extensions, but POSIX BRE does not define them.
- Extended regex (ERE): enabled with grep -E (or egrep) and with sed -E. The same metacharacters work without backslashes. awk uses ERE by default.
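The difference is easiest to see with the same pattern written both ways. In this sketch the BRE versions stick to POSIX forms (the \{0,1\} interval and \(...\) groups), and the test strings are made up.

```shell
text='color colour colouur'

# BRE: "zero or one u" written as a POSIX interval.
bre=$(printf '%s\n' "$text" | grep -o 'colou\{0,1\}r')
# ERE: the same idea with an unescaped ?.
ere=$(printf '%s\n' "$text" | grep -Eo 'colou?r')

# Grouping and back-references: escaped parentheses in BRE, bare in ERE.
sed_bre=$(echo 'ab' | sed 's/\(a\)\(b\)/\2\1/')
sed_ere=$(echo 'ab' | sed -E 's/(a)(b)/\2\1/')

echo "$bre"
echo "$ere"
echo "$sed_bre $sed_ere"
```

Both grep calls print color and colour, and both sed calls swap the two letters, so the choice of dialect here is about readability rather than capability.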
The anchors and character classes work identically across all three:
^ # start of line
$ # end of line
. # any single character
[abc] # character class: a, b, or c
[^abc] # negated class: not a, b, or c
[a-z] # range
\d # digit (PCRE only, e.g. GNU grep -P; not available in BRE or ERE, so use [0-9] or [[:digit:]] for portability)
* # zero or more of the preceding
+ # one or more (ERE)
? # zero or one (ERE)
{3,5} # between 3 and 5 repetitions (ERE; written \{3,5\} in BRE)
When you learn a regex pattern in one tool, it transfers to the others once you adjust for BRE versus ERE escaping. This shared vocabulary is what makes them so composable.
The same task, three ways
Consider a server access log with lines like 2026-06-07 14:03:22 POST /api/users 201 34ms. You want to find slow requests (over 100ms), clean up the unit suffix, and count them by endpoint.
With grep alone, find lines whose timing has three or more digits (a rough filter that also catches exactly 100ms):
grep -E '[1-9][0-9]{2,}ms' access.log
With sed — strip the ms suffix from the timing field across the whole file:
sed 's/ms$//' access.log
With awk — extract endpoint and timing, filter slow ones, count per endpoint:
awk '{ gsub(/ms$/, "", $6); if ($6+0 > 100) slow[$4]++ } END { for (e in slow) print slow[e], e }' access.log
The awk solution is the richest because the task is inherently columnar and requires aggregation. grep is right for the quick scan. sed is right when you just want to clean up formatting.
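Here is a sketch of all three approaches run against a few made-up log lines in the format described above, where the endpoint is field 4 and the timing is field 6. The timestamps, endpoints, and timings are invented.

```shell
# Hypothetical access log in the format: date time method endpoint status timing
log=$(mktemp)
printf '%s\n' \
  '2026-06-07 14:03:22 POST /api/users 201 34ms' \
  '2026-06-07 14:03:25 GET /api/orders 200 250ms' \
  '2026-06-07 14:03:27 GET /api/orders 200 130ms' \
  '2026-06-07 14:03:30 GET /api/users 200 101ms' > "$log"

# grep: rough filter, counts lines whose final field has 3+ digits before "ms".
rough=$(grep -Ec ' [1-9][0-9]{2,}ms$' "$log")

# sed: strip the unit suffix (first line shown only).
stripped=$(sed 's/ms$//' "$log" | head -n 1)

# awk: numeric comparison on the timing, then count per endpoint.
slow_by_endpoint=$(awk '{ t = $6; sub(/ms$/, "", t); if (t + 0 > 100) slow[$4]++ }
  END { for (e in slow) print slow[e], e }' "$log" | sort -rn)

echo "$rough"
echo "$stripped"
echo "$slow_by_endpoint"
rm -f "$log"
```

Only the awk version does a true numeric comparison. The grep pattern is a textual approximation, which is why it would also let a line with exactly 100ms through.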
Composing pipelines
The tools shine brightest when piped together. A pipeline lets each tool do what it is best at, without any single tool overreaching.
Filter for error lines, then count errors by service name (field 3):
grep "ERROR" app.log | awk '{ count[$3]++ } END { for (s in count) print count[s], s }' | sort -rn
Find all .env files under the project, exclude comments, and list unique key names:
grep -rh --include="*.env" -v "^#" . | sed 's/=.*//' | sort -u
Pull failed login attempts, extract usernames, and count top offenders:
grep "Failed password" /var/log/auth.log | awk '{ print $9 }' | sort | uniq -c | sort -rn | head
Each stage has a clear responsibility. grep narrows the line set. sed reshapes the text. awk aggregates the records. sort and uniq round out the pipeline. None of the tools need to know what the others are doing — they just pass lines forward.
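To watch the hand-off between stages, the sketch below runs two of these pipelines on inline sample text instead of real files. The log lines and the env keys are made up, and the log layout (service name in field 3) is an assumption.

```shell
# Hypothetical application log: date time service level message
app_log='2026-06-07 14:00:01 auth ERROR token expired
2026-06-07 14:00:02 billing INFO invoice sent
2026-06-07 14:00:03 auth ERROR token expired
2026-06-07 14:00:04 billing ERROR card declined'

# grep narrows to error lines, awk counts per service, sort ranks them.
by_service=$(printf '%s\n' "$app_log" |
  grep 'ERROR' |
  awk '{ count[$3]++ } END { for (s in count) print count[s], s }' |
  sort -rn)

# Hypothetical env file contents.
env_text='# database settings
DB_HOST=localhost
DB_PORT=5432
DB_HOST=replica'

# grep drops comments, sed keeps the key, sort -u removes duplicates.
keys=$(printf '%s\n' "$env_text" | grep -v '^#' | sed 's/=.*//' | sort -u)

echo "$by_service"
echo "$keys"
```

Removing stages from the end of a pipeline one at a time is a handy way to debug it, because each prefix is a valid command on its own.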
For more on finding files before processing them, see the find command guide. The grep reference page at /cli/grep/ covers additional flags and PCRE options. The broader CLI category has guides on building full pipelines.
If you encounter unfamiliar terminology while reading man pages, the glossary covers regex vocabulary and POSIX terms. Common interview questions about these tools are collected at /interview/.
Frequently asked questions
When should I use grep -F instead of a regular expression?
Use -F (fixed-string mode) whenever your search pattern contains characters that are regex metacharacters but you mean them literally — dots, brackets, parentheses, asterisks. Searching for 192.168.1.1 with a plain grep will also match 192X168Y1Z1 because . is a wildcard. With -F, the dot is just a dot. It is also slightly faster because no regex engine is involved.
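A two-line experiment makes the difference visible. The input below is made up to include one real address and one look-alike.

```shell
lines='192.168.1.1
192X168Y1Z1'

# Regex mode: each . matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$lines" | grep -c '192.168.1.1')
# Fixed-string mode: the dots are literal, so only the real address matches.
fixed_hits=$(printf '%s\n' "$lines" | grep -cF '192.168.1.1')

echo "regex=$regex_hits fixed=$fixed_hits"
```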
The sed -i command works on my Linux server but breaks on my Mac. What is happening?
This is the GNU vs BSD sed portability issue. GNU sed (Linux) makes the backup-suffix argument optional, so sed -i 's/x/y/' file works. BSD sed (macOS, FreeBSD) requires the suffix even if it is empty: sed -i '' 's/x/y/' file. If your scripts run on both platforms, the safest cross-platform substitute is perl -pi -e 's/x/y/g' file, which has consistent behavior everywhere.
awk looks like a programming language. When is it too much and I should use Python instead?
Reach for Python (or another scripting language) when the logic spans more than one or two awk programs chained together, when you need to join data from multiple files with complex key logic, when you need to parse structured formats like JSON or XML, or when the script will need to be maintained by people who are not comfortable with awk syntax. A single awk program of 3-5 lines is almost always faster to write and run than an equivalent Python script. Beyond that, the maintainability calculus shifts.
Can I use awk as a drop-in replacement for both grep and sed?
Technically yes — awk '/pattern/ { print }' file mimics grep, and awk '{ gsub(/old/, "new"); print }' file mimics sed. In practice, using awk for pure filtering or pure substitution is noisier syntax with no benefit. grep is terser for search, sed is terser for stream edits, and each starts faster than awk on small inputs. Use the right tool for the job and reserve awk for the columnar and aggregation work where nothing else competes.
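The equivalence is easy to check for yourself. This sketch runs each pair on the same made-up input and compares the results.

```shell
text='old value
keep me
another old one'

# Filtering: grep versus an awk pattern with a print action.
grep_out=$(printf '%s\n' "$text" | grep 'old')
awk_grep=$(printf '%s\n' "$text" | awk '/old/ { print }')

# Substitution: sed s///g versus awk gsub on the whole record.
sed_out=$(printf '%s\n' "$text" | sed 's/old/new/g')
awk_sed=$(printf '%s\n' "$text" | awk '{ gsub(/old/, "new"); print }')

[ "$grep_out" = "$awk_grep" ] && echo "filter outputs match"
[ "$sed_out" = "$awk_sed" ] && echo "substitution outputs match"
```

The outputs agree for simple patterns like these. Keep in mind that awk always uses ERE while grep and sed default to BRE, so a pattern with metacharacters may need adjusting when it moves between them.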