datarekha
CLI June 7, 2026

grep, sed, awk: the text-processing trio worth mastering

grep, sed, and awk for command-line text processing, explained: when to use each tool, real recipes, regex fundamentals, and how pipelines compose all three.

13 min read · by datarekha · command line, grep, sed, awk, text processing

Every time a server log needs scanning, a CSV column needs summing, or a config file needs a bulk rename, the same three tools show up. They are old — older than most of the developers using them — but they remain the lingua franca of text manipulation on any Unix-like system. grep finds lines. sed rewrites a stream. awk understands columns. Knowing exactly where one ends and the next begins is the difference between a five-second one-liner and a three-minute fumble.

This post is a practical field guide. We will cover each tool individually — its mental model, its most useful flags, and a handful of real recipes — then look at how they combine in pipelines and share a common regex foundation.

The decision in thirty seconds

Before the deep dive, here is the heuristic you can internalize right now:

  • Do you need to find or count lines that match a pattern? Reach for grep.
  • Do you need to substitute, delete, or rearrange text in a file or stream? Reach for sed.
  • Do you need to work with columns, do arithmetic, or aggregate rows? Reach for awk.

The diagram below maps this out visually.

[Diagram: "What do you need to do?" with plain-text input.
  find / filter -> grep: match, count, list, invert, recurse (flags: -i -v -r -E -c -n -o)
  substitute / edit -> sed: substitute, delete, print ranges, in-place (core: s/old/new/g, d, p, -n)
  columns / aggregate -> awk: fields, NR/NF, patterns, BEGIN/END, arithmetic (vars: $1 $2 $NF NR NF FS)]
Figure 1 — Which tool? A quick decision map for the three workhorses of command-line text processing.

grep: find what you are looking for

grep does one thing: it reads lines and prints the ones that match a pattern. Its name comes from the ed editor command g/re/p — globally match a regular expression and print. Everything it does flows from that single purpose.

The flags you will actually use

Flag   Effect
-i     Case-insensitive match
-v     Invert: print lines that do NOT match
-E     Extended regex (alternation, +, ? without escaping)
-F     Fixed-string mode: no regex, just a literal substring
-r     Recurse into directories
-l     Print only filenames, not matching lines
-n     Prefix each line with its line number
-c     Print a count of matching lines per file
-o     Print only the matched portion, not the whole line
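
To see several of these flags side by side, here is a minimal sketch against a throwaway file. The file name app.log and its contents are invented for illustration; only the grep flags themselves come from the table above.

```shell
# Hypothetical sample data; the file name and contents are made up.
tmp=$(mktemp -d)
printf '%s\n' \
  'INFO  start worker' \
  'ERROR disk full' \
  'error retry failed' \
  '# comment line' \
  'INFO  done' > "$tmp/app.log"

count_exact=$(grep -c 'ERROR' "$tmp/app.log")    # lines containing ERROR
count_any=$(grep -ci 'error' "$tmp/app.log")     # same, ignoring case
numbered=$(grep -n 'disk' "$tmp/app.log")        # match prefixed with its line number
no_comments=$(grep -vc '^#' "$tmp/app.log")      # lines that do NOT start with #
levels=$(grep -Eo '^(INFO|ERROR)' "$tmp/app.log" | sort -u | paste -sd ' ' -)

printf '%s\n' "$count_exact" "$count_any" "$numbered" "$no_comments" "$levels"
rm -rf "$tmp"
```

Note that -c counts matching lines, not individual occurrences, so a line with two matches still counts once.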

Real recipes

Count how many lines contain ERROR in today’s log (-c counts matching lines, not individual occurrences):

grep -c "ERROR" /var/log/app/2026-06-07.log

Find all IP addresses in a file (using extended regex):

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log

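List the .conf files under /etc that contain a pattern, case-insensitively (--include is a non-POSIX option available in GNU grep; without it, -r searches every file):

grep -ril --include="*.conf" "maxconn" /etc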

Print lines that do not start with # (skip comments):

grep -v "^#" config.ini

grep wins when the question is “which lines match?” It does not rewrite anything — the moment you need to change text, hand off to sed.

sed: edit the stream

sed is a stream editor. It reads input line by line, applies a script of editing commands, and writes the result. It never sees the whole file at once — it processes one line (record) at a time, which makes it memory-efficient and composable.

The workhorse command is substitution: s/pattern/replacement/flags. The g flag at the end means “replace all occurrences on the line, not just the first.”

Core commands

sed 's/foo/bar/'        # replace first foo with bar on each line
sed 's/foo/bar/g'       # replace all foo with bar on each line
sed '/^#/d'             # delete lines starting with #
sed -n '5,10p'          # print only lines 5 through 10 (-n suppresses default output)
sed 's/  */ /g'         # collapse multiple spaces into one
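
The commands above can be tried on a few lines of input without touching any real file. This is a small sketch; the input text is invented for illustration.

```shell
# Made-up input for illustration only.
input=$(printf '%s\n' '# header' 'foo foo  bar' 'plain line' 'foo   end')

first=$(printf '%s\n' "$input" | sed 's/foo/bar/' | sed -n '2p')     # only the first foo changes
all=$(printf '%s\n' "$input" | sed 's/foo/bar/g' | sed -n '2p')      # every foo changes
kept=$(printf '%s\n' "$input" | sed '/^#/d' | wc -l | tr -d ' ')     # comment line dropped
range=$(printf '%s\n' "$input" | sed -n '2,3p')                      # lines 2 through 3 only
squeezed=$(printf '%s\n' "$input" | sed 's/  */ /g' | sed -n '4p')   # runs of spaces collapsed

printf '%s\n' "$first" "$all" "$kept" "$range" "$squeezed"
```

The difference between the first two results is the whole point of the g flag: without it, the second foo on the line is left alone.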

Find-and-replace across files with -i

sed’s -i flag edits files in place — no redirect needed. This is where a portability trap lives.

To rename a function across every Python file in a directory on Linux:

sed -i 's/get_user_data/fetch_user_record/g' src/*.py

On macOS, BSD sed requires a backup suffix after -i, even an empty one, so the equivalent is:

sed -i '' 's/get_user_data/fetch_user_record/g' src/*.py
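
If one command has to run on both platforms, a common workaround is to attach a backup suffix directly to -i, which both GNU and BSD sed accept. The sketch below assumes a throwaway file named users.py with invented contents; the function names are the ones from the example above.

```shell
# Sketch: -i.bak is parsed the same way by GNU and BSD sed.
# The file name and its contents are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' 'def get_user_data(uid):' 'user = get_user_data(42)' > "$tmp/users.py"

sed -i.bak 's/get_user_data/fetch_user_record/g' "$tmp/users.py"

edited=$(cat "$tmp/users.py")
backup=$(cat "$tmp/users.py.bak")    # untouched copy of the original
printf '%s\n' "$edited"
rm -rf "$tmp"
```

The cost is a .bak file next to every edited file, which you can delete once you have checked the result.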

When sed is the right choice

sed wins when you need to transform text structurally — swap a value, strip a prefix, remove blank lines, print a range of lines. It is not the right tool when you need to work with individual columns or do arithmetic. That is awk’s territory.

awk: the column language

awk is not just a filter or a substitutor. It is a small, complete programming language oriented around records and fields. Each line is a record. Each whitespace-separated token on that line is a field, referenced as $1, $2, …, $NF (last field). NR is the current record (line) number. NF is the number of fields on the current line.

The diagram below shows this model concretely.

[Diagram: the record "alice engineering 92000 senior" with NR=1 and NF=4, split into $1 = alice, $2 = engineering, $3 = 92000, $NF ($4) = senior. $0 = entire line; FS = field separator (default: whitespace); NR = record number; NF = field count.]
Figure 2 — awk’s record model. Every line is a record; every token is a numbered field. $NF always refers to the last field regardless of how many there are.
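
The record model is easy to verify by asking awk to print its own bookkeeping. In this sketch the first line is the one from Figure 2; the second, shorter line is invented to show that $NF follows the field count.

```shell
# First record mirrors Figure 2; the second is an invented extra record.
fields=$(printf '%s\n' 'alice engineering 92000 senior' 'bob sales 61000' |
  awk '{ print "NR=" NR, "NF=" NF, "first=" $1, "last=" $NF }')
printf '%s\n' "$fields"
```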

Program structure

An awk program consists of pattern { action } blocks. If you omit the pattern, the action runs on every line. BEGIN and END are special patterns that run once before and after all input respectively.

awk 'BEGIN { print "start" } { print $1 } END { print "done" }' file.txt

Real recipes

Extract just the second column of a space-separated file:

awk '{ print $2 }' employees.txt

Sum the third column (salaries):

awk '{ sum += $3 } END { print sum }' employees.txt

Compute an average:

awk '{ sum += $3; count++ } END { if (count) print sum / count }' employees.txt

Use a comma as the field separator (-F sets FS):

awk -F',' '{ print $1, $3 }' data.csv

Group-by and count — count employees per department:

awk '{ dept[$2]++ } END { for (d in dept) print d, dept[d] }' employees.txt

Print lines where salary exceeds 80000:

awk '$3 > 80000 { print $1, $3 }' employees.txt

awk wins whenever you need to do anything that involves treating the line as structured columns — extract, filter by value, count, sum, average, or group. It handles tasks that would require a spreadsheet import in other contexts.
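
To check the recipes end to end, here is a sketch that runs them against a four-line employees.txt. The names and salaries are invented; the column layout (name, department, salary, level) is assumed to match the record shown in Figure 2.

```shell
# Hypothetical data in the layout: name department salary level
tmp=$(mktemp -d)
printf '%s\n' \
  'alice engineering 92000 senior' \
  'bob sales 61000 junior' \
  'carol engineering 105000 staff' \
  'dave sales 84000 senior' > "$tmp/employees.txt"

total=$(awk '{ sum += $3 } END { print sum }' "$tmp/employees.txt")
average=$(awk '{ sum += $3; count++ } END { if (count) print sum / count }' "$tmp/employees.txt")
# for (d in dept) visits keys in an unspecified order, so sort for stable output.
per_dept=$(awk '{ dept[$2]++ } END { for (d in dept) print d, dept[d] }' "$tmp/employees.txt" | sort)
high_paid=$(awk '$3 > 80000 { print $1 }' "$tmp/employees.txt" | paste -sd ' ' -)

printf '%s\n' "$total" "$average" "$per_dept" "$high_paid"
rm -rf "$tmp"
```

The sort after the group-by matters in scripts: awk makes no promise about the order in which array keys come back.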

The shared regex foundation

All three tools understand regular expressions. There are two dialects to know:

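  • Basic regex (BRE): grep and sed default to this. Grouping and repetition counts are written with backslashes: \(...\) and \{n,m\}. POSIX BRE has no +, ?, or | operators; GNU grep and GNU sed accept \+, \?, and \| as extensions.
  • Extended regex (ERE): enabled with grep -E (the older egrep spelling is deprecated in GNU grep) and with sed -E. awk always uses ERE. Here +, ?, |, (...), and {n,m} work without backslashes.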
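
The escaping difference is easiest to see with one pattern written both ways. This sketch uses three invented input lines and sticks to grouping and repetition counts, which exist in both dialects.

```shell
# Invented input: only 'abab' contains "ab" twice in a row.
sample=$(printf '%s\n' 'abab' 'ab' 'a(b)')

bre=$(printf '%s\n' "$sample" | grep -c '\(ab\)\{2\}')    # BRE: group and count are escaped
ere=$(printf '%s\n' "$sample" | grep -Ec '(ab){2}')       # ERE: same pattern, no backslashes
sed_ere=$(printf '%s\n' "$sample" | sed -E -n '/(ab){2}/p' | wc -l | tr -d ' ')
# In BRE an unescaped parenthesis is just a literal character.
literal=$(printf '%s\n' "$sample" | grep -c '(b)')

printf '%s\n' "$bre" "$ere" "$sed_ere" "$literal"
```

The last line is the usual source of surprises: the same characters mean "group" in one dialect and "literal parenthesis" in the other.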

The anchors and character classes work identically across all three:

^        # start of line
$        # end of line
.        # any single character
[abc]    # character class: a, b, or c
[^abc]   # negated class: not a, b, or c
[a-z]    # range
\d       # digit in PCRE only (GNU grep -P); not BRE/ERE, so use [0-9] or [[:digit:]]
*        # zero or more of the preceding
+        # one or more (ERE)
?        # zero or one (ERE)
{3,5}    # between 3 and 5 repetitions (ERE; written \{3,5\} in BRE)

When you learn a regex pattern in one tool, it transfers to the others once you account for the BRE/ERE escaping differences. This shared vocabulary is what makes them so composable.
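
As a quick illustration of that transfer, the sketch below applies one ERE fragment, [0-9]+ms at end of line, in all three tools. The input line is invented, and [0-9] is used instead of \d so the pattern stays portable.

```shell
# One invented log line; the same ERE fragment is reused in each tool.
line='POST /api/users 201 134ms'

with_grep=$(printf '%s\n' "$line" | grep -Eo '[0-9]+ms$')
with_sed=$(printf '%s\n' "$line" | sed -E 's/.* ([0-9]+ms)$/\1/')
with_awk=$(printf '%s\n' "$line" |
  awk 'match($0, /[0-9]+ms$/) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$with_grep" "$with_sed" "$with_awk"
```

Only the wrapper around the pattern changes: -o in grep, a capture group in sed, and match() with RSTART and RLENGTH in awk.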

The same task, three ways

Consider a server access log with lines like 2026-06-07 14:03:22 POST /api/users 201 34ms. You want to find slow requests (over 100ms), clean up the unit suffix, and count them by endpoint.

With grep alone, find lines at 100ms or more (a rough filter: it cannot compare numbers, so it also matches exactly 100ms):

grep -E '[1-9][0-9]{2,}ms' access.log

With sed — strip the ms suffix from the timing field across the whole file:

sed 's/ms$//' access.log

With awk — extract endpoint and timing, filter slow ones, count per endpoint:

awk '{ gsub(/ms$/, "", $6); if ($6+0 > 100) slow[$4]++ } END { for (e in slow) print slow[e], e }' access.log

The awk solution is the richest because the task is inherently columnar and requires aggregation. grep is right for the quick scan. sed is right when you just want to clean up formatting.
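
Here is a sketch that runs all three approaches against a five-line access.log in the format shown above. The log lines are invented, and the field positions are read off that format: the endpoint is field 4 and the timing is field 6.

```shell
# Hypothetical log in the format: date time method endpoint status timing
tmp=$(mktemp -d)
printf '%s\n' \
  '2026-06-07 14:03:22 POST /api/users 201 34ms' \
  '2026-06-07 14:03:25 GET /api/orders 200 250ms' \
  '2026-06-07 14:03:27 GET /api/orders 200 1200ms' \
  '2026-06-07 14:03:30 POST /api/users 500 101ms' \
  '2026-06-07 14:03:31 GET /api/health 200 100ms' > "$tmp/access.log"

# grep: pattern-based, so exactly 100ms is included as well.
rough=$(grep -Ec '[1-9][0-9]{2,}ms' "$tmp/access.log")
# sed: strip the unit; awk is used here only to show the cleaned column.
timings=$(sed 's/ms$//' "$tmp/access.log" | awk '{ print $6 }' | paste -sd ' ' -)
# awk: numeric comparison and per-endpoint counts.
slow=$(awk '{ gsub(/ms$/, "", $6); if ($6 + 0 > 100) slow[$4]++ }
            END { for (e in slow) print slow[e], e }' "$tmp/access.log" | sort -rn)

printf '%s\n' "$rough" "$timings" "$slow"
rm -rf "$tmp"
```

grep reports four lines while awk counts three slow requests, because only awk can apply a strict numeric "greater than 100".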

Composing pipelines

The tools shine brightest when piped together. A pipeline lets each tool do what it is best at, without any single tool overreaching.

Filter for error lines, then count errors by service name (field 3):

grep "ERROR" app.log | awk '{ count[$3]++ } END { for (s in count) print count[s], s }' | sort -rn

Find all .env files under the project, exclude comments, and list unique key names:

grep -rh --include="*.env" -v "^#" . | sed 's/=.*//' | sort -u

Pull failed login attempts, extract usernames, and count top offenders:

grep "Failed password" /var/log/auth.log | awk '{ print $9 }' | sort | uniq -c | sort -rn | head

Each stage has a clear responsibility. grep narrows the line set. sed reshapes the text. awk aggregates the records. sort and uniq round out the pipeline. None of the tools need to know what the others are doing — they just pass lines forward.
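
Two of these pipelines can be exercised on inline data. The sketch below assumes a log layout with the service name in field 3 and a handful of KEY=value lines; both inputs are invented for illustration.

```shell
# Hypothetical log layout: date time service level message
log=$(printf '%s\n' \
  '2026-06-07 14:00:01 auth ERROR token expired' \
  '2026-06-07 14:00:02 billing INFO invoice sent' \
  '2026-06-07 14:00:03 auth ERROR db timeout' \
  '2026-06-07 14:00:04 search ERROR index missing' \
  '2026-06-07 14:00:05 auth INFO login ok')

# grep narrows, awk aggregates, sort orders.
errors=$(printf '%s\n' "$log" | grep 'ERROR' |
  awk '{ count[$3]++ } END { for (s in count) print count[s], s }' | sort -rn)

# grep drops comments, sed keeps the key, sort -u removes duplicates.
keys=$(printf '%s\n' 'DB_HOST=localhost' '# comment' 'DB_PORT=5432' 'DB_HOST=prod' |
  grep -v '^#' | sed 's/=.*//' | sort -u)

printf '%s\n' "$errors" "$keys"
```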

For more on finding files before processing them, see the find command guide. The grep reference page at /cli/grep/ covers additional flags and PCRE options. The broader CLI category has guides on building full pipelines.

If you encounter unfamiliar terminology while reading man pages, the glossary covers regex vocabulary and POSIX terms. Common interview questions about these tools are collected at /interview/.

Frequently asked questions

When should I use grep -F instead of a regular expression?

Use -F (fixed-string mode) whenever your search pattern contains characters that are regex metacharacters but you mean them literally — dots, brackets, parentheses, asterisks. Searching for 192.168.1.1 with a plain grep will also match 192X168Y1Z1 because . is a wildcard. With -F, the dot is just a dot. It is also slightly faster because no regex engine is involved.
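
A small sketch of the difference, using three invented lines. The last count adds -x, which requires the whole line to match.

```shell
# Invented input: one real address, one look-alike, one longer address.
hosts=$(printf '%s\n' '192.168.1.1' '192X168Y1Z1' '192.168.1.10')

regex_hits=$(printf '%s\n' "$hosts" | grep -c '192.168.1.1')     # . matches any character
fixed_hits=$(printf '%s\n' "$hosts" | grep -Fc '192.168.1.1')    # . is a literal dot
exact_hits=$(printf '%s\n' "$hosts" | grep -Fxc '192.168.1.1')   # literal and whole-line

printf '%s\n' "$regex_hits" "$fixed_hits" "$exact_hits"
```

-F still matches substrings, which is why 192.168.1.10 is counted until -x is added.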

The sed -i command works on my Linux server but breaks on my Mac. What is happening?

This is the GNU vs BSD sed portability issue. GNU sed (Linux) makes the backup-suffix argument optional, so sed -i 's/x/y/' file works. BSD sed (macOS, FreeBSD) requires the suffix even if it is empty: sed -i '' 's/x/y/' file. If your scripts run on both platforms, the safest cross-platform substitute is perl -pi -e 's/x/y/g' file, which has consistent behavior everywhere.

awk looks like a programming language. When is it too much and I should use Python instead?

Reach for Python (or another scripting language) when the logic spans more than one or two awk programs chained together, when you need to join data from multiple files with complex key logic, when you need to parse structured formats like JSON or XML, or when the script will need to be maintained by people who are not comfortable with awk syntax. A single awk program of 3-5 lines is almost always faster to write and run than an equivalent Python script. Beyond that, the maintainability calculus shifts.

Can I use awk as a drop-in replacement for both grep and sed?

Technically yes: awk '/pattern/ { print }' file mimics grep, and awk '{ gsub(/old/, "new"); print }' file mimics sed. In practice, using awk for pure filtering or pure substitution is noisier syntax with no benefit: grep is terser for search and sed is terser for stream edits. Use the right tool for the job and reserve awk for the columnar and aggregation work where nothing else competes.
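
The equivalence is easy to confirm on a few invented lines, as in this sketch:

```shell
# Invented input for comparing each tool with its awk imitation.
input=$(printf '%s\n' 'old value' 'keep me' 'another old one')

grep_out=$(printf '%s\n' "$input" | grep 'old')
awk_grep=$(printf '%s\n' "$input" | awk '/old/ { print }')

sed_out=$(printf '%s\n' "$input" | sed 's/old/new/g')
awk_sed=$(printf '%s\n' "$input" | awk '{ gsub(/old/, "new"); print }')

[ "$grep_out" = "$awk_grep" ] && echo "filter output matches"
[ "$sed_out" = "$awk_sed" ] && echo "substitution output matches"
```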
