Pipes and redirection: how Unix composes small sharp tools

Picture this: your web server threw a wave of 500 errors overnight. You have a 2 GB access.log and you need to know which URL paths are the worst offenders — right now, without spinning up a database or writing a Python script. One line in your terminal:

grep ' 500 ' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

Twenty seconds later you have a ranked list of broken endpoints. No temp files. No boilerplate. Six tiny programs, each doing one job, snapped together like pipe fittings. That is the Unix philosophy made tangible, and understanding it at the stream level changes how you think about the command line entirely.

The Unix philosophy: small tools, text streams

Ken Thompson and Dennis Ritchie designed Unix around a deceptively simple idea: build many small programs that each do one thing well, and make them composable. The composition medium is plain text. If every program reads lines from its input and writes lines to its output, any program can talk to any other program — the authors never need to coordinate.

That is not a historical curiosity. It is the reason sort, written in 1971, can still sort the output of a program written last week. The interface is just bytes on a stream.

The three standard streams

Every Unix process starts with three open file descriptors — channels the kernel connects to the outside world before your program even runs.

stdin (fd 0) — the default input channel. By default it reads from your keyboard, but it can be redirected to read from a file or the output of another command.
stdout (fd 1) — where a program writes its normal output. By default it goes to your terminal.
stderr (fd 2) — where a program writes error messages and diagnostics. Also defaults to your terminal, but it is a separate channel so that errors do not contaminate the data stream.

That separation matters. When you pipe grep into awk, you want data — not error messages mixed in and breaking the field-split. stderr being independent is what makes pipelines reliable.

Figure: the three standard streams. stdin enters from the left; stdout and stderr leave separately, both to the terminal by default.
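A minimal sketch of why the separation matters. The emit function below is a hypothetical stand-in for any program that writes data to stdout and a diagnostic to stderr; grep -c '' simply counts the lines that arrive on its stdin.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a real program: one data line, one diagnostic.
emit() {
  echo "data line"                        # written to stdout (fd 1)
  echo "warning: something went wrong" >&2  # written to stderr (fd 2)
}

# Only stdout enters the pipe, so the counter sees exactly one line.
# stderr is sent to /dev/null here only to keep the demo quiet.
piped_lines=$(emit 2>/dev/null | grep -c '')

# Merging stderr into stdout puts the diagnostic into the data stream.
merged_lines=$(emit 2>&1 | grep -c '')

echo "lines through the pipe by default: $piped_lines"
echo "lines after merging stderr:        $merged_lines"
```

By default the downstream command sees one line; after merging it sees two, which is exactly the contamination a separate stderr avoids.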

The pipe operator `|`

The pipe character connects two processes. The shell arranges for the stdout of the left command to feed directly into the stdin of the right command — through a kernel buffer, not a temp file on disk.

grep ' 500 ' access.log | awk '{print $7}'

Both commands run concurrently. The kernel manages a small buffer between them; grep writes into it and awk reads from it simultaneously. This is why pipelines on large files feel fast: you do not wait for step one to finish before step two begins.
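One way to see the concurrency for yourself, as a small sketch: yes never finishes on its own, so this pipeline could not complete if the right-hand side had to wait for the left-hand side to exit.

```shell
#!/usr/bin/env bash
# yes prints "y" forever. head reads three lines and exits; the pipe then
# closes, and yes is stopped the next time it tries to write to it.
first_three=$(yes | head -n 3)
echo "$first_three"
```

The command returns almost immediately with three lines, because head was consuming output while yes was still producing it.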

Building the log-analysis pipeline step by step

Let’s assemble the opening one-liner piece by piece so the data flow is clear.

grep ' 500 ' access.log

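Keep only the lines that contain 500 with a space on each side. In common log format the status code is a space-delimited field, so this selects the 500 responses. Be aware that it would also match a line whose response size happens to be exactly 500 bytes. Every matching line flows forward.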

grep ' 500 ' access.log | awk '{print $7}'

awk splits each line on whitespace and prints the seventh field — in common log format that is the request path. Now we have a stream of paths, one per line.

grep ' 500 ' access.log | awk '{print $7}' | sort

sort reads all the paths and emits them in alphabetical order, so identical paths end up on adjacent lines. Unlike grep and awk, sort cannot emit anything until it has read its entire input.

grep ' 500 ' access.log | awk '{print $7}' | sort | uniq -c

uniq -c collapses each run of consecutive identical lines into one and prefixes it with a count, which is why the sort had to come first. We now have a stream of count-and-path pairs.

grep ' 500 ' access.log | awk '{print $7}' | sort | uniq -c | sort -rn

sort -rn sorts numerically in reverse, so the most-frequent path rises to the top.

grep ' 500 ' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

head -20 truncates to the top twenty. Done. Each command in this chain knows nothing about the others — they communicate only through the text stream between them.

Figure: data flows left to right through six concurrent processes. Each command transforms the stream; none knows the others exist.
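To try the pipeline without a real server log, here is a sketch that runs it against a tiny sample. The log entries are invented for illustration, and the trailing awk is an addition that only strips the padding uniq -c puts in front of each count.

```shell
#!/usr/bin/env bash
# Build a small, made-up access log in common log format.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /api/orders HTTP/1.1" 500 120
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120
10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "POST /api/login HTTP/1.1" 500 98
10.0.0.1 - - [01/Jan/2024:00:00:04 +0000] "GET /api/orders HTTP/1.1" 500 120
10.0.0.4 - - [01/Jan/2024:00:00:05 +0000] "GET /api/orders HTTP/1.1" 500 120
10.0.0.2 - - [01/Jan/2024:00:00:06 +0000] "GET /about HTTP/1.1" 404 310
EOF

# Same six stages as the one-liner, plus awk to normalise the count padding.
ranked=$(grep ' 500 ' "$log" | awk '{print $7}' | sort | uniq -c | sort -rn | head -20 |
         awk '{print $1, $2}')

echo "$ranked"
# 3 /api/orders
# 1 /api/login

rm -f "$log"
```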

Redirection: steering streams to files

Pipes connect processes together. Redirection connects a process to a file (or pseudo-file). The operators are:

Operator	Effect
`cmd > file`	Overwrite `file` with stdout
`cmd >> file`	Append stdout to `file`
`cmd < file`	Read stdin from `file`
`cmd 2> file`	Redirect stderr to `file`
`cmd 2>&1`	Merge stderr into stdout
`cmd &> file`	Redirect both stdout and stderr to `file` (bash)

Overwrite vs. append

echo "first line" > notes.txt
echo "second line" >> notes.txt

> truncates and rewrites. >> is safe for accumulation — useful for log files or results you gather over time.

Reading stdin from a file

sort < words.txt

sort reads from words.txt instead of the keyboard. For sort, and for most commands that accept filename arguments, this is functionally equivalent to sort words.txt. The redirection form also works for programs that only read stdin and accept no filename arguments.

Dealing with stderr

By default stderr appears on your terminal even when you redirect stdout. That is intentional — you always see errors. But sometimes you want to capture them:

make 2> build-errors.txt

Or discard them entirely — /dev/null is a pseudo-file that swallows everything written to it:

ffmpeg -i input.mp4 output.webm 2> /dev/null

Merging stderr into stdout

The 2>&1 notation means “point file descriptor 2 at wherever file descriptor 1 currently points.” The order matters, and it trips up almost everyone at least once.

make > build.log 2>&1

This captures both stdout and stderr into build.log because the shell processes left-to-right: first > build.log points stdout at the file, then 2>&1 points stderr at the same file.

In bash you can abbreviate this as:

make &> build.log

&> is a bash shorthand that redirects both streams with a single operator; it behaves the same as > build.log 2>&1. It is not POSIX sh syntax, so avoid it in scripts that must run in dash or other minimal shells.

`/dev/null` — the bit bucket

/dev/null deserves its own mention. Writing to it discards the data; reading from it immediately returns end-of-file. It is the canonical way to silence output you do not care about:

cron-job.sh > /dev/null 2>&1

This discards both stdout and stderr — useful in cron entries where you only want to know about failures via a monitoring tool, not email noise. See the files and directories guide for more on working with pseudo-files in the filesystem.
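Both halves of that behaviour are easy to check. The sketch below reads from /dev/null and counts the bytes received, then writes to it and records the exit status; tr only strips the padding some wc implementations add.

```shell
#!/usr/bin/env bash
# Reading from /dev/null hits end-of-file immediately: zero bytes arrive.
bytes=$(wc -c < /dev/null | tr -d ' ')

# Writing to /dev/null succeeds, and the data is simply dropped.
echo "noise nobody needs" > /dev/null
status=$?

echo "bytes read: $bytes, write status: $status"
# bytes read: 0, write status: 0
```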

`tee` — split the stream

Sometimes you want to see output on your terminal and capture it to a file simultaneously. tee reads stdin and writes it to both stdout and one or more files:

make 2>&1 | tee build.log

The build output scrolls past in real time and is also saved. You can chain tee mid-pipeline without breaking the flow:

grep ' 500 ' access.log | tee raw-500s.txt | awk '{print $7}' | sort | uniq -c | sort -rn

raw-500s.txt receives the unprocessed 500-error lines; the rest of the pipeline continues with the same data. One read of the file, two uses.
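A self-contained sketch of the same idea on three lines of made-up input: tee saves an untouched copy while the rest of the pipeline keeps counting. The file name raw.txt and the temporary directory are assumptions for the demo.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# tee writes every line to raw.txt and also passes it on to sort.
counts=$(printf 'b\na\nb\n' | tee "$dir/raw.txt" | sort | uniq -c | awk '{print $1, $2}')
raw=$(cat "$dir/raw.txt")

echo "saved copy, in original order:"
echo "$raw"
echo "counts from the rest of the pipeline:"
echo "$counts"

rm -rf "$dir"
```

The saved copy keeps the original order (b, a, b), while the pipeline output reports 1 a and 2 b.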

Why streams beat passing files around

The alternative to piping is staging results in temporary files:

grep ' 500 ' access.log > /tmp/step1.txt
awk '{print $7}' /tmp/step1.txt > /tmp/step2.txt
sort /tmp/step2.txt > /tmp/step3.txt

This is slower (each step must finish writing before the next one starts), wastes disk space on potentially large intermediate files, and leaves cleanup as your problem. With a pipeline there are no intermediate files to manage: data flows through kernel buffers in memory, steps run concurrently, and nothing persists unless you explicitly ask for it with tee or a final redirect.

There is also a composability advantage. Because the tools only know about streams, you can swap any one component. Replace awk with cut, replace sort | uniq -c with a Python one-liner, replace head with tail — the rest adapts automatically. See the grep guide for a deeper look at pattern-based filtering, which pairs with pipes constantly.
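As a small illustration of swapping one component, the sketch below extracts the path from a single made-up log line twice, once with awk and once with cut. It assumes fields are separated by exactly one space, since cut treats every space as a delimiter while awk collapses runs of whitespace.

```shell
#!/usr/bin/env bash
# Illustrative log line; the address and path are invented.
line='10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /api/orders HTTP/1.1" 500 120'

# Two interchangeable ways to pull out the seventh field.
with_awk=$(printf '%s\n' "$line" | awk '{print $7}')
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 7)

echo "awk: $with_awk"
echo "cut: $with_cut"
```

Both print /api/orders, so anything downstream of that stage cannot tell which tool produced its input.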

Exit codes and pipeline failures

Every Unix command exits with a status code. Zero means success; anything else means failure. This matters for scripting, and pipes add a nuance.

By default in bash, a pipeline’s exit code is the exit code of the last command. If sort fails but head succeeds, echo $? reports zero — the failure is silent.

set -o pipefail

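With pipefail set, the pipeline's exit status is that of the rightmost command that exited non-zero, or zero if every stage succeeded. Recommended in any script where reliability matters, with one caveat: a stage that is cut short because a later stage such as head stopped reading may be killed by SIGPIPE, and pipefail counts that as a failure too.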

To inspect exit codes from each stage individually, bash provides PIPESTATUS:

grep ' 500 ' access.log | sort | head -5
echo "${PIPESTATUS[@]}"

The array contains one exit code per stage of the most recent pipeline, left to right. Read it immediately, because the next command you run overwrites it. Useful for debugging a pipeline where one middle stage is misbehaving.
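The three behaviours can be compared side by side. This sketch assumes bash (PIPESTATUS and pipefail are not part of minimal POSIX shells) and uses false, true, and cat as stand-ins for real pipeline stages.

```shell
#!/usr/bin/env bash
# Default: the pipeline reports the status of its last command (cat succeeds).
false | cat
default_status=$?

# PIPESTATUS holds one status per stage; copy it before running anything else.
false | true | cat
stages="${PIPESTATUS[*]}"

# pipefail: the pipeline reports the rightmost non-zero status instead.
set -o pipefail
false | cat
pipefail_status=$?
set +o pipefail

echo "default: $default_status, pipefail: $pipefail_status, per stage: $stages"
# default: 0, pipefail: 1, per stage: 1 0 0
```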

Check the glossary entry for “exit code” if you need a refresher on how status codes integrate with &&, ||, and if statements in shell scripts.

The mental model

Think of a Unix pipeline as a series of transformations in a data-flow graph. Data enters on the left as raw text, passes through a sequence of functions that each reshape it, and exits on the right as the answer you need. Redirection operators are valves: they divert a stream from its default destination (terminal or keyboard) to a file, or merge two streams into one.

The CLI section of this site walks through each family of tools — searching with grep, navigating with find, transforming with awk and sed — all of which become more powerful once you see them as pipeline components rather than standalone commands.

Understanding streams is the threshold concept. Once it clicks, you stop copying data into files to pass it between tools, and you start building small, correct, composable pipelines instead.

Frequently asked questions

Does a pipe use disk space? No. A pipe is an in-kernel buffer (typically 64 KB on Linux). Data moves through memory between the two processes. Nothing is written to disk unless you explicitly redirect to a file or use tee.

Why does cmd 2>&1 > file not work the way I expect? Because the shell evaluates redirections left-to-right. When 2>&1 runs first, stdout still points at the terminal, so stderr is sent there. Then > file moves stdout to the file — but stderr’s destination was already fixed. The correct form is cmd > file 2>&1: move stdout first, then point stderr at stdout’s new destination.
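The two orderings can be compared directly. In this sketch, both is a hypothetical command that writes one line to each stream, and the command substitution around the second call is there only to catch whatever escapes to the original stdout.

```shell
#!/usr/bin/env bash
# Hypothetical command that writes to both streams.
both() { echo "to stdout"; echo "to stderr" >&2; }

right=$(mktemp)
wrong=$(mktemp)

# stdout goes to the file first, then stderr follows it: both lines captured.
both > "$right" 2>&1

# stderr is pointed at the original stdout first, then only stdout moves.
leaked=$(both 2>&1 > "$wrong")

right_lines=$(grep -c '' "$right")
wrong_lines=$(grep -c '' "$wrong")

echo "stdout-first order captured $right_lines lines"
echo "stderr-first order captured $wrong_lines line; escaped: $leaked"

rm -f "$right" "$wrong"
```

The first file ends up with two lines, the second with one, and the escaped text is the stderr line.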

Can I pipe stderr directly, without merging it into stdout? There is no | syntax for stderr alone in POSIX sh. In bash you can use process substitution, as in cmd 2> >(next-command), to route stderr to a separate process. The portable workaround is cmd 2>&1 >/dev/null | next-command: stderr is pointed at the pipe first, then stdout is sent to /dev/null, so only stderr flows to next-command.
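A quick sketch of the portable workaround, using a hypothetical both function that writes one line to each stream and tr as the downstream command:

```shell
#!/usr/bin/env bash
# Hypothetical command that writes to both streams.
both() { echo "to stdout"; echo "to stderr" >&2; }

# stderr is attached to the pipe, stdout is discarded, so tr only sees stderr.
shouted=$(both 2>&1 >/dev/null | tr '[:lower:]' '[:upper:]')

echo "$shouted"
# TO STDERR
```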

What is the difference between &> and > file 2>&1? They produce the same result in bash: both redirect stdout and stderr to the same file. &> is a bash extension and is shorter to type. The long form > file 2>&1 is POSIX-compatible and works in any POSIX shell, so prefer it in portable scripts.