find and xargs: bulk file operations without fear
Master the find command and xargs for safe, efficient bulk file operations: handle spaces in filenames, batch deletes, renames, and parallel processing.
The junior engineer ran find /var/log -name "*.log" | xargs rm. The server had log files with spaces in their names. xargs split those filenames on whitespace and handed rm the fragments, some of which named files nobody meant to touch. Unrelated files disappeared. The operation finished in under two seconds and the damage took three hours to untangle.
That story is not hypothetical. It is the canonical cautionary tale of naive find-to-xargs pipelines, and understanding exactly why it breaks — and how to prevent it — is the difference between a useful automation skill and a liability. This post walks through find predicates, the action syntax, and every safety consideration you need before running bulk operations in production.
See the CLI category for related tooling, the find reference, and the files and directories guide for broader context.
How find walks a tree
find starts at one or more path arguments and recurses through the directory tree, testing each entry against predicates. When a file matches all predicates, find applies the action (by default, print the path).
find /path/to/start [predicates] [actions]
The traversal is depth-first. Every entry — file, directory, symlink, device — is visited unless you explicitly prune subtrees.
Selecting by name
-name matches the filename component (not the full path) using shell glob patterns. It is case-sensitive.
find . -name "*.log"
find . -name "*.Log" # a different pattern: -name always compares case-sensitively
-iname is the case-insensitive variant:
find . -iname "*.jpg" # matches .jpg, .JPG, .Jpg
-path matches the full path string including directory components, and -ipath is its case-insensitive twin:
find . -path "*/cache/*.tmp"
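To make the difference between the three name tests concrete, here is a minimal sketch that builds a throwaway tree and counts matches. The directory and file names are invented for the demo, and it assumes a case-sensitive filesystem (typical on Linux) so that photo.JPG and photo.jpg can coexist.

```shell
# Scratch tree; every name below is made up for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/src"
touch "$demo/cache/a.tmp" "$demo/src/b.tmp" "$demo/src/photo.JPG" "$demo/src/photo.jpg"

# -name looks only at the last path component and is case-sensitive.
name_hits=$(find "$demo" -name "*.jpg" | wc -l)

# -iname ignores case, so photo.JPG and photo.jpg both match.
iname_hits=$(find "$demo" -iname "*.jpg" | wc -l)

# -path is matched against the whole path, so it can require a parent directory.
path_hits=$(find "$demo" -path "*/cache/*.tmp" | wc -l)

echo "name=$name_hits iname=$iname_hits path=$path_hits"
rm -rf "$demo"
```

On such a filesystem the three counts should come out as 1, 2, and 1.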
Selecting by type
-type f matches regular files. -type d matches directories. -type l matches symbolic links. You almost always want -type f before destructive actions to avoid operating on directories:
find . -type f -name "*.bak"
Selecting by size
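-size accepts unit suffixes: c for bytes, k for kibibytes (1024 bytes), M for mebibytes, G for gibibytes. With no suffix the unit is 512-byte blocks. Prefix the number with + for “greater than” and - for “less than”. GNU find rounds each file size up to a whole number of units before comparing, so write sub-unit comparisons in bytes:
find . -type f -size +100M # files larger than 100 MiB
find . -type f -size -1024c # files smaller than 1 KiB (with GNU find, -size -1k matches only empty files)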
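The unit rounding in -size is easy to trip over, so here is a small sketch that demonstrates it. It assumes GNU find, which rounds sizes up to whole units before comparing; other implementations may compare differently. The file names are invented for the demo.

```shell
demo=$(mktemp -d)
: > "$demo/empty.dat"                        # 0 bytes
head -c 500  /dev/zero > "$demo/small.dat"   # 500 bytes
head -c 2048 /dev/zero > "$demo/two_k.dat"   # 2 KiB

# With k units, 500 bytes rounds up to 1 unit, so only the empty file
# counts as "less than 1k" under GNU find.
under_1k=$(find "$demo" -type f -size -1k | wc -l)

# Byte units sidestep the rounding: the 0-byte and 500-byte files both match.
under_1024c=$(find "$demo" -type f -size -1024c | wc -l)

echo "under_1k=$under_1k under_1024c=$under_1024c"
rm -rf "$demo"
```

When the threshold is not a whole number of units away from zero, byte units are the less surprising choice.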
Selecting by modification time
-mtime N matches files modified N days ago, -mtime +N more than N days ago, and -mtime -N less than N days ago. The unit is 24-hour periods, rounded down, so -mtime +30 only starts matching once a file is at least 31 full days old.
-mmin is the same logic in minutes, which is useful for log rotation and monitoring scripts:
find /var/log -name "*.log" -mtime +30 # older than 30 days
find /tmp -type f -mmin +60 # modified more than 60 minutes ago
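The rounding rule is easiest to see with files of known age. This sketch assumes GNU touch (for the relative -d dates) and uses invented file names; it shows that a 36-hour-old file counts as 1 day old and is therefore not matched by -mtime +1.

```shell
demo=$(mktemp -d)
touch -d "36 hours ago" "$demo/age_36h.log"   # 1 whole day old
touch -d "60 hours ago" "$demo/age_60h.log"   # 2 whole days old
touch "$demo/fresh.log"                       # 0 whole days old

# Ages are truncated to whole days before the comparison.
exactly_1=$(find "$demo" -type f -mtime 1 -exec basename {} \;)
more_than_1=$(find "$demo" -type f -mtime +1 -exec basename {} \;)
less_than_1=$(find "$demo" -type f -mtime -1 -exec basename {} \;)

echo "1: $exactly_1 | +1: $more_than_1 | -1: $less_than_1"
rm -rf "$demo"
```

If hour-level precision matters, -mmin avoids the whole-day rounding.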
Combining predicates
Predicates are ANDed by default. -a is explicit AND, -o is OR, ! negates. Use escaped parentheses to control precedence:
find . \( -name "*.jpg" -o -name "*.png" \) -type f -size +1M
find . -type f ! -name "*.git"
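Why the parentheses matter: AND binds more tightly than OR, and an explicit action attaches only to the branch it sits in. The sketch below, using invented file names in a scratch directory, compares the grouped and ungrouped forms.

```shell
demo=$(mktemp -d)
touch "$demo/a.jpg" "$demo/b.png" "$demo/c.txt"

# Ungrouped: parsed as  -name "*.jpg"  -o  ( -name "*.png" -a -print ),
# so only the .png branch ever reaches -print.
ungrouped=$(find "$demo" -name "*.jpg" -o -name "*.png" -print | wc -l)

# Grouped: the OR is evaluated first, then -print applies to both patterns.
grouped=$(find "$demo" \( -name "*.jpg" -o -name "*.png" \) -print | wc -l)

echo "ungrouped=$ungrouped grouped=$grouped"
rm -rf "$demo"
```

The same trap applies to -exec and -delete, where a missed branch is harder to notice than a short listing.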
Pruning subtrees
-prune prevents descent into a matched directory. This pattern skips .git directories entirely:
find . -name ".git" -prune -o -type f -name "*.py" -print
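The -o (OR) is essential: when -prune fires on .git it returns true, so the right-hand side is skipped for that directory, while every other entry falls through to the right-hand side. The explicit -print matters too, because without it find wraps the whole expression in an implicit -print and lists the pruned .git directory as well.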
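A quick way to check a -prune expression is to run it against a tree where the pruned directory contains a decoy. This is a sketch with invented paths, not a general recipe.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/.git/hooks" "$demo/src"
touch "$demo/.git/hooks/pre_commit.py" "$demo/src/app.py"

# Pruned search: the decoy under .git is never visited.
pruned=$(find "$demo" -name ".git" -prune -o -type f -name "*.py" -print)

# Same expression without the explicit -print: the implicit -print covers
# the whole expression, so the .git directory itself is listed too.
no_print=$(find "$demo" -name ".git" -prune -o -type f -name "*.py" | wc -l)

# No pruning at all: both .py files are found.
unpruned=$(find "$demo" -type f -name "*.py" | wc -l)

echo "$pruned"
echo "no_print=$no_print unpruned=$unpruned"
rm -rf "$demo"
```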
Acting on matches: -exec
The -exec action runs a command for each matched file. The literal string {} is replaced by the current filename. The action is terminated by \;:
find . -type f -name "*.tmp" -exec rm {} \;
This works, but it spawns one process per file. For ten thousand files, that is ten thousand rm invocations. The + terminator changes the behavior: find collects matched paths and passes them as a batch to a single command invocation, similar to how xargs works:
find . -type f -name "*.tmp" -exec rm {} +
-exec {} + is generally faster because it minimises process-creation overhead. The only constraint is that the command must accept multiple filename arguments at the end, which most Unix utilities do.
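To see the batching directly, the sketch below swaps rm for a tiny inline sh script that prints how many paths it received on each run. The file names are invented, and one contains a space to show that it still arrives as a single argument.

```shell
demo=$(mktemp -d)
touch "$demo/a.tmp" "$demo/b.tmp" "$demo/c d.tmp"

# With \; the script runs once per file and sees one path each time.
per_file=$(find "$demo" -type f -name "*.tmp" -exec sh -c 'echo "$#"' _ {} \;)

# With + the script runs once and sees all three paths together.
batched=$(find "$demo" -type f -name "*.tmp" -exec sh -c 'echo "$#"' _ {} +)

echo "per-file runs: $(printf '%s\n' "$per_file" | wc -l), batched arg count: $batched"
rm -rf "$demo"
```

For very large match sets find may still split the + form into several runs to stay under the system's argument-length limit.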
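-delete is a built-in action (a GNU and BSD extension) that removes matched files directly, usually faster still than -exec rm {} + because no rm process is spawned. It also narrows the window for the race in which a path changes between being found and being removed by name. Two cautions: -delete implies -depth, so it does not combine with -prune, and it must come after the tests, because find evaluates the expression left to right: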
find . -type f -name "*.tmp" -delete
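A safe habit with -delete is to run the identical expression with -print first and only then swap the action. The sketch below does that in a scratch directory with invented names; it assumes a find that supports -delete (GNU or BSD).

```shell
demo=$(mktemp -d)
mkdir -p "$demo/keep"
touch "$demo/a.tmp" "$demo/keep/b.tmp" "$demo/keep/notes.txt"

# Preview: same tests, harmless action.
previewed=$(find "$demo" -type f -name "*.tmp" -print | wc -l)

# Delete: the tests come first, -delete comes last.
find "$demo" -type f -name "*.tmp" -delete

remaining=$(find "$demo" -type f | wc -l)
echo "previewed=$previewed remaining=$remaining"
rm -rf "$demo"
```

Placing -delete before the tests would make it apply to every entry find visits, which is why the order is worth checking each time.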
xargs: building command lines from stdin
xargs reads items from standard input and appends them as arguments to a command. By default it splits on blanks and newlines, and it also treats quotes and backslashes specially, which is exactly where the danger lives.
Basic usage
find . -type f -name "*.log" | xargs wc -l
This counts lines across all matched files, passing as many filenames as possible in each wc invocation rather than one per file.
Controlling batch size: -n
-n N limits each invocation to at most N arguments:
find . -type f -name "*.jpg" -print0 | xargs -0 -n 10 convert-batch
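To watch -n form the batches, this sketch feeds five NUL-terminated names to xargs and has each invocation report how many arguments it got. The names are invented and the inline sh script stands in for a real command.

```shell
# Five items, at most two per invocation: expect batches of 2, 2, and 1.
batch_sizes=$(printf '%s\0' "a 1.jpg" "b 2.jpg" "c 3.jpg" "d 4.jpg" "e 5.jpg" \
    | xargs -0 -n 2 sh -c 'echo "$#"' _)

printf '%s\n' "$batch_sizes"
```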
Parallel execution: -P
-P N runs up to N processes simultaneously. Combined with -n, this is a cheap parallel processing pattern:
find . -type f -name "*.png" -print0 | xargs -0 -P 4 -n 1 optipng
This runs four optipng processes at once, each receiving one file. On a machine with four idle cores, CPU-bound work like this can finish up to roughly four times faster.
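The effect of -P can be checked without any image tools by using sleep as a stand-in workload. This is a rough sketch that assumes an xargs with -P support (GNU or BSD); it times three one-second jobs run serially and then in parallel, at one-second resolution.

```shell
# Serial: three 1-second sleeps, one after another.
start=$(date +%s)
printf '%s\0' 1 1 1 | xargs -0 -n 1 sleep
serial=$(( $(date +%s) - start ))

# Parallel: the same three sleeps, up to three at a time.
start=$(date +%s)
printf '%s\0' 1 1 1 | xargs -0 -n 1 -P 3 sleep
parallel=$(( $(date +%s) - start ))

echo "serial=${serial}s parallel=${parallel}s"
```

Real workloads rarely scale this cleanly, since disk and memory bandwidth are shared between workers.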
Placeholder substitution: -I
-I STR replaces every occurrence of STR in the command template with the current argument. This lets you place the filename somewhere other than the end of the argument list:
find . -type f -name "*.csv" -print0 | xargs -0 -I {} mv {} /archive/
Note that -I runs the command once per input item, as if -n 1 were given. It is useful for complex argument placement but gives up the batching benefit.
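Here is a minimal sketch of -I placing each path in the middle of the command line, run against a scratch tree so nothing real is moved. The in and archive directory names are invented for the demo.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/in" "$demo/archive"
touch "$demo/in/q1 report.csv" "$demo/in/q2.csv"

# {} sits before the destination, so mv is invoked once per file.
find "$demo/in" -type f -name "*.csv" -print0 \
    | xargs -0 -I {} mv {} "$demo/archive/"

archived=$(find "$demo/archive" -type f | wc -l)
left_behind=$(find "$demo/in" -type f | wc -l)
echo "archived=$archived left_behind=$left_behind"
rm -rf "$demo"
```

With GNU coreutils, mv -t DIR takes the destination first, which lets the filenames go at the end and keeps xargs batching; that flag is a GNU extension and not available everywhere.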
The critical safety issue: filenames with spaces
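By default, a path such as ./app/access log.log reaches the command as two arguments, ./app/access and log.log, because xargs splits its input on whitespace. The solution is a two-part flag pair: find ... -print0 emits paths separated by the NUL character (ASCII 0) instead of newlines, and xargs -0 reads NUL-separated input instead of whitespace-separated. Since NUL cannot appear in a filename on any POSIX filesystem, this is safe by definition.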
find . -type f -name "*.log" -print0 | xargs -0 rm
This is the canonical safe pipeline. Use it by default, not as an afterthought.
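The splitting is easy to reproduce harmlessly by replacing rm with echo and counting how many arguments come out. This sketch uses one invented filename containing a space and assumes the temporary directory path itself contains no whitespace.

```shell
demo=$(mktemp -d)
touch "$demo/access log.log"

# Newline-separated output, default xargs parsing: the single path is
# split at the space into two arguments.
unsafe_tokens=$(find "$demo" -type f -name "*.log" | xargs -n 1 echo | wc -l)

# NUL-separated output with -0: the path arrives intact as one argument.
safe_tokens=$(find "$demo" -type f -name "*.log" -print0 | xargs -0 -n 1 echo | wc -l)

echo "unsafe=$unsafe_tokens safe=$safe_tokens"
rm -rf "$demo"
```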
Cost comparison: -exec \; vs -exec + vs xargs
When should you prefer -exec {} + over xargs -0? When you want to stay inside the find expression without a separate pipeline, or when the command needs its own standard input, since under xargs that stream is already carrying the filenames. When should you prefer xargs -0? When you need -P for parallelism, when you want -n to control group size precisely, or when you are composing a longer pipeline that feeds into other tools. See the glossary for definitions of ARG_MAX and related limits.
A dry-run discipline
# Step 1: see what matches
find /var/log -type f -name "*.log" -mtime +30
# Step 2: if the list looks right, delete
find /var/log -type f -name "*.log" -mtime +30 -delete
An even safer pattern for complex -exec actions is to replace the command with echo first:
find . -type f -name "*.bak" -exec echo "would remove:" {} \;
This prints every path that would be passed to the real command. Only when the output is exactly what you expect do you swap in the real command.
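One way to make the dry run the default is to wrap the command in a small function that only deletes when asked explicitly. The function name purge_bak and its --force flag are hypothetical, invented for this sketch.

```shell
# purge_bak DIR [--force]  (hypothetical helper)
# Lists matching files by default; removes them only when --force is given.
purge_bak() {
    purge_dir=$1
    if [ "${2:-}" = "--force" ]; then
        find "$purge_dir" -type f -name "*.bak" -exec rm -- {} +
    else
        find "$purge_dir" -type f -name "*.bak" -exec echo "would remove:" {} \;
    fi
}

demo=$(mktemp -d)
touch "$demo/old.bak" "$demo/keep.txt"

preview=$(purge_bak "$demo")                    # prints, deletes nothing
after_preview=$(find "$demo" -type f | wc -l)

purge_bak "$demo" --force                       # now actually removes
after_force=$(find "$demo" -type f | wc -l)

echo "$preview"
echo "after_preview=$after_preview after_force=$after_force"
rm -rf "$demo"
```

Both branches share the same tests, so the preview cannot drift from what the forced run will match.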
Real recipes
Delete log files older than 30 days
find /var/log -type f -name "*.log" -mtime +30 -print0 | xargs -0 rm -f
Or equivalently with the built-in action:
find /var/log -type f -name "*.log" -mtime +30 -delete
Bulk-rename: replace spaces with underscores
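xargs -I with a shell invocation handles the rename logic. The bash -c wrapper lets you use variable expansion:
find . -type f -name "* *" -print0 | xargs -0 -I {} bash -c '
# $1 is the matched path; rename only the last component so that
# directories with spaces in their names are left alone
dir=$(dirname -- "$1")
base=$(basename -- "$1")
mv -- "$1" "$dir/${base// /_}"
' _ {}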
The _ {} at the end passes {} (the filename) as $1 inside the shell; _ fills $0 (the script name), which is harmless.
Grep across matched files
find . -type f -name "*.py" -print0 | xargs -0 grep -l "TODO"
-l tells grep to print only the filename of matching files, not every matching line.
Count total lines across a set of files
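find . -type f -name "*.go" -print0 | xargs -0 wc -l | tail -n 1
tail -n 1 extracts the total that wc -l appends when it receives multiple files. If the list is long enough for xargs to split it across several wc invocations, each one prints its own total and this shows only the last, so treat the figure as reliable only for modest file counts.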
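When a single total is all that is needed, one alternative is to concatenate the files and count once, which gives the same answer however the file list is batched. This is a sketch with invented file names, not a replacement for per-file counts.

```shell
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/one.go"   # 2 lines
printf 'c\n'    > "$demo/two.go"   # 1 line

# cat may run more than once for a huge list, but its output is one
# stream, so wc -l sees every line exactly once.
total=$(find "$demo" -type f -name "*.go" -exec cat {} + | wc -l)

echo "total=$total"
rm -rf "$demo"
```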
Parallel image optimisation
find . -type f \( -name "*.jpg" -o -name "*.jpeg" \) -print0 \
| xargs -0 -P 8 -n 1 jpegoptim --max=85 --strip-all
Eight jpegoptim workers run concurrently. Adjust -P to match your CPU count.
-exec + vs xargs -0: when each wins
| Situation | Prefer |
|---|---|
| Simple bulk delete or copy, no parallelism needed | -exec {} + or -delete |
| Need parallel workers | xargs -0 -P N |
| Need precise batch-size control | xargs -0 -n N |
| Composing a longer pipeline | xargs -0 |
| Command does not accept multiple filenames | -exec {} \; (one at a time) |
| Avoiding an extra process entirely | -delete (for removes) |
The main limitation of -exec {} + is that it cannot parallelize, because find dispatches commands one at a time. xargs -P (a GNU and BSD extension) is the only built-in way to get concurrent file processing without writing a shell loop or reaching for a tool like parallel.
Frequently asked questions
Why does find | xargs rm sometimes delete the wrong files?
By default, xargs tokenizes its input on any whitespace character, including spaces inside filenames. photo from 2024.jpg becomes three tokens (photo, from, and 2024.jpg), and rm tries to delete three separate (non-existent or wrong) files. Use find ... -print0 | xargs -0 rm to delimit on NUL instead. Filenames cannot contain NUL, so no splitting occurs.
What is the difference between -mtime +7 and -mtime 7?
-mtime 7 matches files modified exactly 7 full 24-hour periods ago (meaning between 168 and 192 hours ago). -mtime +7 matches files modified more than 7 full periods ago; because the fraction is discarded, that means at least 8 full periods (192 hours or older). For log rotation you almost always want +N, not N.
Can I test my find command without running any action?
Yes. Omit all actions and find defaults to -print, listing every matching path to stdout. This is the safest dry-run approach. You can also prefix a destructive -exec with echo to see what arguments would be passed, without executing anything.
Is -exec {} + always safe with filenames containing spaces?
Yes. -exec {} + is handled entirely inside find — it never passes filenames through a shell or through whitespace-splitting. The {} placeholder is replaced by the actual filename bytes. So -exec rm {} + is space-safe even without -print0. The space hazard is specific to the | xargs pipeline when you omit -0.