Customizing Find and Replace in the Command Line

Introduction

In this post, we will build commands for performing find and replace operations via the command line. We will first build it for a basic case, explore the customizations available to us, and build on top of that for more complex cases, such as performing the operation on a directory, previewing changes before committing them, adding a confirmation prompt, etc.

The goal of this article is not to just teach you how to find and replace in the command line. It is also aimed at teaching general command line skills and concepts, exploring the powers of the command line and the wealth of customizations it grants us.

NOTE: I recommend you follow along. Open your terminal, create a test folder / directory and create random test files to apply the learnings you read here. Learning is best done by doing!

Why Not Use Text Editor / IDE?

Just about all text editors and IDEs have a find and replace feature. Those are great to use if you are already in the middle of editing a file, but in my opinion, using the command line is better and faster (once you get used to it) for cases such as:

You are not already in a text editor, and you only want to perform find and replace.
You want to apply the find and replace operation to multiple files. This is especially useful in a large codebase.

You may disagree and find that the IDE is preferable. There's only one way to find out, so buckle up and get ready to see it in action!

Using SED to Find and Replace

sed is the ultimate utility for find and replace (though not the only one). sed, standing for Stream EDitor, takes text input (from a file or from standard input) and performs filtering and transformations on it as specified by the user. By default the result goes to standard output. It can optionally be written to a file, including the same file it was read from, or piped into another program.

sed has many text manipulation commands, but the most common is the s command, standing for substitute. It is so popular that it is the only sed command many people know, and some even think it is sed's only functionality. sed has many other commands, and you can find them by reading the info page (run info sed in the command line, or visit the GNU website).

Basic Usage

To use sed for performing find and replace on an input, you can run the following:

sed [options] 's/[regex-pattern]/[text-to-replace-with]/[flags]' [input-file]

The part in single quotes is called the "command" or "script" for sed. It starts with s, denoting substitute. The regex-pattern is a regular expression matching the text to replace. text-to-replace-with is, self-descriptively, what sed will replace the matched text with.

Basic example:

sed 's/patternToReplace/replaceWith/' somefile.txt

OR:

echo 'random text with patternToReplace here' | sed 's/patternToReplace/replaceWith/'
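Because the first part of the s command is a regular expression rather than a literal string, it can match more than fixed text. Here is a small sketch using made-up sample input; it also shows that a character other than / can serve as the delimiter:

```shell
# The pattern is a regex: [0-9][0-9]* matches a run of one or more digits.
masked=$(echo 'order 123 shipped' | sed 's/[0-9][0-9]*/N/')
echo "$masked"   # order N shipped

# Using | as the delimiter avoids escaping the slashes in a path.
moved=$(echo '/usr/bin/tool' | sed 's|/usr/bin|/opt/bin|')
echo "$moved"    # /opt/bin/tool
```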

SED's Flags

Flags modify sed's behavior, such as denoting how many of the matched patterns to replace, whether or not to be case insensitive, printing the line with a match, and some more advanced features, such as executing the result of the substitution as a shell command.

Replace Only the Nth Occurrence

To replace only the Nth occurrence on each line, we can use a number as a flag, denoting which occurrence to replace.

sed 's/replaceThis/replaceWith/2' somefile.txt

The above command will replace only the second occurrence of "replaceThis" on each line with the value "replaceWith", leaving the first occurrence untouched. Without the flag, it would replace only the first occurrence on each line.
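A quick way to check how the numeric flag behaves is to run it on a one-line input. The sample text below is made up for illustration:

```shell
# With a numeric flag, sed touches only that occurrence on each line.
second=$(echo 'one one one' | sed 's/one/two/2')
echo "$second"   # one two one

# Without any flag, only the first occurrence on each line changes.
first=$(echo 'one one one' | sed 's/one/two/')
echo "$first"    # two one one
```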

Replace All Occurrences

To replace every occurrence on each line, we use the g flag, denoting global.

sed 's/replaceThis/replaceWith/g' somefile.txt
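sed works line by line, so the difference the g flag makes is easiest to see on input with several matches per line. A minimal sketch with invented sample text:

```shell
# Two lines, each containing two matches.
input=$(printf 'cat cat\ncat cat\n')

# No flag: the first match on every line is replaced.
no_flag=$(printf '%s\n' "$input" | sed 's/cat/dog/')
printf '%s\n' "$no_flag"

# g flag: every match on every line is replaced.
global=$(printf '%s\n' "$input" | sed 's/cat/dog/g')
printf '%s\n' "$global"
```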

Find and Replace In-Place (Overwrite File)

By default, when sed performs a find and replace (or any operation) on a file or text input, it prints the result to standard output. It does NOT save the modifications to the file. To save it in the same file we used as input, we can use the -i or --in-place flag:

sed -i 's/replacethis/replacewith/' somefile.txt

NOTE For macOS and BSD Users: The BSD version of sed (also used on macOS) is slightly different than the GNU version of sed found on most Linux systems. The -i option is one of those differences: on BSD it requires an argument, a suffix for a backup copy of the file, which may be an empty string if you do not want a backup. I recommend you check the documentation, but the command will look something like:

sed -i '' -e 's/replacethis/replacewith/' somefile.txt

Alternatively, you can always install GNU sed on macOS and BSD (available on Homebrew and other package managers).
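If you want a safety net while experimenting, -i can also keep a backup of the original file. The sketch below attaches the suffix directly to the option (-i.bak), a form that, to my knowledge, both GNU and BSD sed accept. The file names are placeholders in a throwaway directory:

```shell
workdir=$(mktemp -d)
printf 'replacethis once\n' > "$workdir/somefile.txt"

# -i.bak edits somefile.txt in place and keeps the original as somefile.txt.bak.
sed -i.bak 's/replacethis/replacewith/' "$workdir/somefile.txt"

edited=$(cat "$workdir/somefile.txt")
backup=$(cat "$workdir/somefile.txt.bak")
echo "$edited"   # replacewith once
echo "$backup"   # replacethis once
```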

Save The Substitution Result to New File

If, instead of overwriting to the same file, we want to save it to a new file, we just redirect the command's result:

sed 's/replacethis/replacewith/' somefile.txt > newfile.txt

The > newfile.txt syntax uses the redirect operator, which redirects the standard output of a command to a file (overwriting it if it already exists).
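One caveat worth trying out in your test folder: redirecting to the very file sed is reading from does not work, because the shell empties that file before sed gets to read it. A common workaround, sketched here with placeholder file names, is to write to a temporary file and then move it over the original:

```shell
workdir=$(mktemp -d)
file="$workdir/somefile.txt"
printf 'replacethis\n' > "$file"

# Write the result to a temporary file first, then replace the original with it.
sed 's/replacethis/replacewith/' "$file" > "$file.tmp" && mv "$file.tmp" "$file"

result=$(cat "$file")
echo "$result"   # replacewith
```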

There Are Many More Options!

This is only a subset of all features available in sed, and not even all the options for the substitute command! My intention is not to duplicate sed's documentation, but rather introduce you to sed and cover relevant concepts.

Alternatives to SED

sed is a great tool for this task, but are there alternatives? Among the popular utilities, sed is the simplest. More complex alternatives may grant you more features depending on what you are looking for, but that comes at a cost. You can even do what sed does with a full-featured scripting language that has a wealth of libraries, like Python or Node.js. But for simple tasks, that is overkill and requires a lot more code.

However, there are tools and languages that only require a bit more code than sed, but optionally have a lot more features. One example is awk, which we will discuss below. Another is Perl / Raku, a full-featured scripting language with a lot of text editing capabilities that lets you perform substitution with little code.

AWK

A very common alternative to sed is awk. Like sed, awk is also a text manipulation tool. There are a few differences:

awk is able to recognize and manipulate delimited text such as tabulated data, columned text, csv, etc. sed has no concept or recognition of this beyond its regex capabilities.
awk is a much more complete programming language than sed. It has loops, conditionals, data structures like arrays, and the like.
Consequently, awk can perform analytics tasks such as calculating sums or counting data, or perform transformations onto the columns, such as removing certain columns or combining them.
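To make the comparison concrete, here is a rough sketch of awk doing both a sed-style substitution and a column calculation. The two-column sample data is invented for illustration:

```shell
# Hypothetical sample data: a name and an amount, separated by a space.
data=$(printf 'apples 3\npears 4\napples 5\n')

# gsub() replaces every match on the line, much like sed's s/.../.../g.
replaced=$(printf '%s\n' "$data" | awk '{ gsub(/apples/, "plums"); print }')
printf '%s\n' "$replaced"

# awk also sees each line as columns: add up the second field.
total=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum }')
echo "$total"   # 12
```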

Recursive Find and Replace (In a Directory and All Sub-directories)

Now this is all great, but this is only slightly more impressive than text editor find-and-replace. Let us see how the command line can extend the powers we already uncovered to apply the same functionality to multiple files.

The beauty of the shell (and I suppose most programming languages) is that we have constructs that allow us to apply a command to a dynamic input or operand (input here is the file(s) we are applying find and replace to). This dynamic input can be our list of many files. The question is, which files do we want to perform find and replace on? There are a few situations I have in mind:

Apply to all files in current directory
Apply to all files in current directory and sub-directories (recursion)
Apply only to files in a directory that have a certain metadata (file extension, name, when it was last edited, etc)
Apply only to files in a directory based on the file's content (i.e. the file's content has a pattern match)

There are many command line utilities that can help us here. I will focus on two: find and grep.

find: Search and Filter Files by Metadata

The find utility searches through all files and directories in a given directory based on their metadata. By default, it prints the file paths to standard output.

It has many useful options and filters. Things like: file or directory name, creation timestamp, modification timestamp, path, file size, etc. I will not go through them all (check the documentation or other resources).

Basic Usage

find path/to/directory -type f -name "*.md"

The -type f flag says: only return files, not directories.
The -name "*.md" flag says: only return files whose name ends in .md, meaning a markdown file. The * is a wild card match character.
The command above will therefore return files inside path/to/directory that meet the two criteria above.

We can even apply this to multiple directories:

find path/to/directory path/to/second/directory -type f -name "*.md"
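If you are following along, a throwaway directory tree makes it easy to see what the two filters do. The file and directory names below are made up:

```shell
root=$(mktemp -d)
mkdir -p "$root/docs/guides"
touch "$root/README.md" "$root/docs/notes.md" "$root/docs/guides/intro.md" "$root/docs/todo.txt"
mkdir "$root/archive.md"   # a directory whose name also ends in .md

# -type f skips the archive.md directory; -name "*.md" skips todo.txt.
matches=$(find "$root" -type f -name "*.md")
printf '%s\n' "$matches"

count=$(printf '%s\n' "$matches" | wc -l)
echo "$count"   # 3
```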

Executing a Command Based on find's Output

Executing a command based on another command's output is very common in the command line world. This is very often done via piping; directing a command's standard output to the standard input of another command. We did this earlier with sed and echo:

echo 'random text with PatternToReplace here' | sed 's/PatternToReplace/ReplaceWithThis/'

The | is the pipe operator.

This will not work with find. If we simply pipe find's result into sed, sed will treat the file paths themselves as the text to transform, not as names of files whose contents we want to modify. There are a few alternatives:

Command Substitution

The $() is the shell command substitution syntax. We can use:

sed 's/somePattern/ReplaceWith/g' $(find some/path -type f)

What command substitution does is it executes the command inside the parentheses, and substitutes the result in its place instead. So the shell will substitute $(find some/path -type f) with the file paths, and only then will it execute the sed command.

One advantage of this method is that we are only invoking the sed command once, but with the input being multiple files. I haven't benchmarked this, but invoking a command once with several files as input often performs better than a single invocation per file. One drawback is that the shell splits the substituted output on whitespace, so a path containing spaces is broken into several arguments. Also, in some cases a command can only accept one file as an argument, so this approach is not available at all. Some of the options below might help.
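One thing to watch for with command substitution is how the shell splits the result into words. The sketch below counts the arguments produced from two files, one of which has a space in its name. It assumes the temporary directory path itself contains no whitespace:

```shell
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

# Unquoted $(...) is split on whitespace, so "with space.txt" becomes two words.
# printf prints each resulting argument on its own line, and wc -l counts them.
words=$(printf '%s\n' $(find "$dir" -type f) | wc -l)
echo "$words"   # 3 arguments for 2 files
```

A command such as sed would receive those three arguments as three file names, two of which do not exist.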

Piping with xargs

We said earlier that we cannot simply pipe the output of find into sed. However, we can pipe the output into a utility called xargs. xargs reads a list of items from standard input, such as the paths printed by find, and passes them as arguments to another command (in our case sed). By default it fits as many items as it can into each invocation, so the command runs only as many times as needed. Here is how we can do it:

find path/to/directory -type f -print0 | xargs -0 sed -i 's/somePattern/replaceWith/g'

xargs will take the output of the find command before the pipe operator and run sed with those paths as its file arguments. If you want one invocation per path instead, xargs has the -n 1 option.

Notice the -print0 and -0 flags. -print0 tells find to separate its output using a null character, and -0 tells xargs to split its input on that same null character. Since a null character can never be part of a file path, this keeps paths containing spaces or even newlines intact, and xargs can reliably tell one file's path from another.

Like other command line utilities, xargs is very customizable and can fit many use cases, and there are many options documented in its documentation.
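To see how xargs groups its input into arguments, echo makes a convenient stand-in for sed. This is only an illustration with three made-up items:

```shell
# By default, xargs passes as many items as fit to a single invocation.
batched=$(printf 'a\nb\nc\n' | xargs echo)
echo "$batched"   # a b c

# -n 1 limits each invocation to one item, so echo runs three times.
one_each=$(printf 'a\nb\nc\n' | xargs -n 1 echo)
printf '%s\n' "$one_each"
```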

Using find's Built-in -exec Option

The options above are great and usable on various command line utilities. But find has its own built-in functionality for executing a command for every path matching its filters. This option is the -exec option.

find path/to/directory -type f -name "*.md" -exec sed -i 's/somePattern/replaceWith/g' {} \;

We simply put the command after -exec.

However, notice two unusual pieces of syntax. First, the {}. This syntax tells find where to insert a path it matched as an argument for the sed command. When executing the command, find will replace {} with the path. Second, the \; is really just an escaped semicolon. The semicolon ; tells find where the -exec command ends, and the backslash stops the shell from interpreting the semicolon itself. Marking the end of the command is useful if you want to tack on other options to the find command.

One thing to note is that you can have multiple -exec in the same find command! There are a lot of potential customizations that can be made for more complex situations.
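A related customization: ending the -exec command with + instead of \; asks find to append many paths to a single invocation, similar to what xargs does. The sketch below uses echo as a stand-in for sed and counts how many times it ran. The file names are placeholders:

```shell
root=$(mktemp -d)
touch "$root/a.md" "$root/b.md" "$root/c.md"

# With \; the command runs once per matched path: three lines of output.
per_file=$(find "$root" -type f -name "*.md" -exec echo {} \; | wc -l)

# With + the paths are appended to one invocation: a single line of output.
batched=$(find "$root" -type f -name "*.md" -exec echo {} + | wc -l)

echo "$per_file"   # 3
echo "$batched"    # 1
```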

Adding a Confirmation Prompt

Suppose we want to find and replace, but we are not sure we want to replace every instance of the pattern we match. One way of solving this is previewing the change and asking the user to confirm if they want to do it.

There are several ways of doing this. We can either script the confirmation and preview on our own, or we can use another of find's built-in functionalities.
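For the first route, here is one possible sketch of a hand-rolled prompt. The confirm_replace function is a hypothetical helper, not a standard tool, and it rests on several assumptions: GNU sed (for -i without a suffix), a pattern and replacement that contain no / characters, and file names without newlines.

```shell
# Hypothetical helper: preview matches and ask before rewriting each file.
# Assumes GNU sed, no "/" in pattern or replacement, no newlines in file names.
confirm_replace() {
  dir=$1
  pattern=$2
  replacement=$3
  list=$(mktemp)
  # Collect only the files that actually contain the pattern.
  find "$dir" -type f -exec grep -l -e "$pattern" {} + > "$list"
  # Read file names from descriptor 3 so standard input stays free for answers.
  while IFS= read -r file <&3; do
    grep -n -e "$pattern" "$file"          # preview the matching lines
    printf 'Replace in %s? [y/N] ' "$file"
    read -r answer
    case $answer in
      y|Y) sed -i "s/$pattern/$replacement/g" "$file" ;;
      *)   echo "skipped $file" ;;
    esac
  done 3< "$list"
  rm -f "$list"
}
```

You would call it as confirm_replace path/to/directory somePattern replaceWith and answer y or n for each file. Anything other than y leaves the file untouched.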

find's -ok Option

find's -ok option behaves very similarly to -exec, with one difference: before executing the given command, find shows the user the exact command (after substitution) and asks them to confirm that they want to execute it. Only then will the command be executed.

find path/to/directory -type f -ok sed -i 's/somePattern/replaceWith/g' {} \;

But this only asks for confirmation. Before confirming, I would ideally want to look at the file and its contents. How can we do this?

Previewing Changes

As we said earlier, find supports chaining multiple -execs in order. This applies to -ok as well. So, we can add a command that would preview the change without modifying the file, then use -ok to ask the user if they would like to proceed.

Using grep To Show State Before Modification

grep is an amazing and commonly used tool. It can search a file for a pattern and print to standard output the line(s) containing the pattern. We can use grep to show the state of the lines to be modified right before we modify them. If you add the --color option, it will also highlight the part of the line that will get modified. Since grep and sed both use basic regular expressions by default, we can usually reuse the same pattern we use with sed.

find path/to/directory -type f -exec grep somePattern {} \; -ok sed -i 's/somePattern/replaceWith/g' {} \;

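The above command is very similar to the previous one, except we added the -exec option with the grep command. The order here is very important. The grep command will execute first; then the -ok command will be shown to the user and executed upon confirmation, as discussed earlier. There is a useful side effect: find only evaluates -ok when the preceding -exec succeeds, and grep exits with a non-zero status when it finds no match, so you are only prompted for files that actually contain the pattern.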

grep has a lot of customizations that can be explored. One common example is the --context flag, which shows you lines before and after the match. For example, grep --context 2 --regexp somePattern [file-name] will show two lines before and after the match. If you want only lines before the match or only after the match, there is --before-context and --after-context.
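grep shows the state before the change. If you would also like to see the state after it, one alternative (not part of the command above) is to run sed without -i and compare its output against the file with diff. A minimal sketch with a made-up file:

```shell
workdir=$(mktemp -d)
file="$workdir/notes.txt"
printf 'alpha\nsomePattern here\nomega\n' > "$file"

# sed without -i only prints the result, so the file itself is not modified.
# The "-" argument tells diff to read its second input from standard input.
preview=$(sed 's/somePattern/replaceWith/g' "$file" | diff "$file" -)
printf '%s\n' "$preview"
```

Lines starting with < come from the current file and lines starting with > show what sed would write.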

Conclusion

This concludes the first chapter of our endeavor to explore find-and-replace in the command line. We looked at the many options for customizing it and built up a command that performs find-and-replace recursively, with a preview and a confirmation prompt (to avoid unintended destructive behavior). The commands we built together are powerful on their own, but I hope that this post also helped you explore the endless possibilities for customization so you can fine-tune them to your needs, or use them on other commands, even unrelated to find-and-replace!