The Standard Streams
At the lowest level, all input and output is a stream of bytes flowing through your program. The bytes may come from a file, your keyboard or even from some remote computer on the network. Your program looks at the stream and changes it, consumes it, or sends it on to output.
Programs that process streams of characters are called text filters.
When you run a program, where do the input bytes come from and where do the output bytes go to? What is the data source (in the illustration above), and what is the data destination? Before your program starts, the operating system automatically opens three standard streams:
- stdin (standard input)
- stdout (standard output)
- stderr (standard error)
In C++, the built-in streams are used to initialize the cin, cout and cerr I/O objects. Java does the same thing, but uses them to initialize the System.in, System.out, and System.err objects. Python exposes them as sys.stdin, sys.stdout, and sys.stderr.
By default, the operating system connects these streams to your console (screen and keyboard). But, before you run your program, you may ask the OS to connect each stream to a different endpoint. This is known as redirection.
The cat Filter
Click the "running man" to open a Replit project which already contains a few files. Click the Fork Repl button to get your own copy, and then let's look at a few built-in Unix filters.
Close the editor tab (which contains the Makefile) and then click the link for the Shell.
Type the command cat, with no arguments, in the shell (terminal), and press ENTER.
The cursor simply blinks; you don't get a new prompt. Go ahead and type a few lines of text, pressing ENTER at the end of each line. The input you typed is echoed on the next line. Press CTRL+D to return to the prompt.
- Filter programs read from standard input and write to standard output.
- The cat filter copies its input to standard output unchanged. (The name is short for concatenate, because cat can also join several files into one stream.) In Windows, the equivalent filter is named type.
- The filter stops reading when it reaches end-of-file. In Unix, you simulate that by typing CTRL+D from the terminal. In Windows, it is CTRL+Z.
A filter is not meant to be run interactively. Instead, it is meant to process a stream of data that is supplied from a file, a network stream or some other source. The easiest way to supply such a stream is to use input redirection.
Input Redirection
Input redirection allows you to run a program, and have that program get its input from a file or device, instead of from the keyboard. Your program doesn't need to change at all; it still reads from cin as always.
To see how input redirection works, first, open the file named input.txt by clicking its tab, so you can see what it contains. Then, type this command in the shell:
$ cat < input.txt
The input redirection symbol < asks the operating system to first open input.txt and then to connect that to the standard input stream. Now, cat gets its input from the file instead of from the keyboard.
This is text stored in "input.txt".
A second line in input.txt
$
When all of the data has been processed, the prompt returns. (I've colored the output so that the input you type appears in teal, and the output from the command appears in blue.)
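If you want to try input redirection outside the Replit project, here is a minimal sketch. It assumes a Unix-like shell with mktemp available; the file name sample.txt and its contents are made up for illustration. The $(...) syntax simply captures a command's output in a variable so it can be printed afterwards.

```shell
# Scratch directory, so no file in your project is touched.
workdir=$(mktemp -d)

# Create a two-line sample file (hypothetical name and contents).
printf 'first line\nsecond line\n' > "$workdir/sample.txt"

# cat is given no file name. The shell opens sample.txt and connects
# it to cat's standard input before cat starts running.
from_stdin=$(cat < "$workdir/sample.txt")
echo "$from_stdin"

rm -rf "$workdir"
```

Notice that cat never learns the name of the file; it only sees bytes arriving on standard input.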
Output Redirection
By default, standard output is connected to the console; any output appears on your screen. You can redirect standard output by using the > symbol when you run:
This time, no output will appear on your screen; instead, the file output.txt will be created and all the output, which would have been sent to the screen, will instead be written to the file. This can be a little dangerous, because if there is already an output.txt, it will be overwritten with the new data.
After you've typed the previous line, you can examine the new contents of output.txt by using cat again, like this:
This is text stored in "input.txt".
A second line in input.txt
$
Instead of erasing the existing data in the output file, you can append to it by using the >> symbols like this:
Try it and see what output.txt contains now.
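The difference between > and >> is easy to see in a small experiment. The sketch below is only an illustration: it works in a scratch directory, and it uses echo (rather than cat) as a convenient source of output.

```shell
workdir=$(mktemp -d)
out="$workdir/output.txt"

echo "first run"  > "$out"    # creates output.txt
echo "second run" > "$out"    # overwrites it: "first run" is gone
echo "third run"  >> "$out"   # appends after the existing data

result=$(cat < "$out")
echo "$result"

rm -rf "$workdir"
```

Only the second and third lines survive, because the second > replaced whatever the file held before.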
Error Redirection
Type this command (exactly) in the shell, and press ENTER:
In this case, there is no file named input.text, so output.txt is erased. An error message is printed on the standard error stream, which is still connected to the screen. Output redirection only redirects standard output, not standard error.
You can redirect the standard error stream by using the symbol 2> like this:
$ cat < output.txt
$ cat < err.txt
bash: input.text: No such file or directory
$
Now output.txt is still empty, but err.txt contains the errors that originally appeared on the screen. You can also combine standard output and standard error into a single stream (which may be sent to a file) like this.
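Here is one way to watch the two streams being separated and then combined. This is a sketch under a few assumptions: it runs in a scratch directory, the file names are hypothetical, and it passes the file names to cat as arguments so that cat itself produces both normal output and an error message. The 2>&1 notation is standard shell syntax for "send stream 2 to wherever stream 1 currently points".

```shell
workdir=$(mktemp -d)
printf 'some data\n' > "$workdir/input.txt"

# input.txt exists but missing.txt does not, so cat writes data to
# standard output and a complaint to standard error.
# > catches stream 1 and 2> catches stream 2, each in its own file.
cat "$workdir/input.txt" "$workdir/missing.txt" \
    > "$workdir/output.txt" 2> "$workdir/err.txt" \
    || echo "cat reported a failure"

# 2>&1 sends stream 2 wherever stream 1 points right now,
# so it has to come after the > redirection.
cat "$workdir/input.txt" "$workdir/missing.txt" \
    > "$workdir/both.txt" 2>&1 \
    || echo "cat reported a failure"

out_text=$(cat < "$workdir/output.txt")
err_text=$(cat < "$workdir/err.txt")
both_text=$(cat < "$workdir/both.txt")
echo "output.txt: $out_text"
echo "err.txt:    $err_text"

rm -rf "$workdir"
```

After the first command, the data and the error message sit in separate files; after the second, both.txt holds both of them.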
Sometimes, you don't want to see either the error messages or any progress reports. For instance, if you try to remove a file which doesn't exist, the shell displays an error message like this:
rm: cannot remove 'filter.exe': No such file or directory
Instead of redirecting those messages to a file, you can send them to the "bit bucket", which in Unix is named /dev/null. (If you are using redirection on Windows, the name is NUL: with the trailing colon.) Anything redirected to /dev/null just disappears.
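A small sketch of discarding an error message follows. It assumes a Unix-like shell and tries to remove a file that does not exist inside an empty scratch directory. The status variable is just an illustrative way to show that the command still failed, even though nothing was printed.

```shell
workdir=$(mktemp -d)

# filter.exe does not exist in the empty scratch directory, so rm
# complains on standard error. 2> /dev/null throws the message away.
status=0
rm "$workdir/filter.exe" 2> /dev/null || status=$?

# Nothing was printed, but the exit status still records the failure.
echo "rm exit status: $status"

rm -rf "$workdir"
```

Redirecting to /dev/null hides the message only; it does not make the command succeed.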
Pipes & Pipelines
Input redirection gets input from a file, and output redirection sends output to a file. Pipes, however, redirect the output of one program so that it acts as the input of another program. The pipe character is the vertical bar (|). A series of commands connected by pipes is called a pipeline.
The Unix ls command shows the files of the current directory on standard output.
err.txt filter.cpp input.txt moby.txt output.txt
Of course you can save the directory listing to a file using output redirection:
However, instead of saving it, we can pipe the output to the wc (word count) filter, adding a command-line switch -l, to indicate that we only want to count the number of lines. Try it yourself and see what happens.
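If you are not working in the Replit project, you can reproduce the idea with a sketch like this one. It assumes a Unix-like shell and creates three empty files with made-up names in a scratch directory, so the expected count is known in advance.

```shell
workdir=$(mktemp -d)

# Create three empty files (hypothetical names).
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# When its output is a pipe, ls writes one name per line,
# and wc -l counts those lines.
count=$(ls "$workdir" | wc -l)
echo "files: $count"

rm -rf "$workdir"
```

No temporary file is needed: the listing flows straight from ls into wc.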
Here is another pipeline which lists the current directory, and then sorts the output in reverse order, sending that output to the screen.
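The sketch below shows a reverse-sorted listing, and then extends the pipeline with a third stage. It assumes a Unix-like shell; the file names are invented so that the sorted order is obvious.

```shell
workdir=$(mktemp -d)
touch "$workdir/apple.txt" "$workdir/banana.txt" "$workdir/cherry.txt"

# Two stages: ls produces the names, sort -r reverses their order.
reversed=$(ls "$workdir" | sort -r)
echo "$reversed"

# Three stages: head -n 1 keeps only the first line that sort emits.
last_name=$(ls "$workdir" | sort -r | head -n 1)
echo "last name alphabetically: $last_name"

rm -rf "$workdir"
```

Each program in the pipeline is an ordinary filter; none of them knows or cares what is on the other side of the pipe.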
One of the most useful Unix filters is grep (which stands for the mouthful "global regular expression print"). While quite complicated, especially when used with regular expressions, it is easy to use for searching through text to find a particular word.
Let's find out, for instance, on which lines the name Ishmael is used in Moby Dick.
And how many lines are there?
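Both questions can be answered with grep, as in this sketch. It does not use the real moby.txt; instead it builds a three-line stand-in file (sample.txt, with invented contents apart from the novel's famous opening line), so the results are easy to check by eye. The -n option asks grep to show the line number of each match.

```shell
workdir=$(mktemp -d)

# A tiny stand-in for moby.txt (hypothetical contents).
printf '%s\n' \
    'Call me Ishmael.' \
    'The ship left the harbor at dawn.' \
    'Ishmael watched the water.' > "$workdir/sample.txt"

# -n prefixes each matching line with its line number.
matches=$(grep -n 'Ishmael' < "$workdir/sample.txt")
echo "$matches"

# Pipe the matching lines to wc -l to count them.
match_count=$(grep 'Ishmael' < "$workdir/sample.txt" | wc -l)
echo "matching lines: $match_count"

rm -rf "$workdir"
```

On the stand-in file, lines 1 and 3 match, so the count is 2.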
Built-in Filters
Here are ten of the most common (and useful) built-in Unix filter programs. To find out how to use the program, just type man command into Google search, replacing command with the name of the filter program.
- cat: Displays the text of the file line by line.
- head: Displays the first n lines of the specified text files. If the number of lines is not specified, it prints the first 10 lines.
- tail: Displays the last n lines of the specified text files (the last 10 by default). The lines are printed in their original order, not reversed.
- sort: Sorts the lines alphabetically by default but there are many options available to modify the sorting mechanism. Be sure to check out the man page to see everything it can do.
- uniq: Removes duplicate lines, but only when the duplicates are adjacent. Use sort on your data first so that identical lines end up next to each other.
- wc: Prints the number of lines, words and bytes in the data (use -m to count characters instead of bytes).
- grep: Searches for particular information in a text file.
- tac: The reverse of cat. Instead of printing from lines 1 through n, it prints lines n through 1.
- sed: sed stands for stream editor. It allows you to apply search and replace operations on your data very effectively. sed is an advanced filter and all its options can be seen on its man page.
- nl: nl is used to number the lines of your text data.
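To get a feel for a few of these filters, here is a sketch that runs several of them over the same small file. It assumes a Unix-like shell; fruit.txt and its contents are invented for the example, and only options that are part of the standard tools (head -n, tail -n, and a basic sed substitution) are used.

```shell
workdir=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$workdir/fruit.txt"

# uniq only collapses adjacent duplicates, so sort first.
unique=$(sort < "$workdir/fruit.txt" | uniq)
echo "$unique"

# head and tail keep the original order of the lines they select.
first_two=$(head -n 2 < "$workdir/fruit.txt")
last_two=$(tail -n 2 < "$workdir/fruit.txt")
echo "$first_two"
echo "$last_two"

# sed applies a search-and-replace to every line passing through.
replaced=$(sed 's/pear/plum/' < "$workdir/fruit.txt" | sort | uniq)
echo "$replaced"

rm -rf "$workdir"
```

Because every one of these programs reads standard input and writes standard output, they can be chained in any order that makes sense for your data.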