Process Filters
Filters, as you'll recall, are programs that read from standard input and write to standard output. A filter may change, use, or learn about the characters flowing through it. Two kinds of filter programs are process filters and state filters.
- A process filter does something to the characters it encounters.
- A state filter learns something about the stream by examining characters.
Process filters apply some basic rule—the process—to the values in the stream. The simplest process filter is: read and echo (although I suppose that read and ignore would actually be simpler). That's what the cat filter does.
Process filters typically solve problems like this:
- Copy files or search for a particular value in a stream (cp and grep)
- Case modification or changing character order in a stream.
- Stream editing using a sequence of editing commands (sed)
- Translating data from one form to another (decimal to binary)
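To make the case-modification item concrete, here is a minimal sketch of a process filter that applies one rule, convert to uppercase, to every character it reads. The names upperCaseFilter and upperCase are hypothetical, chosen only for this illustration, and the stream parameters and read loop are covered in more detail later in these notes.

```cpp
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Process filter (hypothetical name): apply the same rule to every character.
void upperCaseFilter(istream& in, ostream& out)
{
    char ch;
    while (in.get(ch))
    {
        // Cast to unsigned char so toupper() is well defined for every byte value.
        out.put(static_cast<char>(toupper(static_cast<unsigned char>(ch))));
    }
}

// Hypothetical helper: run the filter over a string instead of standard input.
string upperCase(const string& text)
{
    istringstream in(text);
    ostringstream out;
    upperCaseFilter(in, out);
    return out.str();
}
```

To use this as a real filter, a main function would call upperCaseFilter(cin, cout).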
State Filters
State filters produce information by learning something about the data in a stream. State is shorthand for saying "what is the current status of this data". Characters, for instance, have values, but also belong to groups, like digit characters, alpha characters and so on.
State transitions are changes from one state to another. Most state filters work by finding the state transitions and then performing some action. Here are some uses:
- Counting the number of words in input (counting word transitions) (wc)
- Printing one sentence per line (looking for a period, question mark, or exclamation mark)
- Compressing input (turn off echo when in blank-spaces state)
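As an illustration of the word-counting item, here is a minimal sketch of a state filter. It assumes a "word" is any run of non-whitespace characters, which may differ in detail from what wc does, and the names countWords and countWordsIn are hypothetical.

```cpp
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// State filter (hypothetical name): count transitions from "not in a word" to "in a word".
int countWords(istream& in)
{
    int words = 0;
    bool inWord = false;   // the current state
    char ch;
    while (in.get(ch))
    {
        bool isSpace = isspace(static_cast<unsigned char>(ch)) != 0;
        if (!isSpace && !inWord)
        {
            ++words;       // state transition: a new word has started
        }
        inWord = !isSpace; // remember the state for the next character
    }
    return words;
}

// Hypothetical helper: count the words in a string.
int countWordsIn(const string& text)
{
    istringstream in(text);
    return countWords(in);
}
```

Notice that the filter never stores the words themselves. It only remembers which state it is in and acts when that state changes.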
Often programs will contain both process-filter and state-filter portions. For one homework this week you'll write a state filter that removes comments in C++ source code, while in lecture you'll write a process filter that encrypts and decrypts text.
Your First Filter
Now, let's look at writing our own filters. We'll continue working with the Replit project we used when learning about redirection. You can open it from your Replit account, or click the "running man" and re-fork it.
The program filter.cpp is the simplest possible version of the built-in filter cat. Remember that the cat command reads a character from standard input and sends it on to standard output, stopping only when there is no more input to be processed.
There are three ways to process input:
- Raw, or unformatted input (a byte or character at a time)
- Line-oriented input (one line at a time)
- Formatted or token-based input (a "word" at a time)
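The three styles listed above can be compared side by side. This is only a sketch: the function names are hypothetical, and each function simply counts how many items its style of input produces.

```cpp
#include <iostream>
#include <sstream>   // for istringstream, handy for trying these on a string
#include <string>
using namespace std;

// Raw input: one character at a time; whitespace is not skipped.
int countChars(istream& in)
{
    char ch;
    int n = 0;
    while (in.get(ch)) { ++n; }
    return n;
}

// Line-oriented input: one line at a time (the newline is discarded).
int countLines(istream& in)
{
    string line;
    int n = 0;
    while (getline(in, line)) { ++n; }
    return n;
}

// Token-based input: one whitespace-delimited "word" at a time.
int countTokens(istream& in)
{
    string word;
    int n = 0;
    while (in >> word) { ++n; }
    return n;
}
```

For the two-line input "one two" followed by "three" (each ending with a newline), the counts are 14 characters, 2 lines, and 3 tokens. Each function needs a fresh stream, since reading consumes the data.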
The program filter.cpp uses raw, unformatted input. Build and run the program in Replit like this. First, open the Shell tab and then type:
- make filter. If you have only a single file, you can build it by giving make the name of the output file. The make program finds the file filter.cpp and then compiles and links it.
- ./filter < input.txt to run the program. This should produce exactly the same output as using cat.
Note, in Unix, to run a program located in the current working directory, first type the directory ./ and then the name of the program. You can repeat any of the previous exercises replacing cat with ./filter, and the results should be the same.
Data Loops
Let's look at how the filter program works. C++ input streams read a single character using the member function cin.get(). To read successive characters until all of the data has been processed, use a data loop, also called an eof-controlled loop (eof is shorthand for end-of-file).
while there is still data to process
    read a data item
    process the data item
Translate this into C++ by using streams as conditions, as shown here:
char ch;
while (cin.get(ch))
{
    cout.put(ch); // print the output
}
The expression cin.get(ch) does two things.
- It reads the next character from the stream into the char variable ch (which is passed to the function by reference). Whitespace is not skipped.
- It returns the input stream (in this case cin) after reading the variable so you can determine whether the I/O operation succeeded.
The cin object has a member function named fail(), which reports whether the last operation failed. fail() is implicitly called when a stream is used as a condition. In a condition, the stream is interpreted as true if it is still good, and as false otherwise.
When reading characters using cin.get(), input fails only if there are no characters left in the stream. The effect of the basic data loop is to execute the body of the while loop once for each character until the stream reaches what is known as the end of the file.
For output streams, the put() member function takes a char value as its argument and writes that character to the stream.
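To see what the stream condition does behind the scenes, here is a sketch of the same loop with the fail() check written out by hand. The function name copyExplicit is hypothetical, and the version that uses the stream directly as the loop condition remains the idiomatic one.

```cpp
#include <iostream>
#include <sstream>   // only needed when trying the function on string streams
using namespace std;

// Same behavior as while (in.get(ch)) { out.put(ch); }, with the check made explicit.
void copyExplicit(istream& in, ostream& out)
{
    char ch;
    while (true)
    {
        in.get(ch);          // try to read the next character
        if (in.fail())       // the read failed: no characters are left
        {
            break;
        }
        out.put(ch);         // the read succeeded, so ch holds a valid character
    }
}
```

The important detail is the order: the check comes after the read and before the character is used, so a failed read is never processed.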
More on Streams
When writing a function that processes input or output, the stream parameters must always be passed by reference, because stream objects cannot be copied.
Here, for example, is a function that copies input to output.
void streamCopy(istream& in, ostream& out)
{
    char ch;
    while (in.get(ch)) { out.put(ch); }
}
We could rewrite filter by calling this function, like this:
int main()
{
    streamCopy(cin, cout);
}
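One benefit of taking istream& and ostream& parameters is that the function is not tied to cin and cout. As a sketch, the same streamCopy can be tried on string streams; the helper copyToString is a hypothetical name used only for this illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Copies every character, including whitespace, from in to out.
void streamCopy(istream& in, ostream& out)
{
    char ch;
    while (in.get(ch)) { out.put(ch); }
}

// Hypothetical helper: string streams are also istream and ostream objects,
// so they can be passed to streamCopy by reference just like cin and cout.
string copyToString(const string& text)
{
    istringstream in(text);
    ostringstream out;
    streamCopy(in, out);
    return out.str();
}
```

This is a convenient way to test a filter function without creating input files.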
Other I/O Functions
When reading individual characters, you'll sometimes find that you have read more than you need. There are several ways to solve this problem in C++.
- in.unget() returns the last read character to the input stream.
- in.putback(ch) allows you to put back a different character.
- in.peek() looks at the next character in the stream, but doesn't remove it from the stream.
The C++ library guarantees that you can push back one character. You are not able to read several characters ahead and then push them all back. Fortunately, being able to push back one character is sufficient in the vast majority of cases.
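Here is a small sketch of the "read one too many" problem and two ways around it. Both functions read a run of digit characters and leave the first non-digit in the stream. The function names are hypothetical, and the sketch assumes the number fits in an int (overflow is not checked).

```cpp
#include <cctype>
#include <iostream>
#include <sstream>   // only needed when trying the functions on string streams
using namespace std;

// Version 1: read a character, and return it with unget() if it was one too many.
int readDigits(istream& in)
{
    int value = 0;
    char ch;
    while (in.get(ch))
    {
        if (!isdigit(static_cast<unsigned char>(ch)))
        {
            in.unget();   // not a digit: return it to the stream
            break;
        }
        value = value * 10 + (ch - '0');
    }
    return value;
}

// Version 2: look before reading with peek(), so nothing needs to be returned.
int readDigitsWithPeek(istream& in)
{
    int value = 0;
    while (isdigit(in.peek()))   // peek() yields an int; at end of input it is not a digit
    {
        value = value * 10 + (in.get() - '0');
    }
    return value;
}
```

With the input 123abc, either version returns 123 and the next character read from the stream is 'a'.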