Week 2
String Members

Below are the member functions you should memorize:

  • size: the number of characters in the string (may also use length)
  • empty: true if the string contains no characters
  • at: an individual character at a particular position (may also use [])
  • front, back: the character at the front, and at the back (C++11)
  • substr: a new string created from a portion of an existing string
  • find, rfind: index of the substring searched for (from front or back)

You can look up the rest.

The size Member Function

s.size() returns the number of characters in the string s. For historical reasons, you can also use length(), but all of the other collections in the library use size(), so you should probably get used to using that. (Plus, it's less typing 😄).

Unlike Java's length(), which returns an int, the size() member function returns an unsigned integer type, and that type may be defined differently on different platforms.

  • On an embedded platform, with little memory, size() could return a 16-bit unsigned short.
  • More commonly, strings can be as big as 4 billion characters, so an unsigned int is often large enough.
  • However, you can’t assume that is true. I recently recompiled some older code and discovered several places where I had assumed that size() returned an unsigned int, but the platform I was on used a 64-bit unsigned long instead.

This seems complex, since you don't want to re-edit your code each time you move to a new compiler. Here are three different ways to store the value returned from calling size() that work regardless of the platform:

string str { ... }  // string of any size
string::size_type len1 = str.size();
auto len2 = str.size();
size_t len3 = str.size();
  1. To be slavishly, pedantically correct, use string::size_type.
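  2. Use auto, which infers the type from the initializer. (Use = here rather than braces; before C++17, auto with a braced initializer could deduce a std::initializer_list instead.)
  3. Use the type size_t. This is an unsigned integer type that each compiler defines to be large enough to hold the size of any object on its platform, so your code will be adjusted automatically for each platform.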

I believe that the easiest method is the last, and that's what I'll do in this class.
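As a rough sketch of why the stored type matters, here is a loop that keeps both the length and the loop counter in size_t, so the comparison never mixes signed and unsigned values. The helper name countChar is my own invention for illustration; it is not part of the library.

```cpp
#include <cstddef>
#include <string>
using std::size_t;
using std::string;

// Hypothetical helper (not a library function):
// counts how many times ch appears in str.
size_t countChar(const string& str, char ch)
{
    size_t count = 0;
    // Both i and len are size_t, an unsigned type, so i < len
    // compares two values of the same type.
    for (size_t i = 0, len = str.size(); i < len; ++i)
    {
        if (str[i] == ch) { ++count; }
    }
    return count;
}
```

Calling countChar("hello, world", 'l') should give 3. One thing to watch for with unsigned types: str.size() - 1 on an empty string does not produce -1; it wraps around to a very large number.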

Characters

Individual characters in C++ are represented by the built-in primitive data type named char (usually pronounced "tchar", not "kar"). In memory, each character is stored as a small integer code, usually 8 bits wide, and on most platforms that code is the character's ASCII code. (Actually, ASCII defines only 7 bits, so the values 128-255 are non-standard and may vary from platform to platform.)

You write character literals by enclosing each character in single quotes. Thus, the literal 'A' represents the internal code of the uppercase letter A.

In addition, C++ allows you to write special characters in a multi-character form beginning with a backslash (\). This form is called an escape sequence. Examples include the newline (\n), the tab (\t), and a double-quote inside a string literal (\"). A C++ reference lists the complete set of escape sequences.
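To see that each escape sequence stands for a single character, consider this small sketch. The function name quotedLine is hypothetical, made up for this example.

```cpp
#include <string>
using std::string;

// Hypothetical helper: builds a short string that uses three escape sequences.
string quotedLine()
{
    // \t is one tab, each \" is one double-quote, and \n is one newline,
    // so the literal below holds 10 characters in total.
    return "say:\t\"hi\"\n";
}
```

The literal takes 15 keystrokes between its outer quotes, but quotedLine().size() is 10, because every two-character escape sequence is stored as one char.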

Character Functions

It is useful to have tools for working with individual characters. The <cctype> header contains a variety of functions that do that. There are two kinds of functions.

  • Predicate classification functions test whether a character belongs to a particular category. Calling isdigit(ch) returns true (technically, a nonzero int) if ch is one of the digit characters in the range between '0' and '9'. Similarly, isspace(ch) returns true if ch is any of the characters that appear as white space on a display screen, such as spaces, tabs, and newlines.
  • Conversion functions make it easy to convert between uppercase and lowercase letters. Calling toupper('a'), for example, returns the code for the character 'A'. If the argument is not a letter, the function returns it unchanged, so that tolower('7') returns '7'.
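Here is a minimal sketch that uses one function of each kind. The helper names countDigits and toUpperCopy are assumptions of mine, not library functions. The casts to unsigned char are there because passing a negative char value to the <cctype> functions is undefined behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
using std::size_t;
using std::string;

// Hypothetical helper: counts the digit characters in str.
size_t countDigits(const string& str)
{
    size_t count = 0;
    for (char ch : str)
    {
        // Cast first so that a negative char value is never passed in.
        if (std::isdigit(static_cast<unsigned char>(ch))) { ++count; }
    }
    return count;
}

// Hypothetical helper: returns an uppercase copy of str.
string toUpperCopy(string str)
{
    for (char& ch : str)
    {
        // toupper returns an int, so convert the result back to char.
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return str;
}
```

Since toUpperCopy takes its parameter by value, the caller's string is left unchanged; only the copy is modified and returned.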
Selecting Characters

Positions in a string are subscripted (or indexed) starting at 0. The characters in the string "hello, world" are indexed like this:

[Figure: the memory layout of the string "hello, world", with indexes 0 through 11.]

The numbers are called the index or subscript; they cannot be negative (unlike Python, where subscripts can be negative). Indexes start at 0 because the index represents how many steps you need to travel from the beginning of the string to get to the element you are interested in. To retrieve the 'e', you have to travel one character from the beginning, so its subscript is 1.

The <string> library has four ways to select characters from a non-empty string:

  • Use the subscript operator like this: cout << str[0];
  • Use the member function at() like this: cout << str.at(0);
  • Use the members front() and back() (C++11 and later) like this: cout << str.front();

If the string variable str contains "hello, world", all of these expressions refer to the character 'h' at the beginning of the string.

The at() member function makes sure the index is in range and throws an out_of_range exception if it is not; the subscript operator does no such check. Using the subscript operator with an out-of-range subscript is undefined behavior. You should generally use at() unless you are certain that your indexes are in range.

Selecting an individual character in a string returns a reference to the character in the string, not a copy of that character as Java's charAt(index) method does. You may assign a new value to that reference, like this:

str[0] = 'H';       // or
str.at(0) = 'H';    // works as well

Either line changes the string from "hello, world" to "Hello, world".
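The two ideas above, assigning through a reference and range checking with at(), can be sketched like this. Both helper names (replaceFirst and charOr) are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
using std::size_t;
using std::string;

// Hypothetical helper: returns a copy of str with its first character
// replaced by ch. An empty string is returned unchanged.
string replaceFirst(string str, char ch)
{
    // front() returns a reference too, but only call it on a non-empty string.
    if (!str.empty()) { str.front() = ch; }
    return str;
}

// Hypothetical helper: returns the character at index,
// or fallback when index is out of range.
char charOr(const string& str, size_t index, char fallback)
{
    try
    {
        return str.at(index);           // at() checks the index for us
    }
    catch (const std::out_of_range&)
    {
        return fallback;                // reached when index >= str.size()
    }
}
```

With str[index] in place of str.at(index), no exception would be thrown for a bad index, and the catch block would never run.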

Substrings

To create a new string, initialized with only a portion of an existing string (called a substring), use the member function named substr() which takes two parameters:

  • the index of the first character you want to select
  • the desired number of characters.

Calling str.substr(start, n) creates a new string by extracting n characters from str starting at the index position specified by start. For example, if str contains the string "hello, world", then the following code prints "ell".

string str{"hello, world"};
cout << str.substr(1, 3) << endl;

The string begins at 0, so the character at index 1 is the character 'e'.

Be careful with substr() when switching between Java and C++. In Java, the second parameter to the substring() method is the ending index; in C++, though, it is the number of characters in the returned substring. Forgetting this can lead to hard-to-find bugs (and crashes).

The second argument in substr() is optional; if missing, substr() returns the substring that starts at the index and continues to the end. For instance,

   cout << str.substr(7) << endl;

prints the string "world", while this line

   cout << str.substr(str.size() / 2) << endl;

uses substr() to print the second half of str, which includes the middle character if the size of str is odd.

When using the substr(start, n) version of substr(), if fewer than n characters follow the starting position, substr() returns characters only up to the end of the original string, instead of causing a runtime error. If, however, start is greater than the length of the string, substr() throws an out_of_range exception. If start is equal to the length of the string, then substr() returns the empty string.
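If you are coming from Java, one way to keep the two conventions straight is a small wrapper that converts an ending index into a count. This is only a sketch; the names javaSubstring and secondHalf are made up for this example, and javaSubstring assumes that begin is not greater than end.

```cpp
#include <cstddef>
#include <string>
using std::size_t;
using std::string;

// Hypothetical helper: mimics Java's substring(begin, end), where end is
// the index one past the last character wanted.
// Assumes begin <= end; substr() itself throws if begin > str.size().
string javaSubstring(const string& str, size_t begin, size_t end)
{
    // C++ wants a count, so convert the ending index into a length.
    return str.substr(begin, end - begin);
}

// Hypothetical helper: the second half of str, including the middle
// character when the size is odd.
string secondHalf(const string& str)
{
    return str.substr(str.size() / 2);
}
```

For "hello, world", javaSubstring(str, 1, 4) and str.substr(1, 3) both select "ell".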

Searching a String

To search for both characters and substrings, the string class contains a member function find(), which comes in several forms. The simplest form looks like this:

auto index = str.find(target);

The argument target is what you’re looking for.

  • target may be a string, a char or a C-string literal.
  • The function searches through str looking for the first occurrence of target.
  • If target is found, find() returns the index at which the match begins. Use auto or size_t to store this.

If you want to find the last occurrence of target, use rfind() instead.

Not Found

If target is not found, then find() returns the constant named string::npos. This constant is defined as part of the string class and therefore requires the string:: qualifier. This is a good candidate for a named constant in your code:

   const auto kNotFound = string::npos;

The find() member function takes an optional second argument to indicate the index at which to start the search. Both styles of the find() member function are illustrated here:

string str{"hello, world"};
auto a = str.find('o');         // char, 4
auto b = str.rfind("o");        // C-string, 8
auto c = str.find('l', 4);      // 10
auto d = str.find("waldo");     // string::npos

The find() member functions consider uppercase and lowercase characters to be different. Unlike Java, there is no built-in toUpperCase() or toLowerCase() member function in the string class.
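The optional starting index and the string::npos test work well together in a loop. Here is a minimal sketch that counts non-overlapping matches; the function name countOccurrences is hypothetical, not something the string class provides.

```cpp
#include <cstddef>
#include <string>
using std::size_t;
using std::string;

// Hypothetical helper: counts the non-overlapping occurrences of target in str.
size_t countOccurrences(const string& str, const string& target)
{
    if (target.empty()) { return 0; }   // an empty target would match everywhere

    size_t count = 0;
    size_t pos = str.find(target);      // first match, or string::npos
    while (pos != string::npos)
    {
        ++count;
        // Resume the search just past the match that was found.
        pos = str.find(target, pos + target.size());
    }
    return count;
}
```

Storing the result in size_t (or auto) matters here: string::npos is the largest value of the unsigned size type, so an int may not hold it correctly.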

Variations

In addition to find() and rfind(), you can find the position of the first (or last) occurrence of a character that appears in a set or that doesn't appear in a set. Here are some examples:

string s{"\"Hooray\", the crowd cheered!"};
// first lower-case vowel
auto a = s.find_first_of("aeiou");
// last punctuation
auto b = s.find_last_of("\",.!:;");
// first non-whitespace
auto c = s.find_first_not_of(" \t\n");
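One common use of these variations is trimming whitespace from both ends of a string. The sketch below combines find_first_not_of() with its mirror image, find_last_not_of(), and substr(). The function name trim and the constant kWhitespace are my own; the string class has no built-in trim member.

```cpp
#include <cstddef>
#include <string>
using std::size_t;
using std::string;

// Hypothetical helper: returns str without leading or trailing
// spaces, tabs, or newlines.
string trim(const string& str)
{
    const string kWhitespace = " \t\n";

    size_t first = str.find_first_not_of(kWhitespace);
    if (first == string::npos) { return ""; }   // empty or all whitespace

    // At least one non-whitespace character exists, so this cannot be npos.
    size_t last = str.find_last_not_of(kWhitespace);

    // substr() takes a count, not an ending index.
    return str.substr(first, last - first + 1);
}
```

Whitespace in the middle of the string is untouched, so trim("  hi there \n") gives "hi there".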