Week 8

Why Vectors

Variables are named 'boxes' that hold a value. But, when you want to manage a collection of similar and closely related values, using individual, named variables is unwieldy. Imagine using named variables to track the scores for each student in this section of CS150; it would be nigh-on impossible.

For instance, to print the statistics for one exam, for one section, you'll need to write a function like this:

void printClassStatistics(istream& grades)
{
    // Grade for each student
    double student01, student02, student03, student04, 
        student05, student06, student07, student08, student09, 
        student10, student11; // And so on . . . to student45
    // Read grades from stream
    grades >> student01 >> student02 >> student03 >> student04 
        >> student05 >> student06 >> student07 >> student08 
        >> student09 >> student10 >> student11; // and so on
}

Now, imagine how long and unwieldy it would become when it came time to add the scores for a quiz. You'd need a separate statement for each student variable.

If you think "There must be a better way!" you're right; there is.

Week 8

Meet the vector

The solution this problem is to use the C++ library class named vector. The most used of the standard library's collection classes, a vector:

Stores multiple variables, of any type, in a list.

Decorative picture of muffin-tin and egg-carton.

Each variable (or element) must be exactly the same type. Just as you can't put eggs in a muffin tin or bake muffins in an egg carton, you can't put a string in a vector of int. Because of this, we say a that vector is a homogenous collection.

When you create a vector, its elements are stored right next to each other in memory; we say that the elements are contiguous. Each element (except the first and last) has a predecessor and a successor, so we say that the collection is linear.

Each element uses exactly the same amount of memory as any other, which allows your computer to locate an individual element instantaneously, using simple arithmetic, rather than by searching.

When creating a vector, you give the entire collection a name, just like you would any other simple variable. You access vector elements by specifying an index or subscript representing its position in the collection. The first element has the subscript 0, the next 1, and so on.

Why Start with Zero?

Should numbering start with zero or one? If you think of a subscript as a counting number—#1, # 2, etc...—then zero-based subscripts make no sense. But subscripts are not counting numbers, but offsets from the beginning of the collection.

If you tell the compiler to retrieve v.at(5), you are asking it to go to start of the vector named v, then "walk past" five elements and bring you the element which it finds there. If you tell it to access v.at(0), the compiler knows it doesn't have to skip any elements at all, and it brings you the first element.

Week 8

vector Variables

All library collection classes specify the kind of values they contain by including the base type name in angle brackets following the class name. For example:

vector<int>     //represents a vector that contains integers
vector<Card>    // a vector of playing cards (a user-defined type)
vector<string>  // a vector containing string objects.

Classes with a base-type specification are called parameterized classes, and they are implemented, in C++, using a technique called class templates. This means that the classes, vector<int>, vector<Card>, and vector<string> are independent classes which each share a common general structure.

To use the standard collections, include the appropriate header (<vector>). The vector, like the string class, is in the std namespace.

Creating a vector variable is similar to creating an int variable:

int n;  // create an integer
vector<int> iVec;     // an empty vector that stores integers
vector<double> dVec;  // stores doubles
vector<string> sVec;  // stores strings

The variable, iVec is a vector of integers. There is no separate instantiation step as in Java, where you might expect to write something like this:

vector<int> ivec = new vector<int>(); // Illegal
vector<int&< v1; // Illegal
const vector<int>& = ...;  // OK

You cannot create a vector of references, but a vector may, itself, be a reference (usually when used as a parameter).

Week 8

vector Initialization

A newly-created vector is empty; it contains no elements. To create a sized vector, specify the initial size (in parentheses) when the vector is created. For example, to create a vector that initially holds fifteen elements, write:

vector<int> iVec(15);

This specifies the initial size; you may add additional elements later. All elements are default-initialized. For primitive types, such as int, that means they are set to 0. For class types, such as string, each element is constructed by implicitly calling its default constructor.

You may want to initialize all elements with a value other than zero. Suppose, for example, you want five copies of the string "(none)" or twenty copies of the int value 137. To do this, specify both the number of elements, and default value for each element elements like this:

vector<int> v137(10, 137);
vector<string> vNone(5, "(none)");

This syntax is only legal when initially creating a vector. You must use parentheses; if you use braces, something else will happen.

List and Copy Initialization

In C++11 (or later), you can specify a initial list of values. Write the values, separated by commas, andsurrounded by braces. This doesn't work in C++ 98.

vector<string> months{"Jan", "Feb", "Mar"};

Lastly, you can initialize one vector, from an existing vector. The new vector is a completely independent copy of the original, not an alias, as in Java. Here's how:

vector<int> v1{...};
. . .
vector<int> v2(v1);

Week 8

Element Access

The variables stored inside a vector are called its elements. To access an individual element, you use its subscript (or index). This is called selecting the element. To select an element, pass the subscript as an argument to the vector::at() member function, or, place the subscript in square brackets after the variable name. (The square brackets are called the subscript operator).

cout << v[1] << endl;   // the subscript operator
cout << v.at(1) << endl;  // the at member function

Both the subscript operator and the at() member function return a reference to the selected element, so they may appear on either side of an assignment.

vector Size

You can find out how many elements a vector contains by using the size() member function. The first element in the vector is at subscript 0, while the last is at v.at(v.size() - 1). C++11 (and later) added two additional member functions, front() and back which return a reference to the first and last elements.

Calling front() or back() on an empty vector is undefined. You can find out if a vector contains elements by calling its empty() member function, or by checking if its size() is greater than zero.

Undefined Behavior

Subscripting a vector past its bounds is an error which invokes undefined behavior in C++. Undefined behavior means that the compiler is completely free to ignore the error (which is what usually happens). However, the compiler is also free to format your hard disk, send your credit-card credentials to Timbucktu, or, to make demons fly out of your nose.

C++ also has implementation-defined and unspecified behavior. Implementation-defined means the compiler must pick a particular implementation, and document it, such as the size of an integer. Unspecified means that the compiler must do something reasonable, but need not document what it does. The order in which arguments are evaluated when passed to a function is unspecified; it may be different each time.

Week 8

Range Checking

Programmers coming from other languages often gravitate to the subscript operator since it is similar in syntax to the array operations which the vector emulates. However, if you use the subscript operator, your program will do no range checking!

Before you go any further, read over that last line again. Most of you probably cannot imagine a language that does not do some sort of range checking. Let me illustrate what happens in C++:

vector<int> v(4);   // 4 elements
cout << v[4];       // access out of bounds
v[4] = 27;          // writing out of bounds
cout << v[4];

You will not get a compiler error or a runtime error, as you would in Java or Python, even though you are accessing (and even writing to) an element that is outside of the vector bounds.

This is an error, though. Often, cout will print the value stored in the location where the fifth element of v would be stored, if it existed. If that is the case, on your platform, then the assignment will happily store the value 27 in the same location, regardless of what is currently stored there. If you think, "Well, that's not so bad!", then what about this?

v[1075935] = 27;

Again, you'll get no nice stack trace like you would get with Java or Python, telling you that your index is out of bounds. If you are very lucky, your operating system will shut down your application rudely with a segmentation fault. If you aren't, you will silently corrupt a portion of your own application, producing a bug that shows up days, weeks or months later. Not good.

Using the at Member Function

Fortunately, you can fix this deficiency by using at(). When you use at(), the compiler generates code to check out-of-bound subscripts; you don't have to rely on accidentally stepping on the toes of your operating system to find your errors.

Other than the slight performance hit, I can't think of any reason not to always use at() instead of the subscript operator. Combined with C++ exception handling, your code will be safer and have fewer bugs.

You can also modify the vector class so that subscripts do throw exceptions. That's what Bjarne Stroustrup does in Section 4.4.1.2 of the Tour of C++ (page 104).