Why Vectors
Variables are named 'boxes' that hold a value. But, when you want to manage a collection of similar and closely related values, using individual, named variables is unwieldy. Imagine using named variables to track the scores for each student in this section of CS150; it would be nigh-on impossible.
For instance, to print the statistics for one exam, for one section, you'll need to write a function like this:
void printClassStatistics(istream& grades)
{
// Grade for each student
double student01, student02, student03, student04,
student05, student06, student07, student08, student09,
student10, student11; // And so on . . . to student45
// Read grades from stream
grades >> student01 >> student02 >> student03 >> student04
>> student05 >> student06 >> student07 >> student08
>> student09 >> student10 >> student11; // and so on
}
Now, imagine how long and unwieldy it would become when it came time to add the scores for a quiz. You'd need a separate statement for each student variable.
If you think "There must be a better way!" you're right; there is.
Meet the vector
The solution this problem is to use the C++ library
class named vector
. The most used of the standard library's
collection classes, a vector
:
- Stores multiple variables, of any type, in a list.
-
Each variable (or element) must be
exactly the same type. Just as you can't put eggs in a
muffin tin or bake muffins in an egg carton, you can't put a
string
in avector
ofint
. Because of this, we say a thatvector
is a homogenous collection.
When you create a vector, its elements are stored right next to each other in memory; we say that the elements are contiguous. Each element (except the first and last) has a predecessor and a successor, so we say that the collection is linear.
Each element uses exactly the same amount of memory as any other, which allows your computer to locate an individual element instantaneously, using simple arithmetic, rather than by searching.
When creating a vector, you give the entire collection a name, just like you would any other simple variable. You access vector elements
by specifying an
index or subscript representing its position
in the collection. The first element has the subscript 0
, the
next 1
, and so on.
Why Start with Zero?
Should numbering start with zero or one? If you think of a subscript as a counting number—#1, # 2, etc...—then zero-based subscripts make no sense. But subscripts are not counting numbers, but offsets from the beginning of the collection.
If you tell the compiler to retrieve v.at(5)
, you are asking it
to go to start of the vector
named v
, then "walk past" five elements and bring you the
element which it finds there. If you tell it to access v.at(0)
,
the compiler knows it doesn't have to skip any elements at all, and it
brings you the first element.
vector Variables
All library collection classes specify the kind of values they contain by including the base type name in angle brackets following the class name. For example:
vector<int> //represents a vector that contains integers
vector<Card> // a vector of playing cards (a user-defined type)
vector<string> // a vector containing string objects.
Classes with a base-type specification are called
parameterized classes, and they are implemented, in C++, using a
technique called class templates. This means that the
classes, vector<int>
,
vector<Card>
, and vector<string>
are independent classes which each share a common general structure.
To use the standard collections, include the appropriate header (<vector>
). The
vector
, like the string
class, is in the std namespace
.
Creating a vector
variable is similar to creating an int
variable:
int n; // create an integer
vector<int> iVec; // an empty vector that stores integers
vector<double> dVec; // stores doubles
vector<string> sVec; // stores strings
The variable, iVec
is a vector
of integers. There is no separate instantiation step as in Java,
where you might expect to write something like this:
vector<int> ivec = new vector<int>(); // Illegal
vector<int&< v1; // Illegal
const vector<int>& = ...; // OK
You cannot create a vector
of references, but a vector
may, itself, be a reference (usually when used as a parameter).
vector Initialization
A newly-created vector
is empty; it contains no
elements. To create a sized
vector
, specify the initial size (in
parentheses) when the vector
is created. For example,
to create a vector
that initially holds fifteen elements, write:
vector<int> iVec(15);
This specifies the initial size; you may add additional elements later.
All elements are
default-initialized. For primitive types, such as int
, that means they are set to 0
. For class types, such as string
, each element is constructed by implicitly calling its
default constructor.
You may want to initialize all elements with
a value other than zero. Suppose, for example, you want five
copies of the string
"(none)"
or twenty copies of the
int
value 137
. To do this,
specify both the number of elements, and default value for each element
elements like this:
vector<int> v137(10, 137);
vector<string> vNone(5, "(none)");
This syntax is only legal when initially
creating a vector
. You
must use parentheses; if you use braces, something else will
happen.
List and Copy Initialization
In C++11 (or later), you can specify a initial list of values. Write the values, separated by commas, andsurrounded by braces. This doesn't work in C++ 98.
vector<string> months{"Jan", "Feb", "Mar"};
Lastly, you can initialize one vector
, from an existing vector
. The new
vector
is a completely independent copy of the original,
not an alias, as in Java. Here's how:
vector<int> v1{...};
. . .
vector<int> v2(v1);
Element Access
The variables stored inside a vector are called its
elements. To access an individual element, you use its subscript (or index). This is called selecting the element.
To select an element, pass the subscript as an argument to the
vector::at()
member function, or, place the subscript in square brackets after the variable name. (The square brackets are called the
subscript operator).
cout << v[1] << endl; // the subscript operator
cout << v.at(1) << endl; // the at member function
Both the subscript operator and the
at()
member function return a reference to the selected
element, so they may appear on either side of an assignment.
vector Size
You can find out how many elements a vector
contains by using the size()
member function. The first element in
the vector
is at subscript 0
, while the last is at
v.at(v.size() - 1)
. C++11 (and later) added two additional
member functions,
front()
and back
which return a reference to the first
and last elements.
Calling
front()
or back()
on an empty
vector
is undefined. You can find out if a vector
contains
elements by calling its empty()
member function, or by checking if
its size()
is greater than zero.
Undefined Behavior
Subscripting a vector
past its bounds is an error which invokes undefined behavior in C++. Undefined behavior means that the compiler is completely free to ignore
the error (which is what usually happens). However, the compiler is also free
to format your hard disk, send your credit-card credentials to Timbucktu, or,
to
make demons fly out of your nose.
C++ also has implementation-defined and unspecified behavior. Implementation-defined means the compiler must pick a particular implementation, and document it, such as the size of an integer. Unspecified means that the compiler must do something reasonable, but need not document what it does. The order in which arguments are evaluated when passed to a function is unspecified; it may be different each time.
Range Checking
Programmers coming from other languages often gravitate to the subscript operator since it is similar in syntax to the array operations which the vector emulates. However, if you use the subscript operator, your program will do no range checking!
Before you go any further, read over that last line again. Most of you probably cannot imagine a language that does not do some sort of range checking. Let me illustrate what happens in C++:
vector<int> v(4); // 4 elements
cout << v[4]; // access out of bounds
v[4] = 27; // writing out of bounds
cout << v[4];
You will not get a compiler error or a runtime error, as you would in Java or Python, even though you are accessing (and even writing to) an element that is outside of the vector bounds.
This is an error, though. Often, cout
will print the value stored in the location where the fifth element of v
would be
stored, if it existed. If that is the case, on your
platform, then the assignment will happily store the value 27
in
the same location, regardless of what is currently stored there.
If you think, "Well, that's not so bad!", then what about this?
v[1075935] = 27;
Again, you'll get no nice stack trace like you would get with Java or Python, telling you that your index is out of bounds. If you are very lucky, your operating system will shut down your application rudely with a segmentation fault. If you aren't, you will silently corrupt a portion of your own application, producing a bug that shows up days, weeks or months later. Not good.
Using the at Member Function
Fortunately, you can fix this deficiency by using at()
. When you
use at()
, the compiler generates code to check out-of-bound
subscripts; you don't have to rely on accidentally stepping on the toes of
your operating system to find your errors.
Other than the slight performance hit, I can't think of any reason not to always use
at()
instead of the subscript operator. Combined with C++ exception handling, your code will be safer and have fewer bugs.
You can also modify the vector class so that subscripts do throw exceptions. That's what Bjarne Stroustrup does in Section 4.4.1.2 of the Tour of C++ (page 104).