Introducing Arrays
C++ has a built-in array type, inherited from the C programming language. The array builds upon the pointer concepts you covered in the last few lessons. Arrays (unlike the vector type) are directly supported by your hardware.
0 | 1 | 2 | 3 | 4 | 5 | 37 | -15 | 125 | 12 | 7 | 42 | 0x505290 | 0x505294 | 0x505298 | 0x50529c | 0x5052a0 | 0x5052a4 |
---|
Arrays are similar to the vector library type, except that they are much simpler, and have many fewer features:
- Arrays are allocated with a fixed size when created, which cannot change.
- The array size is not stored along with the array data elements.
- Arrays have no built-in functions, such as support for inserting or deleting.
- C++ performs no bounds-checking on arrays.
- Array elements may be allocated on the stack or static area so they are more efficient than vector, whose elements are always allocated on the heap, even if the vector itself is on the stack.
Arrays are the low-level plumbing from which the more powerful collection classes are built. To understand the implementation of those classes, you need to have some familiarity with the mechanics of arrays.
Arrays vs. vector
Why use an array if vector is so much more convenient? Why bother learning about arrays at all? Here's why.
- vector elements are always allocated on the heap
- Arrays may be allocated on the stack, static area, or the heap. This avoids performance issues that arise with dynamic memory.
- Arrays often have higher performance and take less memory.
- Arrays are generally used for low-level systems programming (operating systems)
- Array performance is deterministic; for this reason, arrays are normally used for embedded programs that must run for long periods of time.
In short, arrays are usually faster and may take less memory than dynamic library types like string and vector. Using arrays in C++ is programming as the CPU sees it.
Defining Arrays
An array must be defined before it is used:
base-type name[size];
The definition requires a base type, name, and allocated size; size is a positive integer constant expression indicating the number of elements for the compiler to allocated. For example:
const size_t kSize = 6;
int a[kSize];
0 | 1 | 2 | 3 | 4 | 5 | ? | ? | ? | ? | ? | ? | 0x505290 | 0x505294 | 0x505298 | 0x50529c | 0x5052a0 | 0x5052a4 |
---|
This creates an array named a, of 6 elements, each of which is an uninitialized int.
- A good practice is to specify the size as a symbolic constant instead of a literal.
- The size must be positive; zero or negative are illegal.
- The size must be constant; a regular, non-const variable should not work, although some compilers may permit it.
- If defined inside a function, the elements are on the stack; if defined outside of a function, the elements are allocated in the static storage area.
Index numbers begin with 0 and run up to the array size minus one.
C++ arrays are different than those in Java where the array variable and the allocated actual array are different. In C++ there is no array variable equivalent. Instead, the array name (a in the example) is an alias for the address of the first element, 0x505290 here.
Array Initialization
When you define an array, its elements are default initialized. So, what does that mean?
- If the base type is a primitive and the array is local the elements are uninitialized. Note: this is unlike vector where elements are initialized to 0.
- If the base type is a primitive and the array is global or static, the elements are 0.
- If the base type is a class type then the default constructor for the type is used to initialize each element. (Different from Java or C#, where the elements are null.)
Arrays may be explicitly initialized at definition time using a list of initializers enclosed in curly braces. C++11 list initialization was patterned after this built-in array feature. Note, however, with the array you must add the = sign.
const int digits[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
When using an explicit initializer you may skip the explicit array size; the compiler counts the number of values you supply. With digits the compiler implicitly supplies the size 10.
You can combine these two forms of definition; you can specify an array allocation size (or dimension) and provide an initializer list as well.
- If you have fewer initializers than the size, the others are set to zero.
-
If you supply a size, then the number of initializers
cannot exceed the dimension.
const size_t kLen = 3; int a1[kLen] = {0, 1, 2}; // [0, 1, 2] int a2[] = {0, 1, 2}; // [0, 1, 2] int a3[kLen + 2] = {0, 1, 2} // [0, 1, 2, 0, 0] int a4[kLen - 1] = {0, 1, 2} // ERROR int a5[kLen]; // Uninitialized int a6[kLen] = {}; // [0, 0, 0]
The Allocated Size
Suppose that you have an array containing the names of all U.S. cities with populations of over 1,000,000. Taking data from the 2010 census, you would write:
const string cities[] = {
"New York", "Los Angeles", "Chicago",
"Houston", "Philadelphia", "Phoenix",
"San Antonio", "San Diego", "Dallas",
};
However, the size of the cities changes over time. In 2020, both San Jose and Austin Texas joined the list. Fortunately, you may simply add new names to the list, or delete them, and then let the compiler count how many there are. This is so common, you are allowed to leave a trailing comma (such as the one after Dallas) and it doesn't create a syntax error.
So, how do you know how many cities there are? You don't want to have to count them! After all, the compiler knows how many there are. C++ has a standard idiom for determining the allocated size of an array at compile time, provided the array definition is in scope.
constexpr size_t kCitiesSize = sizeof(cities) / sizeof(cities[0]);
This compile-time expression takes the size of the entire array (returned from the sizeof operator, and thus of type size_t), and divides it by the size of the initial element in the array. Because all elements of an array are the same size, the result is the number of elements in the array, regardless of the element type.
Array Selection
Array selection uses the subscript operator as in Java. The result of selection expression is an lvalue, which means that you can assign new values to it, like this:
const int kLen = 6;
int a[klen];
for (size_t i = 0; i < kLen; ++i) {
a[i] = 10 * i;
}
0 | 1 | 2 | 3 | 4 | 5 | 0 | 10 | 20 | 30 | 40 | 50 |
---|
C++ performs no bounds checking on array selection. Unlike vector, there is no safe, range-checked alternative, such as at(). If the array index is out of range, your compiler decides where the element would be in memory, and performs the requested operation, leading to undefined results.
Worse still, if you assign a value to that index position, you can overwrite the contents of memory used by some other part of the program. Writing beyond the end of an array is one of the primary vulnerabilities used to attack computer systems.
Array Characteristics
The array name is synonymous with the address of its first element. This address cannot be changed; it is constant. Look at this code fragment:
int list[5];
cout << list << endl;
What prints is the address of the first element; not the contents of the first int, and not the contents of the entire array. Here's one possible output. Click the link to try it.
Since an array name is not a variable, you cannot assign to an array, nor, can you meaningfully compare two arrays using the built-in comparison operators.
int a1[] = {1, 2, 3}, a2[3];
a2 = a1; // 1. Illegal
a2[0] = a1[0]; // 2. Fine
a2 = {1, 2, 3}; // 3. Illegal
if (a1 == a2) . . . // 4. Legal, but stupid
The arrays a1 and a2 are the same type and dimension. Given that:
- It is illegal to assign a1 to a2. The name a1 is the address of the first element in the array. It is not a variable that can be assigned to. This is completely different from structures, where a1 and a2 would both be variables (lvalues) and thus could be assigned to.
- It is legal to assign to array elements; a1 is not a variable, but a1[0] is.
- You can list initialize an array, but you cannot list assign to the array.
- With structures, comparing two variables is a syntax error. Comparing two array names, while legal, is very stupid.
Since the array name is the address of the first element in the array, and since the two arrays cannot live at the same location in memory, the comparison must be false. It doesn't matter what is inside the array.