Pointers
A pointer is a variable that contains the address of another variable. In languages like Java, C# and Python, pointers are hidden from the programmer and used only by the runtime system. In C++, understanding pointers is necessary for understanding how C++ programs work.
An expression that refers to an object in memory is an lvalue. Variables are lvalues because you can store data in them. A named constant is a non-modifiable lvalue. Many values in C++ are not lvalues; the result of an arithmetic expression such as x + 1 is a temporary value, but it is not an lvalue, because you cannot assign a new value to it.
The following properties apply to modifiable lvalues in C++:
- Every lvalue is stored somewhere in memory; thus it has an address.
- The address of an lvalue never changes, even though the contents of those memory locations may change.
- The address of an lvalue is a pointer or address value, which can be stored in memory and manipulated as data.
To store an address value in memory, you create a pointer variable. Thus, a pointer is simply a variable that stores the address of some object in memory.
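The properties listed above can be checked with a short sketch. The function names addressIsStable and showAddress are made up for this illustration; the address that gets printed will differ from machine to machine and from run to run.

```cpp
#include <iostream>

// Returns true if the address of a variable stays the same
// while the contents of the variable change.
bool addressIsStable()
{
    int x{42};
    const int *before{&x};  // the address of x, stored as data
    x = 99;                 // the contents of x change...
    const int *after{&x};   // ...but x still lives at the same address
    return before == after;
}

// Prints the address of a variable and the value stored there.
void showAddress()
{
    int x{42};
    std::cout << "x lives at " << &x << " and holds " << x << '\n';
}
```

Calling addressIsStable() should return true, since an object does not move during its lifetime.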
Defining Pointers
To define a pointer, add an asterisk (*) between the variable type and the variable name in the variable definition. Here, p is a pointer variable that "points to" an int; its type is pointer-to-int.
int *p;
In this context, * is the pointer declarator operator. It turns the name on its right into a pointer to the type on its left. The line below defines cptr, a pointer-to-char.
char *cptr;
Even though p and cptr are both pointers, each is a distinct type; pointers are very strongly typed, and there are no implicit conversions between unrelated pointer types such as pointer-to-int and pointer-to-char.
A pointer belongs syntactically with the variable name and not with the base type.
int* p1, p2; // p1 is a pointer, p2 an int
int *p3, *p4; // Both are pointers
If you use the same declaration to define two pointers of the same type, you need to mark each of the variables with an asterisk.
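One way to confirm how the asterisk binds is to ask the compiler for the type of each variable. The sketch below uses static_assert with std::is_same from the standard header <type_traits>; it is only an illustration, and the variable names are the same ones used in the declarations above.

```cpp
#include <type_traits>

int* p1, p2;    // p1 is a pointer-to-int, p2 is a plain int
int *p3, *p4;   // both are pointer-to-int

// Each check is evaluated at compile time; the program will not
// compile if any of these claims about the types is wrong.
static_assert(std::is_same<decltype(p1), int*>::value, "p1 is a pointer");
static_assert(std::is_same<decltype(p2), int>::value,  "p2 is an int");
static_assert(std::is_same<decltype(p3), int*>::value, "p3 is a pointer");
static_assert(std::is_same<decltype(p4), int*>::value, "p4 is a pointer");

// Pointer types are distinct, so this line would not compile:
// char *cptr = p1;   // error: cannot convert int* to char*
```

Because the checks run at compile time, the fact that this file compiles is itself the result.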
Initializing Pointers
A pointer can be in one of four states:1
- It can point to a valid object.
- It can point one-past a valid object (in an array or vector, for instance).
- It can contain the value nullptr to indicate it points to "nothing", or is unused.
- It can be invalid, such as an uninitialized pointer.
You can initialize a pointer in several ways.
- With the address of another object obtained from the address operator.
- With the address of an object created on the heap with the new operator.
- With the name of a previously defined array.
- By using pointer assignment to copy the address from another pointer.
If you don't initialize a pointer, it is invalid. Here are examples of each of these:
int x{42}, y{0}, a[10]; // x->int, y->int, a->array
int *p1{&y}; // points to y
int *p2{&x}; // points to x
int *p3{new int{3}}; // points to int on heap
int *p4{a}; // points to first element of a
int *p5{a+10}; // points "one past" the array a
int *p6{nullptr}; // points to "nothing"
int *p7; // uninitialized (invalid)
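To see two of these initializations at work, here is a minimal sketch. The function names sumArray and heapValue are invented for the example, and the use of delete to release the heap object is an addition that the text above does not cover.

```cpp
// Adds up the elements of an array, using a "one past" pointer
// only as a stopping point; it is compared but never dereferenced.
int sumArray()
{
    int a[5]{1, 2, 3, 4, 5};
    int *first{a};      // the array name yields the address of a[0]
    int *last{a + 5};   // points "one past" the array a
    int total{0};
    for (int *p{first}; p != last; ++p)
        total += *p;    // p always points to a valid element here
    return total;
}

// Creates an int on the heap, reads it, and releases it again.
int heapValue()
{
    int *p{new int{3}}; // points to an int on the heap
    int value{*p};
    delete p;           // give the memory back
    p = nullptr;        // p now points to "nothing" instead of being invalid
    return value;
}
```

With the values shown, sumArray() returns 15 and heapValue() returns 3.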
1 Lippman, C++ Primer, 5th Edition, Page 52, Section 3.3.2
Dereferencing Pointers
Let's look at the list of pointers from the previous section again.
int x{42}, y{0}, a[10]; // x->int, y->int, a->array
int *p1{&y}; // points to y
int *p2{&x}; // points to x
int *p3{new int{3}}; // points to int on heap
int *p4{a}; // points to first element of a
int *p5{a+10}; // points "one past" the array a
int *p6{nullptr}; // points to "nothing"
int *p7; // uninitialized (invalid)
The * dereferencing operator returns the value that a pointer points to, provided that the pointer points to a valid object, such as p1 and p2. Using the dereferencing operator on p5, p6 or p7 produces undefined behavior. The value that a pointer "points to" is called its indirect value.
Since p1 is a pointer to int, the compiler "knows" that *p1 must be an integer object. Thus *p1 turns out to be another name (or alias) for the variable y. Like the simple name y, *p1 is an lvalue, and you can assign new values to it.
int x{42}, y{0};
int *p1{&y}; // points to y
int *p2{&x}; // points to x
*p1 = 17; // assign to indirect value
This last statement changes the value in the variable y because that is the target of the pointer p1. p1 is unaffected by this assignment; it continues to point to the variable y.
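The effect of assigning to an indirect value can be checked with a small function. This is only a sketch; the name assignThroughPointer and the local variable copy are invented for the illustration.

```cpp
// Assigns to y through the pointer p1 and returns the new value of y.
// Returns -1 if anything other than y was disturbed.
int assignThroughPointer()
{
    int x{42}, y{0};
    int *p1{&y};    // points to y
    int *p2{&x};    // points to x

    *p1 = 17;       // assign to the indirect value: this writes to y
    int copy{*p2};  // read the indirect value of p2: this reads x

    // p1 itself is unchanged; it still holds the address of y.
    return (p1 == &y && copy == 42) ? y : -1;
}
```

The function returns 17, the value stored in y through *p1.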
Pointer Assignment
It is also possible to assign new values to the pointer variables themselves.
The assignment p1 = p2 makes a copy of the direct value (that is, the address) stored in p2 and stores it in the variable p1. Afterwards, both variables point to the same location.
If you draw your diagrams using arrows, keep in mind that copying a pointer replaces the destination pointer with a new arrow that points to the same location as the old one. Thus p1 = p2 changes the arrow leading from p1 so it points to the same location as the arrow originating from p2.
It is important to distinguish the assignment of a pointer from that of a value. Pointer assignment, p1 = p2, makes p1 and p2 point to the same location. By contrast, value assignment, *p1 = *p2, copies the value from the location pointed to by p2 into the location pointed to by p1.
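The difference between the two kinds of assignment shows up clearly when you run them side by side. In this sketch, the Result structure and the two function names are hypothetical helpers; both functions start with x holding 42, y holding 0, p1 pointing to y and p2 pointing to x.

```cpp
// Records the state of the variables after an assignment.
struct Result
{
    int x;
    int y;
    bool sameTarget;    // true if p1 and p2 hold the same address
};

// Pointer assignment: only the pointer p1 changes.
Result pointerAssignment()
{
    int x{42}, y{0};
    int *p1{&y}, *p2{&x};
    p1 = p2;            // p1 now points to x; y is untouched
    return {x, y, p1 == p2};
}

// Value assignment: only the object that p1 points to changes.
Result valueAssignment()
{
    int x{42}, y{0};
    int *p1{&y}, *p2{&x};
    *p1 = *p2;          // y receives a copy of x; p1 still points to y
    return {x, y, p1 == p2};
}
```

After pointerAssignment(), y is still 0 and both pointers have the same target. After valueAssignment(), y is 42 and the pointers still have different targets.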
The "null" Pointer
The value that indicates that a pointer is not being used is called the null pointer. In source code, it can be written as the literal 0, although the way it is represented internally is up to the implementation. While you cannot assign an arbitrary integer to a pointer variable, you can assign the literal value 0.
Using a literal 0, however, makes it hard to find all of the null pointers in your code. C++11 introduced an actual null pointer constant named nullptr. You should use that instead of 0. Do not use the C language value NULL.
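One practical difference between 0 and nullptr appears with overloaded functions. The sketch below assumes a pair of hypothetical overloads named which; the literal 0 is first of all an int, while nullptr can only become a pointer.

```cpp
#include <string>

// Two overloads: one takes an integer, the other a pointer-to-int.
std::string which(int)   { return "int"; }
std::string which(int *) { return "pointer"; }

// The literal 0 is an exact match for int, so the int overload is chosen.
std::string fromZero()    { return which(0); }

// nullptr does not convert to int, so the pointer overload is chosen.
std::string fromNullptr() { return which(nullptr); }
```

Here fromZero() returns "int" and fromNullptr() returns "pointer", which is why nullptr states your intent more clearly.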
It is illegal to dereference a null pointer. In UNIX, it usually results in a segmentation fault, but that is not guaranteed. Some machines return the contents of address 0000. As a result, this is undefined behavior, as in the case of uninitialized pointers.
If you declare a pointer but fail to initialize it, and then dereference it, the computer interprets whatever happens to be stored in that pointer as an address and tries to read that region of memory. Such programs can fail in ways that are extremely difficult to detect. Again, this is undefined behavior.
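A common defensive habit, sketched here under the assumption that you control how your pointers are initialized, is to give every pointer either a valid address or nullptr, and to test for nullptr before dereferencing. The function valueOr is a hypothetical helper written for this example.

```cpp
// Returns the value p points to, or fallback if p is the null pointer.
// The test only works for pointers that are valid or null; there is no
// way to detect an uninitialized pointer by looking at it.
int valueOr(const int *p, int fallback)
{
    return p != nullptr ? *p : fallback;
}
```

For example, if n is an int holding 5, valueOr(&n, -1) returns 5, while valueOr(nullptr, -1) returns -1 without ever dereferencing the null pointer.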