Week 11

The Iron Age

Back in the computing "iron age", when programmers wrote their code in machine or assembly language, a programmer would manipulate a data element, such as quarterly sales or the velocity of a missile, by using the data element's memory address. The line below, for example, stores a value (from the CPU's AX register) in the memory location A42C:

[Figure: a fragment of assembly language.]

When high-level languages like FORTRAN and COBOL were developed, they made things much easier. Now programmers could use names for the data elements, like this:

int velocity = 125;

There is no longer any need to keep track of exactly where the value 125 is stored, or of what that value represents; the compiler takes care of the minutiae of associating the address A42C with velocity.

Even better, though, is the fact that the compiler can now keep track of what kind of thing you store in the memory location A42C and warn you if you make a mistake when using it. Just as 7-11 has a different kind of container for each of its beverages, programmers now have different kinds of variables for different kinds of data.

With high-level languages, variables are no longer just chunks of arbitrary bits; each variable now has a specific data type, like char, boolean, int or double.

User-Defined Types

Each high-level language, from FORTRAN and COBOL in the 1950s, to Java and C++ today, comes equipped with a pre-defined, built-in set of data types. In C++, these are the primitive types like int, float and char. These types are defined in the language itself, and not as part of the standard library.

However, programmers want more. Financial programmers want real numbers, scientific programmers can't live without complex numbers, business programmers want dates and times, and graphics programmers really need points and shapes.

Rather than adding all of these extensions as built-in types, creating complex, "Rube-Goldberg-like languages", designers found it better to give programmers the ability to define their own data types. That's really at the heart of modern, object-based programming.

Object-Based Programming

Bjarne Stroustrup, the inventor of C++, explains that C++ is not an object-oriented language in the spirit of Smalltalk or Simula. C++ is a multi-paradigm language: a language that supports different styles of programming but doesn't require you to subscribe to any particular orthodoxy.

Here's what Stroustrup has to say:

Languages such as Ada, Clu, and C++ allow users to define types that behave in (nearly) the same way as the built-in types. Such a type is often called an abstract data type, but I prefer the term user-defined type. The programming paradigm supported by user-defined types (often called object-based programming) can be summarized as:

Decide which types you want; provide a set of operations for each type.

Arithmetic types such as rational and complex numbers, as well as simple concepts like dates, times, pairs, points, lines, colors, bcd characters, error messages and currencies are all excellent candidates for user-defined types, as opposed to representing them as plain data structures or as part of a larger hierarchy.

Time as a Structure

At the beginning of the semester (H01), you wrote a program to add and subtract time. This was harder than expected, because you didn't have a Time type; you did everything with integers. Let's rectify that now by creating a Time structure, with hours and minutes. Assume a 24-hour clock, so you don't need an indicator for AM/PM.

struct Time
{
    int hours;
    int minutes;
};

Now, you can create a Time object that bundles that data:

int main()
{
    Time lunch = {11, 15};
    cout << lunch.hours << ":" << lunch.minutes << endl;

    return 0;
}
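
With the members bundled together, the time arithmetic from H01 can be written against a single value. The sketch below is one possible approach, not the assignment's solution: the function name addMinutes() is hypothetical, and it assumes the starting Time is valid.

```cpp
struct Time
{
    int hours;
    int minutes;
};

// Hypothetical helper: returns t advanced by delta minutes (delta may be
// negative), wrapping around midnight on a 24-hour clock.
Time addMinutes(const Time& t, int delta)
{
    const int minutesPerDay = 24 * 60;
    int total = (t.hours * 60 + t.minutes + delta) % minutesPerDay;
    if (total < 0)
        total += minutesPerDay;   // % can give a negative result in C++
    return Time{total / 60, total % 60};
}
```

Calling addMinutes() with the lunch value {11, 15} and a delta of 50 gives a Time of {12, 5}.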

Type Invariants

The definition of Time is straightforward, but it will cause problems. There are restrictions on what values the members of a Time object may have. Given our specification, for instance, hours must be in the range 0..23 and minutes must be in the range 0..59. We call these restrictions the type's invariants.

But with structures, we have no means of enforcing those restrictions. There is nothing to prevent someone (most likely you, if you aren't careful) from constructing a bogus Time object like this:

Time bed_time = {27, 95};

Both values supplied here make the Time object, bed_time, invalid. But the code compiles fine; everything is perfectly legal C++, and the compiler has no idea that something bad might happen in the future.
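
As a minimal sketch of what those invariants mean in code, a hypothetical helper, isValid() (an assumption for illustration, not part of the lesson), could test a Time value against the 0..23 and 0..59 ranges. It can only detect a bad value after the fact; nothing forces callers to use it.

```cpp
struct Time
{
    int hours;
    int minutes;
};

// Hypothetical helper: returns true only when t satisfies the invariants
// for a 24-hour clock (hours in 0..23, minutes in 0..59).
bool isValid(const Time& t)
{
    return t.hours >= 0 && t.hours <= 23
        && t.minutes >= 0 && t.minutes <= 59;
}
```

For the two objects above, isValid() would accept {11, 15} and reject {27, 95}.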

A Scenario

Peg & Cat

To give an idea of what could go wrong, suppose that you have a radiation therapy machine like the Therac-25. If the software controlling the machine used a Time structure to specify how long a therapy session should last, the machine would be intrinsically unsafe. Think about what happens if you write the following code, which calls the (non-buggy) run() function:

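Time treatmentTime = {0, -2};
run(treatmentTime);
. . .
void run(Time& t)
{
    // Treatment length in seconds (3600 per hour, 60 per minute),
    // held in an unsigned counter.
    unsigned elapsed = t.hours * 3600 + t.minutes * 60;
    while (elapsed > 0)
    {
        pulseBeam();
        --elapsed;
    }
}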

As Peg and Cat point out, you now have a fairly serious problem. Even though the run() function is reasonable, it relies on the Time& t parameter being correctly initialized. Because minutes was (accidentally) set to a negative number, the loop will supply not two minutes of radiation, but billions of pulses instead.
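
To see where a number in the billions can come from, here is a sketch of the arithmetic, assuming the total is kept in a 32-bit unsigned counter. That is an assumption, and the hypothetical secondsToRun() below is not part of the lesson's code. Converting a negative int to an unsigned type wraps around modulo 2^32, so -120 becomes a value just under 4.3 billion.

```cpp
#include <cstdint>

struct Time
{
    int hours;
    int minutes;
};

// Hypothetical helper: the number of loop iterations that would run if
// the total were kept in a 32-bit unsigned counter.
std::uint32_t secondsToRun(const Time& t)
{
    int total = t.hours * 3600 + t.minutes * 60;   // -120 for {0, -2}
    return static_cast<std::uint32_t>(total);      // wraps modulo 2^32
}
```

For Time{0, 2} this yields 120, but for Time{0, -2} it yields 4,294,967,176.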

The Problem

If you're lucky, the code will have some extra checking to catch this and report an error. If you're unlucky and the machine actually delivers too much radiation, the patient could die, as happened in the original Therac-25 incidents.

In other words, because the user of the Time structure set a single field to a nonsensical value, it's possible that your program could cause real injury. This is clearly unacceptable, and you will need to do something about it.

There are two problems with implementing Time as a struct.
  • Structures do not enforce invariants. Structures use "naked" variables to represent data, so any part of the program can modify those variables without any validation. Time expects certain relationships between its data members, but cannot enforce those relationships.
  • Time is represented in a particular way, as two int members. We say that code which uses the Time data type is tightly coupled to that implementation.
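
The second point can be sketched with a hypothetical function, minutesSinceMidnight() (an assumption for illustration). Because it reads the hours and minutes members directly, it would have to be rewritten if Time were later stored differently, say as a single count of minutes.

```cpp
struct Time
{
    int hours;
    int minutes;
};

// Hypothetical client code: it depends on Time having exactly these two
// members, so any change to the representation breaks it.
int minutesSinceMidnight(const Time& t)
{
    return t.hours * 60 + t.minutes;
}
```

Every function written this way is another place that must change whenever the layout of Time changes.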

Both of these are real problems, and this is what C++ programming is like with raw structures. Code is brittle, bugs are more likely, and changes are more difficult. So, in the next lesson, let's change gears and represent Time in a slightly different way.