9/03/2014

Inner product, round 4: refactoring into object

One of the benefits of moving from C to C++ is your enhanced capability to describe things in ways that are cognitively less taxing. Entering round 4, the only problem in the backlog is P5:

P5. Representation of vectors: Vector should remember its own dimension; use of vector should follow the convention in mathematics.

So, let's work on P5. To do this we need to use the class construct: C++'s way of letting you wrap scattered things into one cohesive piece. Before that, it is important that we understand what kind of an impact the new representation brings to the working program obtained at the end of round 3. Assuming that we have vector objects instead of array of doubles, which code must be changed?  

I hope you are not surprised by the answer: all. Our program is only about vectors and computing inner product of two vectors, so all code is related to the representation and manipulation of vectors with arrays of doubles. But answering all does not help much. Should you just throw away the code of round 3 and start over from scratch?  Certainly not. The code works, and it has been unit-tested. You need to re-use the code as much as possible. With this in mind, let us begin the next round.

Step U. Understanding the problem.


Let's try to introduce some structure into the problem even if we know all code is potentially subject to change. The structure will allow us to come up with a list of tasks which we can choose to complete with good rationale. Let's draw a figure:



In the figure, a small cloud represents the object we want to introduce to the program to replace a certain existing concept. In the present case, we plan to introduce Vector objects to replace arrays of doubles for representing mathematical vectors. In the diagram, a rectangle represents a function or the unit tests for a function. An arrow from rectangle A to rectangle B means that A uses or depends on B. For example, the function main depends on the function computeInnerProduct because the former calls the latter. In the same manner, an arrow is drawn from rectangle to the small cloud if the former is affected by the introduction of the latter. In short, they are the units that are directly impacted by the introduction of the new class. For example, the function extractVectorFromString is affected by the introduction of vector since it needs to return a vector instead of an array of doubles.

With the figure, we quickly grasp a picture of the impact in replacing arrays of doubles with vector objects. The figure gives a crisp picture of our answer of "all": All units need to change. 

Step D. Devising a plan.

In what order should we go about the changes? Which unit should we start with? In answer these questions, let us remind ourselves that we are always interested in maintaining a working program. Let us apply that to the ordering of changing the units. So we will change one unit at a time. 

The following list of tasks is a plausible decomposition based on the "uses" or "depends" relationship.

T0. Learn class construct, constructors, destructors, and other member functions.
T1. Code up the vector class.
T2. Test vector.
T3. Change the function inner product.
T4. Change the unit test of inner product.
T5. Change extract vector from string.
T6. Change the unit test of extracting vector from string.
T7. Change get vector from user.
T8. Change main.
T9. Test main by running the program.
T10. Clean up in the reverse order of changes. 

A long list, but definitely doable. A note about T10. When making changes, there is a tendency for us programmers to keep the old code, on the ground that the old code might still be useful. My suggestion: Don't. Even if you know you need the old code, there are better ways to do it. We will come back to this later.  


Step C. Carrying out the plan. 

Let's look at T1 and T2 in detail. Further, let's do T2 first:
 TEST (Vector, create)
{
    double a[2]={1,2};
    Vector v(a, 2);

    LONGS_EQUAL(2,v.dim());
    DOUBLES_EQUAL(1,v.component(1),0.001);
    DOUBLES_EQUAL(2,v.component(2),0.001);
}
 

Compiling the test project would fail because we have yet to code up the vector class. But the test is a good thing because it tell us what we expect from the class:
  • creating a vector
  • obtaining its dimension
  • obtaining the component by dimension.
Here, we have decided that the new class will be called Vector. Note that we follow the convention of beginning a self-defined class with a capital letter. Moreover, the dimension and component given index will be obtained through the corresponding calls to member functions dim and component.

Let's tackle T1, writing the Vector class. The C++ class construct is a mechanism for you to create an abstraction that involves both data and operations. What are the data and operations we have been using around the concept of vector?


The main function of round 3.

Checking the main program of round 3, we found the following:
  • dimension of vectors (line 7)
  • pointer to array of doubles (line 8)
  • creation of vectors (lines 14 and 15; code not shown)
  • deletion of vectors (lines 22 and 23)
  • access of vector components (line 7 in computeInnerProduct; not shown here)
Diagrammatically, we have a set of associated data and operations as depicted below:



 Since we are going to share the class Vector between the production project and the test project, we will follow the convention to separate its code in two files: Vector.h and Vector.cpp. Here is the header file:

#ifndef VECTOR_H_INCLUDED
#define VECTOR_H_INCLUDED
class Vector {
public:
    Vector (double a[], int d);
    Vector (Vector const & v);
    int dim() const;
    double component(int i) const;
    ~Vector();
private:
    double * coms;
    int d;
};

#endif // VECTOR_H_INCLUDED


Below "public:" we have five operations: two constructors for creating instances (or objects) of Vector, member functions dim() and component(int) for getting the dimension and component of vector given the dimension, and a special member function called the destructor. These operations are public so that they can be called from outside of the class (e.g., v.component(1) in the test case). 

Below "private:" we have the data that make up the vector: the components and the dimension. They are private so that they direct access of these data members are flagged as compilation errors (e.g., v.dim = 3.)

Diagrammatically, the class Vector is represented as below.

The compartments from top to bottom are name of the class, data members (adorned with '-' for private) and member functions (adorned with '+' for public). As a convention, we won't include the constructors and the destructor, and we don't show the private member functions.  

What's a good order to do T3 and T4? Let's begin with T4. In T4, we redo test computeInnerProduct for vector version; it should pass when T3 is done. By writing tests first, we are actually thinking about what kind of an interface the new method should have. In other words, we are designing the interface of the new function computeInnerProduct:

TEST (computeInnerProduct, vector)
{
    double u[2] ={1,2};
    double v[2] ={1,1};
    vector w(u,2);
    vector x(v,2);


    DOUBLES_EQUAL(3, computeInnerProduct(w,x),0.001);

}

When T4 is done, we have code like this:

double computeInnerProduct (double v1[], double v2[], int d1, int d2) {
...
}

double computeInnerProduct (vector v1, vector v2)
{
    if (v1.dim() != v2.dim())
        throw "Vectors of different dimension!";

    double r=0;
    for (int i=1; i<=v1.dim(); ++i) {
        r += v1.component(i)*v2.component(i);
    }
    return r;
}

 

At this point, we have both the old and new versions of computeInnerProduct. We are going to keep the old version around for a while because we are have yet to change the main function (T8). If the old version is removed, the compilation of both the production project and the test project would certainly fail. The old and new versions can coexist, thanks to the C++ feature of allowing a function name to be overloaded: Two functions can have the same name if their arguments are different in type.

The tasks of T5 and T6 are done in the same way. Note that the use of vector class eliminates the argument int &dim:

  double *  const extractVectorFromString(std::string line, int & dim)

  vector extractVectorFromString(std::string line)
    
The old version needs to pass the dimension back to the main function, but since the return value of the function is taken up by the allocated array, the dimension is made an additional argument and is a call by reference is used. In contrast, the new version now keeps the dimension inside the vector and has a cleaner interface.

T10 is a task of its own because we want to be careful when removing things: One unit at a time. For example, we can begin by removing the old version of computeInnerProduct. The product project should compile and execute correctly, but test project would fail to compile since the some existing tests depend on old computeInnerProduct. These tests are removed.   

Here is the working program.

Step L. Looking back.

Is the program easy to understand? Does it have more "vector-ness" than "array-ness"? Let's look at the code:



The only places we use vectors are:
  • creating vectors (lines 12 and 13)
  • passing vectors to computeInnerProduct (line 14)
Carefully compare code in the main function with its predecessor: There's no longer "array-ness" in the code, and the use of Vector objects are concise and easy to understand.

You should do the review for the other functions as well to convince yourself of the improvement.

In this round, we have changed the representation of vectors without changing the functionality of the program. The systematic way we did this called refactoring.

Is  the program good enough? I'll leave it to you to figure out and improve with yet another round. It's time to move on.

© Y C Cheng, 2013, 2014. All rights reserved.

No comments:

Post a Comment