Background: Overloading, Overriding, and Virtual Functions ----------------------------------------------------------- Overloading is the ability to use the same function NAME with different arguments to denote DIFFERENT functions. In C++, for instance, once can define: void add(int a, int b, int& c); void add(float a, float b, float& c); Overriding refers to the fact that an implementation of a method in a subclass supercedes the implementation of the same method in the base class. OVERRIDING OF NON-VIRTUAL FUNCTIONS IN C++ For instance: class B { ... public: void op1(int i) { /* B's implementation of op1 */ } ... } class A: public class B { ... public: void op1(int i) { /* A's implementation of op1 */ } ... } main() { B b; A a; int i = 5; b.op1(i); // B's implementation of op1 is used a.op1(i); // Although every A is a B, and hence B's implementation of // op1 is available to A, A's definition supercedes B's defn, // so we are using A's implementation of op1. ((B)a).op1(); // Now that a has been cast into a B, B's op1 applies. a.B::op1(); // Explicitly calling B's implementation of op1 } One thing to note in the above example is that the choice of B's or A's version of op1 to use is based on compile-time type of a variable or expression. The runtime type is not used. Thus, overriding of method definitions is based on compile-time type information. This is illustrated in the following function f, void f(B x, int i) { x.op1(i); } which may be invoked as follows: B b; A a; f(b, 1); // f uses B's op1 f(a, 1); // f still uses B's op1, not A's OVERRIDING IN THE PRESENCE OF VIRTUAL FUNCTION Use of virtual functions would enable overriding to be based on runtime type information: class B { ... public: virtual void op1(int i) { /* B's implementation of op1 */ } ... } class A: public class B { ... public: void op1(int i) {// op1 is virtual in base class, so it is virtual here too /* A's implementation of op1 */ } ... } main() { B b; A a; int i = 5; b.op1(i); // B's implementation of op1 is used a.op1(i); // A's implementation of op1 is used. ((B)a).op1(); // Although A has been cast into a B, dispatching of // virtual function is based on the runtime type info // associated with a. Thus, A's op1 is used. // NOTE difference with nonvirtual case a.B::op1(); // Explicitly calling B's implementation of op1. // Disable dynamic dispatching of op1 } Also, in the second example, things work as expected, based on runtime type info: void f(B x, int i) { x.op1(i); } which may be invoked as follows: B b; A a; f(b, 1); // f uses B's op1 f(a, 1); // f uses A's op1 Note that op1 denotes different funtions in the context of A and B. Thus, we are overloading the name "op1" with two different meanings. Reuse of client code, as enabled in OO languages, relies simply on this fact that the same function name will do the "right thing" depending on the runtime type. The right version can be determined using the argument types: If the object argument is of type A, then we use A's version of op1, and if it is of type B, then we use B's version of op1. With nonvirtual functions, the resolution of op1 is done at compile time, while with virtual functions, resolution is done at runtime. Implementation Aspects of OO-Languages --------------------------------------- Allocation of space for data members: The space for data members is laid out the same way it is done for structures in C or other languages. Specifically: -- the data members are allocated next to each other -- some padding may be required in between fields, if the underlying machine architecture requires primitive types to be aligned at certain adresses -- at runtime, there is no need to look up the name of a field and identify the corresponding offset into a structure; instead, we can statically translate field names into relative addresses, with respect to the beginning of the object. -- data members for a derived class immediately follow the data members of the base class -- multiple inheritance requires more complicated handling, we will not discuss it here Example: class B { int i; double d; char c; float f; } Layout of objects of type b: --------------- 0: | int i | // Assumption: Integer requires 4 bytes --------------- 4: | XXXXXXXXXXX | // pad, assuming double's are to be --------------- // aligned on 8-byte boundaries 8: | double d | | | // Assumption: Double requires 8 bytes --------------- 16: | char c|XXXXX| // Assumption: char needs 1 byte, 3 bytes are padded --------------- 20: | float f | // Assumption: float to be aligned on 4-byte boundaries, --------------- // and require 4-bytes of space. class C { int k, l; B b; } --------------- 0: | int k | --------------- 4: | int l | --------------- 8: | int i | --------------- 12: | XXXXXXXXXXX | --------------- 16: | double d | | | --------------- 24: | char c|XXXXX| --------------- 28: | float f | --------------- class D: public C { double x; } --------------- 0: | int k | --------------- 4: | int l | --------------- 8: | int i | --------------- 12: | XXXXXXXXXXX | --------------- 16: | double d | | | --------------- 24: | char c|XXXXX| --------------- 28: | float f | --------------- 32: | double x | | | --------------- Implementation of Virtual Functions ------------------------------------ Approach 1: Lookup type info at runtime, and then call the function defined by that type. Problem: very expensive, require type info to be maintained at runtime Approach 2: Treat function members like data members: Allocate storage for them within the object. Put a pointer to the function in this location, and translate calls to the function to make an indirection through this field. Benefit: No need to maintain type info at runtime. Implementation of virtual methods is fast, as it requires only a dereferencing of the field that stores the pointer to member funtion to be invoked. Problem: Potentially lot of space is wasted for each object, even though all objects of the same class have identical values for the table. Approach 3: Introduce additional indirection into approach 2: Store a pointer to a table in the object, and this table holds the actual pointers to virtual functions. Now we use only one word of storage in each object. class B { int i ; char c ; virtual void g(); virtual void h() ; } B b1, b2; b1: +-------------+ | i | |-------------| | c | |-------------| | VMT ptr ---|----------------+ +-------------+ | | | | Virtual Method Table (VMT) | for class B | +-------------+ +---------------->|ptr to B's g | | |-------------| | |ptr to B's h | b2: | |-------------| +-------------+ | | i | | |-------------| | | c | | |-------------| | | VMT ptr ---|----------------+ +-------------+ Impact of subtype principle on Implementation of OO-Languages ------------------------------------------------------------- The subtype principle requires that any piece of code that operates on an object of type B can work "as is" when given an object belonging to a subclass of B. This implies that runtime representation used for objects of a subtype A must be compatible with those for objects of the base type B. Note that the way the fields of an object are accessed at runtime is using an offset from the start address for the object. For instance, b1.i will be accessed using an expression of the form (we use C-like syntax here) *(&b1+0) where 0 is the offset corresponding to the field i. Similarly, the field b1.c will be accessed using the expression *(&b1+1) (This assumes that addresses are given in terms of words, and a single word of memory can store an integer variable. If we use byte addressing instead, and if all integers require 4 bytes, then the access would use an expression of the form *(&b1+4).) Similarly, an invocation of the virtual member function b1.h() will be implemented at runtime using an instruction of the form: call *(*(&b1+2)+1) where: &b1+2 gives the location where the VMT ptr is located *(&b1+2) gives the value of the VMT ptr, which corresponds to the location of the VMT table *(&b1+2) + 1 yields the location within the VMT table where the pointer to virtual function h is stored. (Note that the pointer to h is stored at offset 1 from the base of VMT.) Viewed in light of the way member fields and operations are accessed, the subtype principle imposes the following constraint: any field of an object of type B must be stored at the same offset from the base of any object that belongs to a subtype of B the VMT ptr must be present at the same offset from the base of any object of type B or one of its subclasses the location of virtual function pointers within the VMT should remain the same for all virtual functions of B across all subclasses of B. As a result, we must use the following layout for an object of type A defined as follows: class A: public B { float f; void h(); // Redefines h, but reuses implementation of G from B; virtual void k(); } A a; a's layout: Virtual Method Table (VMT) +-------------+ for class A | i | +-------------+ |-------------| /----->|ptr to B's g | | c | / |-------------| |-------------| / |ptr to A's h | | VMT ptr ---|-----------------/ |-------------| |-------------| | ptr to A's k| | float f | +-------------+ +-------------+ Note that in order to satisfy the constraint that VMT ptr appear at the same position in objects of type A and B, it is necessary for the data field f in A to appear after the VMT field. A couple of other points: a) non-virtual functions are statically dispatched, so they do not appear in the VMT table b) when a virtual function f is NOT redefined in a subclass, the VMT table for that class is initialized with an entry to the function f defined its superclass.