Wednesday, May 03, 2006

The Alleged Virtual Function Overhead

There is a common fallacy to proclaim that there is a "virtual function call overhead" associated with calling a virtual function in C++. The fallacy is that you are comparing apples and oranges, that is: it is said that there is an overhead in calling a virtual function when compared with an ordinary function call. This is true, but that is because they serve different purposes: a virtual function can dispatch the call based on the actual type of the object, while for an ordinary (member) function call the given function that is called is based on the static type.

Instead, virtual function calls should be compared with similar constructions solving the same problem: namely calling a function containing a switch on the actual type of the object. Since the problem is actually related to executing different code depending on the dynamic "type" of an object, we hereafter call this dynamic dispatch regardless of whether it is in C or C++.

Setting the Scene

To illustrate the traditional solutions in C and C++, we assume that we wish to implement a simple virtual machine and demonstrate how it is done in a typical C manner and in a typical C++ manner. In both cases we use C++ syntax to write the code, so there are some minor differences compared to how it would look in standard C.

For simplicity, we assume that all instruction have one constant and manipulate the accumulator. The instruction we will implement are, in Z80-like notation:

Virtual machine instructions
Instruction Description
LD A, c Load accumulator with constant value c
ADD A, c Add constant c to value in accumulator
SUB A, c Substract value c from value in accumulator
We define a structure to represent a state as:
struct state {
    int acc;
};

Call + Switch

For the "C solution", we introduce a structure to represent an instruction and an enumeration with constants for the instructions:
enum enum_kind { LOAD_INSTR, ADD_INSTR, SUB_INSTR };

struct instr {
    enum_kind kind;
    int val;
};
The C code to implement dynamic dispatch then look like this:
void execute(instr* i, state* s) {
    switch (i->kind) {
        case LOAD_INSTR:
            s->acc = i->val;
            break;
        case ADD_INSTR:
            s->acc += i->val;
            break;
        case SUB_INSTR:
            s->acc -= i->val;
            break;
    }
}
We call this approach the call+switch method.
Advantages:
  • Less code to write (than the C++ solution)
  • Can be inlined if the execute() function is small.
Disadvantages:
  • Not in accordance with the Open-Closed Principle: the code cannot be extended with new instructions without changing existing code.
  • Slower than using virtual function calls [sic], unless the function is inlined

The function pointer structure

An alternative, and quite common, way to imlement virtual dispatch is to use a structure with function pointers, where the function pointer is set to the apropriate function. To handle this, we create the following structure:
struct instr {
    void (*execute)(instr*, state*);
    int val;
};

void load(instr* i, state* s) { s->acc = i->val; }
void add(instr* i, state* s) { s->acc += i->val; }
void sub(instr* i, state* s) { s->acc -= i->val; }
To perform the virtual dispatch, the following code is then used:
program[pc]->execute(program[pc], &state);
We call this the function pointer method.
Advantages:
  • In accordance with the Open-Closed Principle: the code is open for extension with new instruction but closed to changes since it doesn't require changing existing code.
Disadvantages:
  • More code to write
  • Larger structures: one function pointer is required for each method
  • Cannot be inlined

Virtual functions

The corresponding construction in C++ using virtual functions look something like this:
class Instruction {
protected:
    int m_val;
public:
    Instruction(int v = 0) : m_val(v) { }
    virtual void execute(state&) const = 0;
};

class Add : public Instruction {
public:
    Add(int v) : Instruction(v) { ; }
    virtual void execute(state& s) const { s.acc += m_val; }
};

class Sub : public Instruction {
public:
    Sub(int v) : Instruction(v) { ; }
    virtual void execute(state& s) const { s.acc -= m_val; }
};

class Load : public Instruction {
public:
    Load(int v) : Instruction(v) { ; }
    virtual void execute(state& s) const { s.acc = m_val; }
};
We call this approach the virtual call method.
Advantages:
  • In accordance with the Open-Closed Principle: the code can be extended with new instruction without changing existing code.
Disadvantage:
  • More code to write
  • Cannot be inlined

Measurements

All measurements done are on an Intel(R) Pentium(R) M processor 1.60GHz and using gcc 3.0 with -O3.

To measure the execution time of the two pieces of code, we create a random program of length length and execute it 20000 times. To avoid any randomization problems, we use the same sequence of instructions for both the call+switch, function pointer, and virtual call versions when we measure. In addition, the call+switch function is placed in a separate translation unit to prevent automatic inlining: this would be the normal case, since the function would be to large to inline.

Virtual dispatch (execution time)
length call+switch
(CS)
funcptr
(FP)
virtual
(VF)
CS/FP FP/VF
2000 1.08 0.82 1.49 1.31 0.55
4000 2.09 1.64 1.64 1.27 1.00
6000 3.12 2.50 2.46 1.25 1.01
8000 4.16 3.30 3.28 1.26 1.01
10000 5.22 4.13 4.09 1.26 1.01
12000 6.26 4.97 4.97 1.26 1.00
14000 7.31 5.84 5.79 1.25 1.01
16000 8.45 6.84 6.75 1.23 1.01
18000 9.52 7.59 7.55 1.26 1.00
20000 10.53 8.49 8.47 1.24 1.00

Summary

As can be seen in the table, the virtual call and function pointer methods takes approximately 80% of the call+switch method. Worth noting, however, is that if the call+switch function is inlined, the execution time is several factors faster than the other methods (in this specific example). So for small switches inside functions that are inlined, the call+switch method might significantly improve performance.

2 comments:

Snakefoot said...

Consider to include this in your test:

http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern

Mats Kindahl said...

This do not change the fact that you need some sort of dynamic dispatch, so not sure how you wanted me to include it. Can you show an example of what you're thinking of?