To decide, I wrote a quick measure program and ran it, and the results can be seen in the diagram to the right.
Observe that this micro-benchmark measures the time to insert (and delete) N unique objects into a vector. It does not talk about searching the array. However, searching the array should not be that different in performance since it just involves an extra dereferencing. You should note that the width of the cache lines might affect the performance of the algorithms, so for objects that are narrower than the cache line have a bigger chance of landing fully in the cache line, which in turn will improve performance.
As you can see, there seems to be a break-even in the range 16-20
fields, so for any objects that have less than 16 fields (of
int
size), it seems to pay off to work with the actual
objects instead of pointers to objects.
The program allocates objects consisting of arrays of
int
. I picked int
since that is the natural
size for the processor (usually the size of the register), so using a
smaller granularity is bound to generate a lot of extra work for the
processor to no practical use for us. The use of an array might affect
performance since base objects are copied member-by-member, and
copying an array have to do the equivalent of a memcpy
,
while using base types as members of the structure might allow the
compiler to generate a single instruction for each field, effectively
inlining the member-by-member copying.
To get an object of the correct size (counted in number of
int
), I constructed a template class:
template <int N> struct obj { int x[N]; };After that it is easy to define the two classes to perform each operation as two classes. Each class has an
execute()
function to handle what is actually measured. You can see the full
source code below.
Push pointer | Push object |
---|---|
template <int N> class PushPtr { public: typedef obj<N> ObjType; typedef std::vector<ObjType> Vector; PushPtr(int c) : count(c) { my_list.reserve(c); } void execute() { for (int c = 0 ; c < count ; ++c) { ObjType *obj = new ObjType; my_list.push_back(obj); } typename Vector::iterator ii = my_list.begin(); for ( ; ii != my_list.end() ; ++ii) delete *ii; my_list.clear(); } private: int const count; Vector my_list; }; |
template <int N> class PushCpy { public: typedef obj<N> ObjType; typedef vector<ObjType> Vector; PushCpy(int c) : count(c) { my_list.reserve(c); } void execute() { for (int c = 0 ; c < count ; ++c) { ObjType obj; my_list.push_back(obj); } } private: int const count; Vector my_list; }; |
Measurements
Size | PushPtr | PushCpy | Size | PushPtr | PushCpy |
---|---|---|---|---|---|
1 | 0.156009 | 0.020002 | 21 | 0.160010 | 0.176011 |
2 | 0.104006 | 0.004000 | 22 | 0.164010 | 0.160010 |
3 | 0.100007 | 0.040002 | 23 | 0.160010 | 0.200013 |
4 | 0.104007 | 0.028001 | 24 | 0.168010 | 0.228014 |
5 | 0.100007 | 0.016001 | 25 | 0.164011 | 0.216013 |
6 | 0.100006 | 0.044003 | 26 | 0.164010 | 0.220014 |
7 | 0.104006 | 0.040003 | 27 | 0.168011 | 0.196012 |
8 | 0.104006 | 0.064004 | 28 | 0.168010 | 0.168011 |
9 | 0.100006 | 0.056004 | 29 | 0.172011 | 0.204012 |
10 | 0.100006 | 0.060004 | 30 | 0.160010 | 0.180012 |
11 | 0.104006 | 0.076005 | 31 | 0.164010 | 0.264016 |
12 | 0.100006 | 0.076005 | 32 | 0.168011 | 0.216013 |
13 | 0.100006 | 0.088006 | 33 | 0.168011 | 0.216013 |
14 | 0.100006 | 0.080005 | 34 | 0.164011 | 0.264016 |
15 | 0.104007 | 0.092005 | 35 | 0.164010 | 0.252016 |
16 | 0.104007 | 0.140009 | 36 | 0.160010 | 0.228014 |
17 | 0.096006 | 0.132008 | 37 | 0.168011 | 0.252016 |
18 | 0.160010 | 0.128008 | 38 | 0.160010 | 0.292018 |
19 | 0.164010 | 0.116007 | 39 | 0.164010 | 0.284018 |
20 | 0.168011 | 0.176011 | 40 | 0.164010 | 0.300019 |
Source Code: push_pointer_or_copy.cc
#include <cstdlib> #include <sys/time.h> #include <sys/resource.h> #include <vector> // Here's an object to push, we're using ints since that is usually // the natural type for storing in a register template <int N> struct obj { int x[N]; }; template <int N> class PushPtr { public: typedef obj<N> ObjType; typedef std::vector<ObjType*> Vector; PushPtr(int c) : count(c) { my_list.reserve(c); } void execute() { // This is the actual code to create a new object and push the // pointer into the container. for (int c = 0 ; c < count ; ++c) { ObjType *obj = new ObjType; my_list.push_back(obj); } // We need to delete all the allocated objects as part of the // measurement to get a comparable figure. typename Vector::iterator ii = my_list.begin(); for ( ; ii != my_list.end() ; ++ii) delete *ii; my_list.clear(); } private: int const count; Vector my_list; }; template <int N> class PushCpy { public: typedef obj<N> ObjType; typedef std::vector<ObjType> Vector; PushCpy(int c) : count(c) { my_list.reserve(c); } void execute() { // This is the actual code to create an object and push a copy of // it into the container. for (int c = 0 ; c < count ; ++c) { ObjType obj; my_list.push_back(obj); } } private: int const count; Vector my_list; }; // Define an substraction operator to make it easy to compute the // difference. double operator-(rusage const& a, rusage const& b) { double result = (a.ru_utime.tv_usec - b.ru_utime.tv_usec) / 1.0e6; result += (a.ru_utime.tv_sec - b.ru_utime.tv_sec); return result; } // Function to measure 'count' executions of the function doing N // pushes onto the end of the container. template <int N, template <int> class Func> double measure(const int count, const int laps) { Func<N> func(count); rusage before, after; getrusage(RUSAGE_SELF, &before); for (int lap = 0 ; lap < laps ; ++lap) func.execute(); getrusage(RUSAGE_SELF, &after); return after - before; } // Template class that will measure and print the result objects in // the range [M,M+N], i.e., objects of size M, M+1, ..., M+N. template <int N, int M> struct Measurer; template <int N> struct Measurer<1,N> { static void measure_and_print(const int count, const int laps) { double push_ptr = measure<N,PushPtr>(count, laps); double push_cpy = measure<N,PushCpy>(count, laps); printf("%10d %10d %10f %10f %10f\n", N, count, push_ptr, push_cpy, push_cpy / push_ptr ); } }; template <int N, int M> struct Measurer { typedef Measurer<1,M> First; typedef Measurer<N-1, M+1> Rest; static void measure_and_print(const int count, const int laps) { First::measure_and_print(count,laps); Rest::measure_and_print(count,laps); } }; int main() { int const count = 100; int const laps = 20000; printf("%10s %10s %10s %10s %10s\n", "Size", "Count", "PushPtr", "PushCpy", "PC/PP"); Measurer<40,1>::measure_and_print(count,laps); }
2 comments:
Hello Mats,
Great post, thanks!
I've just run this on MS VC++ 2008, and it confirmed my assumptions that heap allocation can be more expensive than copying data, especially wrt deep tangles of references across multiple threads.
I think you've missed a vital detail in your benchmark app, though: PushPtr clears its vector at the end of execute(), but PushCpy doesn't. That means PushCpy's vector will grow unboundedly across all tests and repeated reallocation will occur after the first run.
Adding my_list.clear() at the end of PushCpy::execute() makes copying look even more favorable.
Thanks!
- Kim
With that change in place, and experimenting with a larger span of data clump sizes (1-500 ints), the results comes out as what looks like random noise. Copies are consistently faster, but by a varying factor (50-75%).
Hmm.
Post a Comment