Endianness is always a problem when writing portable programs. For that reason, endian.h provides functions that handle it. Normally, these are implemented using shifts and bitwise OR to get the bytes into the right order for the machine, something like this:
```cpp
uint32_t be32toh(const void* ptr) {
    uint32_t val = *static_cast<const uint32_t*>(ptr);
    const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
    const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
    return lower | upper;
}
```
Here we assume that we are reading from a buffer that was filled from an external source that stores everything in big-endian format. (I picked big-endian because I use an Intel machine, which is little-endian.) This deviates from the definition in endian.h on purpose; bear with me for a while.
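For comparison, the endian.h function takes the value itself rather than a pointer to it. Here is a minimal sketch of reading a big-endian word from a buffer that way, assuming glibc's endian.h on Linux. The helper name read_be32 is made up for illustration and is not part of any library.

```cpp
#include <endian.h>   // glibc header (an assumption about the platform); provides be32toh()
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a big-endian 32-bit value from a raw buffer.
uint32_t read_be32(const void* buf) {
    uint32_t raw;
    std::memcpy(&raw, buf, sizeof raw);  // copy the bytes exactly as stored
    return be32toh(raw);                 // endian.h converts a value, not a pointer
}
```

For the bytes 0x12 0x34 0x56 0x78, this should return 0x12345678 regardless of the host byte order.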
There are other implementations, however, that read the bytes from memory directly and shift the bytes; something like this:
```cpp
uint32_t be32toh(const void* buf) {
    // Note the pointer type in the cast: we index individual bytes.
    const uint8_t* ptr = static_cast<const uint8_t*>(buf);
    return (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
}
```
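To see that the two approaches compute the same thing, here is a small sketch that puts them side by side. The names swap_be32, bytes_be32, and host_is_little_endian are my own for this illustration. I also swapped in memcpy for the pointer cast in the first variant and added explicit uint32_t casts in the second, so this is not byte-for-byte the code that was measured.

```cpp
#include <cstdint>
#include <cstring>

// Register variant: load the whole word, then swap its bytes with shifts.
// Only gives the right answer on a little-endian host.
uint32_t swap_be32(const void* buf) {
    uint32_t val;
    std::memcpy(&val, buf, sizeof val);  // assumption: memcpy instead of a cast
    const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
    const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
    return lower | upper;
}

// Pointer variant: read one byte at a time and shift each into place.
// Independent of the host byte order.
uint32_t bytes_be32(const void* buf) {
    const uint8_t* ptr = static_cast<const uint8_t*>(buf);
    return (uint32_t(ptr[0]) << 24) | (uint32_t(ptr[1]) << 16) |
           (uint32_t(ptr[2]) << 8) | uint32_t(ptr[3]);
}

// True if the least significant byte of a word is stored first.
bool host_is_little_endian() {
    const uint32_t probe = 1;
    uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian machine, both functions should turn the bytes 0x12 0x34 0x56 0x78 into 0x12345678. On a big-endian machine only bytes_be32 would, since the register variant swaps unconditionally.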
Noting that most modern architectures have plenty of registers and efficient instructions that operate on registers, I wondered which one is faster. Here are the results:
Program | Seconds | Percent |
---|---|---|
Register | 0.64804 | 49% |
Pointer | 1.32808 | 100% |
The results come from the following program (compiled with g++ -O4 -std=c++0x).
Update: there was an error in the code that led to different loop counts. I added a variable to hold the loop count and ran the measurements again.
```cpp
#include <cstdlib>
#include <sys/time.h>
#include <sys/resource.h>

#include <vector>
#include <functional>
#include <numeric>
#include <iostream>
#include <memory>
#include <algorithm>

#include <stdint.h>

// Difference in user CPU time between two rusage samples, in seconds.
double operator-(rusage const& a, rusage const& b) {
    double result = (a.ru_utime.tv_usec - b.ru_utime.tv_usec) / 1.0e6;
    result += (a.ru_utime.tv_sec - b.ru_utime.tv_sec);
    return result;
}

// Run func once and return the user CPU time it consumed.
template <class Func>
double measure(Func func) {
    rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    func();
    getrusage(RUSAGE_SELF, &after);
    return (after - before);
}

// "Register" variant: load the word, then swap the bytes with shifts.
uint32_t mk1_be32toh(const void* buf) {
    uint32_t val = *static_cast<const uint32_t*>(buf);
    const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
    const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
    return lower | upper;
}

// "Pointer" variant: read the bytes one at a time and shift them into place.
uint32_t mk2_be32toh(const void* buf) {
    const uint8_t* ptr = static_cast<const uint8_t*>(buf);
    return (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
}

int main() {
    std::vector<uint32_t> array;
    size_t sum = 0;
    double result;
    const int loop_count = 100000;

    // Fill the input with random words.
    for (int i = 0; i < 10000; ++i)
        array.push_back(random());

    // The sum is accumulated and printed so the conversions cannot be optimized away.
    result = measure([&array, &sum]() {
        for (unsigned int n = 0; n < loop_count; ++n)
            for (unsigned int i = 0; i < array.size(); ++i)
                sum += mk1_be32toh(&array[i]);
    });

    std::cout << "mk1 exec time is: " << result
              << "(sum is " << sum << ")" << std::endl;

    result = measure([&array, &sum]() {
        for (unsigned int n = 0; n < loop_count; ++n)
            for (unsigned int i = 0; i < array.size(); ++i)
                sum += mk2_be32toh(&array[i]);
    });

    std::cout << "mk2 exec time is: " << result
              << "(sum is " << sum << ")" << std::endl;
}
```