Endianness is a recurring problem when writing portable programs. For that reason, endian.h provides functions that handle the conversion. Normally they are implemented using shifts and bitwise OR to get the bytes into the right order for the machine, something like this:

    uint32_t be32toh(const void* ptr) {
      uint32_t val = *static_cast<const uint32_t*>(ptr);
      const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
      const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
      return lower | upper;
    }

Here we assume that we are reading from a buffer that was filled from an external source that stores everything in big-endian format. (I picked big-endian because I use an Intel machine, which is little-endian.) This deliberately deviates from the definition in endian.h, which takes a uint32_t value rather than a pointer; bear with me for a while.
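To make the shifting concrete, here is a minimal sketch that separates the swap from the load. The names swap32 and load_be32_swapped are hypothetical, and std::memcpy is used in place of the pointer cast above (my substitution, to avoid alignment and aliasing concerns). Like the version above, it assumes a little-endian host.

```cpp
#include <stdint.h>
#include <cstring>

// Hypothetical helper: reverse the byte order of a 32-bit value
// using the same shifts and masks as the function above.
uint32_t swap32(uint32_t val) {
  const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
  const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
  return lower | upper;
}

// Hypothetical variant of be32toh: copy four bytes into a
// register-sized value, then swap them. Only correct on a
// little-endian host, because the swap is unconditional.
uint32_t load_be32_swapped(const void* buf) {
  uint32_t val;
  std::memcpy(&val, buf, sizeof val);
  return swap32(val);
}
```

On a little-endian host, the bytes 0x12 0x34 0x56 0x78 in memory load as 0x78563412, and the swap turns that into 0x12345678.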
There are other implementations, however, that read the bytes from memory one at a time and shift each into place; something like this:

    uint32_t be32toh(const void* buf) {
      // Cast to a pointer to bytes (the target type must be a pointer).
      const uint8_t *ptr = static_cast<const uint8_t*>(buf);
      return (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
    }

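One consequence of the byte-wise version is that it never interprets the buffer as a uint32_t, so it needs no particular alignment and gives the same answer on hosts of either byte order. A minimal sketch of that property follows. The name read_be32_at and the offset parameter are hypothetical, and the explicit casts to uint32_t are my addition so that the shifts happen in unsigned arithmetic.

```cpp
#include <stdint.h>
#include <cstddef>

// Hypothetical helper: decode a big-endian 32-bit value that starts
// at byte `offset` of `buf`. The offset does not need to be aligned.
uint32_t read_be32_at(const void* buf, std::size_t offset) {
  const uint8_t* ptr = static_cast<const uint8_t*>(buf) + offset;
  return (static_cast<uint32_t>(ptr[0]) << 24) |
         (static_cast<uint32_t>(ptr[1]) << 16) |
         (static_cast<uint32_t>(ptr[2]) << 8) |
         static_cast<uint32_t>(ptr[3]);
}
```

For example, given a packet whose first byte is a tag followed by a big-endian length, read_be32_at(packet, 1) decodes the length directly from the odd offset.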
Noting that most modern architectures have plenty of registers and efficient instructions that operate on registers, I wondered which of the two is faster. Here are the results:
Program | Seconds | Percent |
---|---|---|
Register | 0.64804 | 49% |
Pointer | 1.32808 | 100% |
The results come from the following program (compiled with g++ -O4 -std=c++0x). In it, mk1 is the register version and mk2 is the pointer version.
Update: there was an error in the code that led to different loop counts. I added a variable to hold the loop count and ran the measurements again.

    #include <cstdlib>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <vector>
    #include <functional>
    #include <numeric>
    #include <iostream>
    #include <memory>
    #include <algorithm>
    #include <stdint.h>

    // Difference in user CPU time between two rusage samples, in seconds.
    double operator-(rusage const& a, rusage const& b) {
      double result = (a.ru_utime.tv_usec - b.ru_utime.tv_usec) / 1.0e6;
      result += (a.ru_utime.tv_sec - b.ru_utime.tv_sec);
      return result;
    }

    // Run func once and return the user CPU time it consumed, in seconds.
    template <class Func>
    double measure(Func func) {
      rusage before, after;
      getrusage(RUSAGE_SELF, &before);
      func();
      getrusage(RUSAGE_SELF, &after);
      return (after - before);
    }

    // Register version: load the whole word, then swap it with shifts and masks.
    uint32_t mk1_be32toh(const void* buf) {
      uint32_t val = *static_cast<const uint32_t*>(buf);
      const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
      const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
      return lower | upper;
    }

    // Pointer version: read the bytes one at a time and shift each into place.
    uint32_t mk2_be32toh(const void* buf) {
      const uint8_t *ptr = static_cast<const uint8_t*>(buf);
      return (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
    }

    int main() {
      std::vector<uint32_t> array;
      size_t sum = 0;
      double result;
      const int loop_count = 100000;

      // Fill the input with 10000 random values.
      for (int i = 0 ; i < 10000 ; ++i)
        array.push_back(random());

      // Time the register version; accumulating into sum keeps the calls
      // from being optimized away.
      result = measure([&array, &sum]() {
        for (unsigned int n = 0 ; n < loop_count ; ++n)
          for (unsigned int i = 0 ; i < array.size() ; ++i)
            sum += mk1_be32toh(&array[i]);
      });
      std::cout << "mk1 exec time is: " << result
                << "(sum is " << sum << ")" << std::endl;

      // Time the pointer version; sum is not reset, so it carries over
      // from the first run.
      result = measure([&array, &sum]() {
        for (unsigned int n = 0 ; n < loop_count ; ++n)
          for (unsigned int i = 0 ; i < array.size() ; ++i)
            sum += mk2_be32toh(&array[i]);
      });
      std::cout << "mk2 exec time is: " << result
                << "(sum is " << sum << ")" << std::endl;
    }
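Since timings only mean something if both variants compute the same thing, it can be reassuring to check that they agree before comparing them. The following is a minimal sketch of such a check and is not part of the original program. The helpers host_is_little_endian and variants_agree are hypothetical names, and the pointer variant here adds casts to uint32_t (my change) so that the shifts are done in unsigned arithmetic. The two variants are only expected to agree on a little-endian host, because the register version swaps unconditionally.

```cpp
#include <stdint.h>
#include <cstddef>

// Register variant, as in the benchmark.
uint32_t mk1_be32toh(const void* buf) {
  uint32_t val = *static_cast<const uint32_t*>(buf);
  const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
  const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
  return lower | upper;
}

// Pointer variant, as in the benchmark but with unsigned casts added.
uint32_t mk2_be32toh(const void* buf) {
  const uint8_t* ptr = static_cast<const uint8_t*>(buf);
  return (static_cast<uint32_t>(ptr[0]) << 24) |
         (static_cast<uint32_t>(ptr[1]) << 16) |
         (static_cast<uint32_t>(ptr[2]) << 8) |
         static_cast<uint32_t>(ptr[3]);
}

// Hypothetical helper: true when the host stores the least
// significant byte of a word at the lowest address.
bool host_is_little_endian() {
  const uint32_t probe = 1;
  return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// Hypothetical helper: true when both variants decode every
// element of `values` to the same result.
bool variants_agree(const uint32_t* values, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    if (mk1_be32toh(&values[i]) != mk2_be32toh(&values[i]))
      return false;
  return true;
}
```

Calling variants_agree(&array[0], array.size()) before the two measure calls would catch a mismatch between the variants early.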