Endianness is always a problem when trying to write portable programs. For that reason, endian.h provides functions that handle it. Normally, such a function is implemented using shifts and bitwise OR to get the bytes in the right order for the machine, something like this:
uint32_t be32toh(const void* ptr) {
uint32_t val = *static_cast<const uint32_t*>(ptr);
const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
return lower | upper;
}
Here we assume that we are reading from a buffer that has been filled with bytes from an external source that stores everything in big-endian format. (I picked big-endian format because I use an Intel machine, which is little-endian.) This deviates from the definition in endian.h on purpose; bear with me for a while.
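To make the assumption concrete, here is a minimal sketch of the same swap applied to a four-byte buffer. The name `swap_load_be32` is made up for this example, and the `memcpy` load is a substitution for the pointer cast so that the read is well-defined for any alignment; the shifts and masks are unchanged.

```cpp
#include <stdint.h>
#include <cstring>

// Hypothetical helper, not part of endian.h: load four bytes in host order,
// then reverse them with the same shifts and masks as above.
uint32_t swap_load_be32(const void* ptr) {
    uint32_t val;
    std::memcpy(&val, ptr, sizeof val);  // alignment-safe load of the raw word
    const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
    const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
    return lower | upper;
}
```

On a little-endian host, passing a buffer that holds the bytes 12 34 56 78 (hex) yields 0x12345678. On a big-endian host the same call would reverse bytes that are already in the right order, which is why this form depends on knowing the machine's byte order.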
There are other implementations, however, that read the bytes from memory directly and shift the bytes; something like this:
uint32_t be32toh(const void* buf) {
const uint8_t *ptr = static_cast<const uint8_t*>(buf);
return (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
}
Noting that most modern architectures have plenty of registers and efficient instructions that operate on registers, I wondered which one is faster. Here are the results:
| Program | Seconds | Percent of Pointer time |
|---|---|---|
| Register | 0.64804 | 49% |
| Pointer | 1.32808 | 100% |
The results come from the following program (compiled with g++ -O4 -std=c++0x).
Update: there was an error in the code that led to different loop counts. I added a variable to hold the loop count instead and ran the measurements again.
#include <cstdlib>
#include <sys/time.h>
#include <sys/resource.h>
#include <vector>
#include <functional>
#include <numeric>
#include <iostream>
#include <memory>
#include <algorithm>
#include <stdint.h>
// Difference in user CPU time between two rusage snapshots, in seconds.
double operator-(rusage const& a, rusage const& b) {
double result = (a.ru_utime.tv_usec - b.ru_utime.tv_usec) / 1.0e6;
result += (a.ru_utime.tv_sec - b.ru_utime.tv_sec);
return result;
}
template <class Func>
double measure(Func func)
{
rusage before, after;
getrusage(RUSAGE_SELF, &before);
func();
getrusage(RUSAGE_SELF, &after);
return (after - before);
}
// "Register" variant: load the whole word, then reorder it with shifts and masks.
uint32_t mk1_be32toh(const void* buf) {
uint32_t val = *static_cast<const uint32_t*>(buf);
const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
return lower | upper;
}
// "Pointer" variant: assemble the value from individual bytes in memory.
uint32_t mk2_be32toh(const void* buf) {
const uint8_t *ptr = static_cast<const uint8_t*>(buf);
return (ptr[0] << 24) | (ptr[1] << 16) | (ptr[2] << 8) | ptr[3];
}
int main() {
std::vector<uint32_t> array;
size_t sum = 0;
double result;
const int loop_count = 100000;
for (int i = 0 ; i < 10000 ; ++i)
array.push_back(random());
result = measure([&array, &sum]() {
for (unsigned int n = 0 ; n < loop_count ; ++n)
for (unsigned int i = 0 ; i < array.size() ; ++i)
sum += mk1_be32toh(&array[i]);
});
std::cout << "mk1 exec time is: " << result
<< "(sum is " << sum << ")"
<< std::endl;
result = measure([&array, &sum]() {
for (unsigned int n = 0 ; n < loop_count ; ++n)
for (unsigned int i = 0 ; i < array.size() ; ++i)
sum += mk2_be32toh(&array[i]);
});
std::cout << "mk2 exec time is: " << result
<< "(sum is " << sum << ")"
<< std::endl;
}
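Before reading too much into the timings, it is worth confirming that the two variants compute the same value. The following is a minimal sketch of such a check, not part of the measured program: `load_swap` and `load_bytes` are hypothetical stand-ins for `mk1_be32toh` and `mk2_be32toh`, with a `memcpy` load and explicit `uint32_t` casts added so that the shifts never touch a signed `int`. The two are only expected to agree on a little-endian host, so the sketch also includes a small probe for that.

```cpp
#include <stdint.h>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for mk1_be32toh: load the word, then swap with shifts and masks.
uint32_t load_swap(const void* buf) {
    uint32_t val;
    std::memcpy(&val, buf, sizeof val);
    const uint32_t lower = (val >> 24) | ((val >> 8) & 0xFF00);
    const uint32_t upper = (val << 24) | ((val << 8) & 0xFF0000);
    return lower | upper;
}

// Stand-in for mk2_be32toh: build the value byte by byte.
// Each byte is widened to uint32_t before shifting.
uint32_t load_bytes(const void* buf) {
    const uint8_t* p = static_cast<const uint8_t*>(buf);
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// True if the host stores the least significant byte first.
bool host_is_little_endian() {
    const uint32_t probe = 1;
    uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Number of entries for which the two decoders disagree.
std::size_t count_mismatches(const std::vector<uint32_t>& words) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < words.size(); ++i)
        if (load_swap(&words[i]) != load_bytes(&words[i]))
            ++mismatches;
    return mismatches;
}
```

Calling `count_mismatches(array)` on the benchmark's vector should return 0 on a little-endian machine. The byte-wise decoder gives the same answer on any host, because it never depends on how the machine lays out a word in memory.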