Making a "map" of things

Computers are fast these days, fast enough to have made scripting languages like PHP viable.

This allows for lazy programming, so I thought it might be interesting to revisit a problem I encountered many years ago.

The problem/solution is simplified here, so never mind if you don't see why it is solved this way; it makes sense in the actual context.

We have a stream of queries that keep counts of events in a database. The queries are all of the form:

UPDATE table SET total=total+1 WHERE key = blah

This stream is too much for the database to cope with.

So instead of a load of queries we build a string-keyed map of the queries, adding 1 to a running total against a query each time we see a duplicate. The query becomes:

UPDATE table SET total=total+__TOTAL__ WHERE key = blah

Every so often we iterate the map, substitute __TOTAL__ and start again.

In practice this reduced load to about 10% of what it was.

So let's program this naively using std::map:

// gcc -O ordered_map.cpp -lstdc++
#include <stdlib.h>
#include <stdio.h>
#include <map>
#include <string>
#include <iterator>
#include "common.h"

using namespace std;

int main(int argc, char **argv)
{
    int count = 0;   // these were uninitialised; zero them so the totals are defined
    int entries = 0;
    for (int tests = 0; tests < TESTRUNS; tests++)
    {
        map<std::string, int> UpdateMap;
        for (int i = 0; i < UPDATES; i++)
        {
            string sql = GetTestString();

            if (UpdateMap.find(sql) == UpdateMap.end())
            {
                UpdateMap.insert(std::make_pair(sql, 1));
            }
            else
            {
                map<string, int>::iterator it = UpdateMap.find(sql);
                it->second += 1;
            }
        }
        map<string, int>::iterator it = UpdateMap.begin();
        while (it != UpdateMap.end())
        {
            count += it->second;
            entries += 1;
            it++;
        }
    }
    printf("Total %d %d\n", count, entries);
}

Note that for all these tests we will use default optimisation.

When I run time ./a.out I get:

Total 10000000 1000956

real    0m3.700s
user    0m3.696s
sys     0m0.000s


But plainly it's poor code. Let's start with the fact that I have used std::map, which is an ordered map, for a problem that has no need of any ordering.


//gcc -O unordered_map.cpp -lstdc++
#include <stdlib.h>
#include <stdio.h>
#include <unordered_map>
#include <string>
#include <iterator>
#include "common.h"

using namespace std;

int main(int argc, char **argv)
{
    int count = 0;
    int entries = 0;
    for (int tests = 0; tests < TESTRUNS; tests++)
    {
        unordered_map<std::string, int> UpdateMap;

        for (int i = 0; i < UPDATES; i++)
        {
            string sql = GetTestString();

            if (UpdateMap.find(sql) == UpdateMap.end())
            {
                UpdateMap.insert(std::make_pair(sql, 1));
            }
            else
            {
                unordered_map<string, int>::iterator it = UpdateMap.find(sql);
                it->second += 1;
            }
        }
        unordered_map<string, int>::iterator it = UpdateMap.begin();
        while (it != UpdateMap.end())
        {
            count += it->second;
            entries += 1;
            it++;
        }
    }
    printf("Total %d %d\n", count, entries);
}

When I run time ./a.out I get:

Total 10000000 1000956

real    0m1.932s
user    0m1.926s
sys     0m0.004s

We have halved the time. But it's still bad code. Let's remove the sloppy coding: we are doing two finds when we only need one, and we are recreating a map every pass that really only needs clearing before reuse.


//gcc -O better_unordered_map.cpp -lstdc++
#include <stdlib.h>
#include <stdio.h>
#include <unordered_map>
#include <string>
#include <iterator>
#include "common.h"

using namespace std;

int main(int argc, char **argv)
{
    int count = 0;
    int entries = 0;
    unordered_map<std::string, int> UpdateMap;

    for (int tests = 0; tests < TESTRUNS; tests++)
    {
        for (int i = 0; i < UPDATES; i++)
        {
            string sql = GetTestString();
            unordered_map<string, int>::iterator it = UpdateMap.find(sql);
            if (it == UpdateMap.end())
            {
                UpdateMap.insert(std::make_pair(sql, 1));
            }
            else
            {
                it->second += 1;
            }
        }
        unordered_map<string, int>::iterator it = UpdateMap.begin();
        while (it != UpdateMap.end())
        {
            count += it->second;
            entries += 1;
            it++;
        }
        UpdateMap.clear();
    }
    printf("Total %d %d\n", count, entries);
}


When I run time ./a.out I now get:

Total 10000000 1000956

real    0m1.596s
user    0m1.595s
sys     0m0.000s



Now let's take a custom approach.

//gcc best_unordered_map.cpp -lstdc++
#include <stdlib.h>
#include <stdio.h>
#include <cstring>
#include "common.h"
#include <stdint.h>

using namespace std;

static inline uint32_t murmur_32_scramble(uint32_t k)
{
    k *= 0xcc9e2d51;
    k = (k << 15) | (k >> 17);
    k *= 0x1b873593;
    return k;
}

uint32_t murmur3_32(const uint8_t *key, size_t len, uint32_t seed)
{
    uint32_t h = seed;
    uint32_t k;
    /* Read in groups of 4. */
    for (size_t i = len >> 2; i; i--)
    {
        // Here is a source of differing results across endiannesses.
        // A swap here has no effect on hash properties though.
        memcpy(&k, key, sizeof(uint32_t));
        key += sizeof(uint32_t);
        h ^= murmur_32_scramble(k);
        h = (h << 13) | (h >> 19);
        h = h * 5 + 0xe6546b64;
    }
    /* Read the rest. */
    k = 0;
    for (size_t i = len & 3; i; i--)
    {
        k <<= 8;
        k |= key[i - 1];
    }
    // A swap is *not* necessary here because the preceding loop already
    // places the low bytes in the low places according to whatever endianness
    // we use. Swaps only apply when the memory is copied in a chunk.
    h ^= murmur_32_scramble(k);
    /* Finalize. */
    h ^= len;
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

struct custom_map
{
    char sql[256]; // assumes every query is shorter than 256 bytes
    int count;
};

#define SPARSITY (MAX * 9)

int main(int argc, char **argv)
{
    int count = 0;
    int entries = 0;
    //int collisions = 0;

    struct custom_map map_entry[SPARSITY];

    for (int i = 0; i < SPARSITY; i++)
    {
        map_entry[i].sql[0] = 0;
        map_entry[i].count = 1; // first sighting counts as 1, matching total+1
    }

    for (int tests = 0; tests < TESTRUNS; tests++)
    {
        for (int i = 0; i < UPDATES; i++)
        {
            char *sql = GetTestString();
            int index = murmur3_32((uint8_t *) sql, strlen(sql), 0) % SPARSITY;

            // Open addressing: walk forward from the hashed slot until we
            // find our string or an empty slot.
            do
            {
                if (strcmp(map_entry[index].sql, sql) == 0)
                {
                    map_entry[index].count++;
                    break;
                }
                else if (map_entry[index].sql[0] == 0)
                {
                    strcpy(map_entry[index].sql, sql);
                    break;
                }
                else
                {
                    //collisions++;
                    index = (index + 1) % SPARSITY;
                }
            } while (1);
        }

        for (int i = 0; i < SPARSITY; i++)
        {
            if (map_entry[i].sql[0] != 0)
            {
                count += map_entry[i].count;
                entries += 1;
                map_entry[i].sql[0] = 0;
                map_entry[i].count = 1;
            }
        }
    }
    printf("Total %d %d\n", count, entries);
}



Total 10000000 1000956

real    0m1.202s
user    0m1.201s
sys     0m0.000s

This is 30% faster again. Notably, my first version of this was practically the same speed as std::unordered_map. That was surprising, but it led me to discover that the hash function used by std::unordered_map cuts down on loop iterations by eating the string to be hashed in 4-byte chunks, hence the copycat use of a function that operates the same way.

Why is our custom code faster? Simply because it needs to do less. A general-purpose map allows deletion, and std::unordered_map is typically implemented so that the hash points not at a value but at a "bucket" holding a list of values; deleting is then just removing a single value from its bucket. I simply use a flat array, which makes deletion very impractical.

By sacrificing an ability we can specialise for speed. The other ability sacrificed here is expansion: we presume sizes up front to avoid memory allocations.

For systems that need extreme reliability that can be a pretty good thing, but that's a topic for another day.

Is 30% more speed worth it? Is being 3x faster than the original code using an ordered map worth it? It all depends on your criteria. 99% of the time I'd probably use std::unordered_map and only change if a profile of the code gave me a reason to.

 

   



