The first weeks of MongoDB @ijenko

In short, we need to be able to query tens of millions of rows of “real time data”. The system must also be easily scalable, since the data volume grows hand in hand with the number of end users.

Since MongoDB seems to meet all our needs, we have now spent a couple of weeks evaluating and studying it.

So far our experience with MongoDB has been very positive.

We have been testing MongoDB 1.8.0 in two different environments: the first a fairly powerful testing server, the second a “lab cluster” built from tiny Proxmox instances.


On a single server

In our single-server setup we had Mongo running on a server with a Xeon E5405 (8 cores), 16 GB of memory and a RAID-1 SSD setup (OS: Debian Lenny). The goal of these first tests was to learn the basic usage of MongoDB and to check the theoretical limits of Mongo on a server on which we already had experience running other databases. We were also checking out some of Mongo’s client libraries.

We were amazed how quickly and painlessly we managed to get everything up and running. By the end of the day we had some simple, overall “benchmark results”.

In short, our results were (with the PHP client):

Insert
insert without index: 54K inserts/sec
insert with 1 index: 37K inserts/sec
insert with 2 indexes: 28K inserts/sec

FindOne: on a collection of 36 million documents, with the search key indexed
1 client: 10K reads/sec
2 clients: 20K reads/sec
8 clients: 57K reads/sec <- server in pain

Insert and find mixed
insert without index + concurrent find: 37K inserts/sec, 1 read/sec
insert with 1 index + concurrent find: 22K inserts/sec, 7K reads/sec
insert with 2 indexes + concurrent find: 14K inserts/sec, 8K reads/sec

Handling over 50K inserts/second is far more than we expected, and adding indexes and concurrent queries still gave some interesting raw numbers.
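To give an idea of what such a test can look like with the PHP driver, here is a minimal sketch (the legacy pecl “mongo” extension is assumed, and collection, field and host names are made up):

<?php
// Minimal insert/read benchmark sketch against a local mongod.
$mongo      = new Mongo('mongodb://localhost:27017');
$collection = $mongo->selectDB('bench')->selectCollection('docs');

// Secondary indexes for the "1 index" / "2 indexes" runs.
$collection->ensureIndex(array('device_id' => 1));
$collection->ensureIndex(array('ts' => 1));

// Time N fire-and-forget inserts and report the rate.
$n     = 1000000;
$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $collection->insert(array(
        'device_id' => rand(1, 10000),
        'ts'        => time(),
        'value'     => mt_rand() / mt_getrandmax(),
    ));
}
printf("%.0f inserts/sec\n", $n / (microtime(true) - $start));

// findOne() lookups on the indexed key.
$reads = 100000;
$start = microtime(true);
for ($i = 0; $i < $reads; $i++) {
    $collection->findOne(array('device_id' => rand(1, 10000)));
}
printf("%.0f reads/sec\n", $reads / (microtime(true) - $start));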


The power of MongoDB – easy sharding and replica sets

In our second, “lab cluster” test we concentrated on learning more about the administration of Mongo: stability, backups, recovery and horizontal scalability.

We set up an example configuration with a sharded cluster made of two replica sets (each set having 1 master, 1 slave and 1 arbiter). During every test run we added a third shard to the cluster at start + 2h.

The three Mongo config servers were running on their own dedicated nodes, and the two mongos routers were running on the application servers.
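To give an idea of the moving parts, sharding a collection boils down to a couple of admin commands sent to a mongos. A rough sketch through the PHP driver (database, collection and shard key names are made up; the same commands can be run from the mongo shell):

<?php
// Enable sharding for a database and pick a shard key for one collection,
// talking to one of the mongos routers.
$mongos = new Mongo('mongodb://appserver1:27017');
$admin  = $mongos->selectDB('admin');

$admin->command(array('enablesharding' => 'testdb'));
$admin->command(array(
    'shardcollection' => 'testdb.docs',
    'key'             => array('device_id' => 1),
));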

We hacked together a simple PHP application that inserted a random 50 to 2000 documents per second into the database. We kept the metadata of the inserts in a Redis instance and used it to verify whether any inserts were lost when the primary node of a replica set got killed, a machine got completely jammed or the network went down.
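The idea, roughly sketched (host, collection and key names are assumptions, as is the phpredis client):

<?php
// Insert one document through a mongos and record its _id in Redis,
// so that missing documents can be detected after a test run.
$mongo = new Mongo('mongodb://appserver1:27017');
$coll  = $mongo->selectDB('testdb')->selectCollection('docs');

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$doc = array(
    '_id'       => new MongoId(),   // generated client side so it can be logged
    'device_id' => rand(1, 10000),
    'ts'        => time(),
);
$coll->insert($doc);
$redis->sAdd('inserted_ids', (string) $doc['_id']);

// After the run, the members of 'inserted_ids' are compared against the
// documents actually found in MongoDB.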

During a test run we initialized the database with 4 million documents. After the initialization, the PHP application ran for the next 6 hours. While the script was running, about every 30 minutes a master or a slave of a replica set was killed with “kill -9” and recovered 15 minutes later. A separate PHP script was also querying the database, issuing an average of 500 “range” queries per second.
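A range query of that kind could look roughly like this with the PHP driver (the timestamp field and collection names are assumptions):

<?php
// Fetch the documents of the last hour using a range condition on an
// indexed field.
$mongo  = new Mongo('mongodb://appserver2:27017');
$coll   = $mongo->selectDB('testdb')->selectCollection('docs');
$cursor = $coll->find(array('ts' => array('$gte' => time() - 3600, '$lt' => time())));
foreach ($cursor as $doc) {
    // consume the cursor; in a test like ours only the count matters
}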

The results in short

After executing the tests 3 times we had 0 missing inserts when running the clients in “safe mode”. When safe mode was off and an average of 1000 inserts were done while the primary of a replica set was shot down, an average of 13 inserts were lost. All administrative tasks, such as adding more shards and taking backups, were easy and did not require any downtime. 10 points!
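To illustrate what “safe mode” means here: with the legacy PHP driver an insert is fire-and-forget by default, while passing 'safe' => true makes the driver wait for the server to acknowledge the write and throw an exception on failure, so the client can retry. A minimal sketch (collection and host names are made up):

<?php
$mongo = new Mongo('mongodb://appserver1:27017');
$coll  = $mongo->selectDB('testdb')->selectCollection('docs');

// Fire-and-forget: writes sent while a primary is being shot down can be
// silently lost (our ~13 lost inserts per failover).
$coll->insert(array('x' => 1));

// Safe mode: the write is acknowledged, and a failure surfaces as an
// exception that the client can catch and retry.
try {
    $coll->insert(array('x' => 2), array('safe' => true));
} catch (MongoCursorException $e) {
    // e.g. the primary went away mid-write; log and retry
}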


A bit more detailed

MongoDB promises:
Automatic balancing for changes in load and data distribution
Easy addition of new machines
Scaling out to one thousand nodes
No single points of failure
Automatic failover

So far it has been fulfilling its promises.

Automatic balancing for changes in load and data distribution

During the tests all the shards seemed to be equally loaded. However, we did not collect stats on this, so we don’t have any hard data about it; we will do that next time. The data itself was nicely balanced across all the shards.
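One way to collect that data next time would be to count chunks per shard from the cluster’s config database, something along these lines (shard names below are just examples):

<?php
// Count how many chunks each shard owns, reading the config database
// through a mongos.
$mongos = new Mongo('mongodb://appserver1:27017');
$chunks = $mongos->selectDB('config')->selectCollection('chunks');

foreach (array('rs1', 'rs2', 'rs3') as $shard) {
    printf("%s: %d chunks\n", $shard, $chunks->count(array('shard' => $shard)));
}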

Easy addition of new machines

In our tests we added “shard-3” to the cluster each time after the test had been running for 2 hours. It was easy, did not need or cause any downtime, and we did not experience problems with it during any of the test runs. However, it sometimes took up to 2 hours before the third shard started to get populated. We are not experienced enough yet to explain why this happened.
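For reference, adding a shard is a single admin command; a sketch of what it can look like through the PHP driver (replica set and host names are made up, and the same command can be run from the mongo shell against the admin database):

<?php
// Add a third shard, itself a replica set, to a running cluster via a mongos.
$mongos = new Mongo('mongodb://appserver1:27017');
$result = $mongos->selectDB('admin')->command(array('addshard' => 'rs3/node7:27018'));
var_dump($result);   // on success contains "ok" => 1 and the name of the new shard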

Scaling out to one thousand nodes

We cannot comment on this, since for now we were running on only three nodes, but our baby steps are looking promising.

No single points of failure

Setting up sharding with replica sets was an easy task, and administering them was simpler than we expected.

We managed to take database dumps and backups of all the services without downtime, and recovery was simple as well. Even a complete wipe-out of a config server was not a problem as long as there was a backup at hand. If you don’t have backups of the config servers, it seems that you are in trouble.

We even tried wiping out one of the config servers, setting it up again, copying the database files from another config server and starting it with the “--repair” option. Everything seemed to work without any problems, although I have a feeling it’s not supposed to be done like that.

Automatic failover

Nothing to complain about on this point either: the automatic failover in replica sets worked like a charm every single time.


Final words after the “first step”

As you might have noticed, so far we have been very happy with our experience with MongoDB, and it seems to be fulfilling all our requirements. However, a word of caution is needed: we are still in the honeymoon phase with Mongo. Everything has worked so nicely and without problems that we may have been blind to some obvious issues.

There would be a lot more to write about backups, data recovery, admin tools and the available debug information. We will write more about those when we have more experience using them with some “real life” use cases and longer test runs.
