MongoDB: PHP 1 / Perl 0

We just made some tests with MongoDB. As we use PHP and Perl as main languages, we decided to benchmark both on a very simple case: inserting 10.000.000 entries.

The PHP script we use, mongo_pain.php:

< ?php

$m = new Mongo();
$db = $m->db_bench;
$collection = $db->tphp;

$num = 10000000;
while ($num-- > 0)
{
    $collection->insert(array('idfox'           => $num,
                      'idxxxxxx'        => 'outchagaloup',
                      'idxxxxxxref'     => 0,
                      'value'           => 'no_rice_without_price',
                      'datetime'        => 'newDate()',
                      'date'            => '2011-03-20',
                      'time'            => '17:00:00'));
}

exit;

?>

And the Perl version, mongo_pain.pl:

#!/usr/bin/perl

use 5.010;

use strict;
use warnings;

use MongoDB;

$MongoDB::BSON::utf8_flag_on = 0;

my $MONGO_HOST = 'localhost';
my $MONGO_PORT = 27017;

my $mongo = MongoDB::Connection->new(host => $MONGO_HOST, port => $MONGO_PORT);

my $mongo_database   = $mongo->db_bench;
my $mongo_collection = $mongo_database->tprl;

my $num = 10_000_000;
while ($num-- > 0)
{
    $mongo_collection->insert(
        {
            idfox       => $num,
            idxxxxxx    => 'outchagaloup',
            idxxxxxxref => 0,
            value       => 'no_rice_without_price',
            datetime    => 'newDate()',
            date        => '2011-03-20',
            time        => '17:00:00',
        });
}

We were very surprised by the results…

> time php mongo_pain.php
php mongo_pain.php  101,45s user 9,33s system 43% cpu 4:16,44 total

and

> time ./mongo_pain.pl
./mongo_pain.pl  335,99s user 100,80s system 95% cpu 7:37,88 total

Big difference!!!

During running, a top for each launch gives for PHP:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30055 root      40   0 31.3g 6.2g 6.2g S  100 39.4  11:38.90 mongod
14787 root      40   0  116m 9320 5884 R   57  0.1   0:43.74 php

and for Perl:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17405 root      40   0 47332  10m 2456 R  100  0.1   0:42.64 mongo_pain.pl
30055 root      40   0 33.3g 6.1g 6.1g S   76 39.0  14:16.58 mongod

We use PHP 5.3.5 and Perl 5.10.1 under a debian lenny on a xeon E5405 (8 core) with 16G of memory and SSD disk.

To try to understand, we profiled the perl script with the excellent Devel::NYTProf reducing the number of inserts from 10.000.000 to 1.000.000.

perl -d:NYTProf mongo_pain.pl && nytprofhtml

You can browse the results here.

Most of the time is spend in MongoDB::Collection::batch_insert, which calls MongoDB::write_insert and MongoDB::Connection::send. These functions are XS ones, too bad…

We have to investigate in C code…

Before doing that, we just stace‘d each execution. The fact that the perl client was at 100% of CPU while mongod stayed around 75 can tell that it don’t wait correctly for IOs.

strace -p $PID -ttt on PHP script execution:

1301070012.395152 sendto(3, "\320/!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395205 time(NULL)            = 1301070012
1301070012.395239 time(NULL)            = 1301070012
1301070012.395270 sendto(3, "\320000!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395324 time(NULL)            = 1301070012
1301070012.395357 time(NULL)            = 1301070012
1301070012.395388 sendto(3, "\320001!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395442 time(NULL)            = 1301070012
1301070012.395475 time(NULL)            = 1301070012
...

and on perl one:

1301069922.956067 sendto(3, "\330\351\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956184 time(NULL)            = 1301069922
1301069922.956230 sendto(3, "\330\352\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956343 time(NULL)            = 1301069922
1301069922.956389 sendto(3, "\330\353\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956502 time(NULL)            = 1301069922
...

Except the fact that PHP does an extra time() syscall for each loop and that the Perl sendto() syscall send 8 bytes more than PHP one, nothing differs… Looking into perl and PHP modules sources gives use the reason of the 8 extra bytes: our perl is 64 bits compiled, storing all integers as BSON_LONG (64 bits integers), while PHP is probably compiled with SIZEOF_LONG=4, storing all integers as BSON_INT (32 bits integers). As we store 2 integers (idfox and idxxxxxxref values), it adds 2 x 4 bytes for the perl side. Perhaps in the next version of MongoDB perl module (0.43) we will be able to choose how to store integers, see http://jira.mongodb.org/browse/PERL-127.

So next step, the hammer to kill the fly… Let’s try valgrind…

valgrind --tool=callgrind /usr/bin/perl mongo_pain.pl

See you next post…