php links

 + online benchmark by URL extension
 + code marker
 + loadtest with report (requires nodejs)
 install = adding an include (could be automated by the bootstrap in test)
 + event driven for PHP
Protocol Buffers implementation: https://github.com/allegro/php-protobuf
PHP's make: http://www.phing.info/
     PHP 5.4.8
     Cloud SQL = MySQL 5.5 (compatible client/lib)
     Cloud Storage = file_get_contents("gs://bucket/filename.txt")
     we have to make do with the available extensions
dashboard framework: http://razorflow.com/

Mojolicious, render_later and weaken transactions

When using the render_later method, have you ever encountered this Mojolicious error?

Can't call method "res" on an undefined value at /home/max/perl5/lib/perl5/Mojolicious/Controller.pm line 275.


unload #week #12

unload #week #11

DarkPAN over *anything_you_want* using cpanminus

Yesterday, I talked about a quick hack adding SSH support to cpanminus via the scp URI scheme. At the end of the article, I added a technical note explaining how I envisioned handling more protocols.

I just tried to do it.

With this version, cpanm can now handle scp:// and sftp:// URI schemes.

For scp://, it first tries to use the scp command, then curl (hoping that curl is linked with libssh2).

For sftp://, it first tries to use the sftp command, then curl (still hoping that curl is linked with libssh2).
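
To know whether your curl can take over, you can check what it was built with (scp/sftp only show up when curl is linked against libssh2); a quick check:

```shell
# Lists the scp/sftp protocol support when curl is linked with libssh2,
# otherwise prints a hint that the fallback will not work
curl -V | grep -iwE 'scp|sftp' || echo "this curl has no scp/sftp support"
```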

It seems to work well.

I added the ability to develop external URI scheme backends. Imagine you want to add a new URI scheme, say foo://, without modifying App::cpanminus package.

To achieve this goal, create a module in the namespace App::cpanminus::backend::foo that will return a hash ref with two keys: get and mirror.

For each key, you have to provide an anonymous function to handle the action, like this:

package App::cpanminus::backend::foo;

{
    get => sub
    {
        my($self, $uri) = @_;
        # retrieve the content of the resource pointed by $uri (foo://...)
        # then return it
        return $content;
    },
    mirror => sub
    {
        my($self, $uri, $path) = @_;
        # retrieve the content of the resource pointed by $uri (foo://...)
        # and copy it to the file $path
        return $whatever_you_want;
    },
};
__END__

That’s all…

UPDATE 2011/10/2: As Paweł Murias suggested in comments below, I changed the “API” to a more standard one:

package App::cpanminus::backend::foo;

sub get
{
    my($self, $uri) = @_;
    # retrieve the content of the resource pointed by $uri (foo://...)
    # then return it
    return $content;
}

sub mirror
{
    my($self, $uri, $path) = @_;
    # retrieve the content of the resource pointed by $uri (foo://...)
    # and copy it to the file $path
    return $whatever_you_want;
}

1;
__END__

The old hash ref API is not available anymore.

DarkPAN over SSH using cpanminus

You want to set up a private CPAN, aka a DarkPAN, to put your own private modules in and deploy them as if they came from the official CPAN using cpanminus and its fantastic command cpanm.

The problem is that cpanminus works only with file, HTTP/HTTPS and FTP URIs. That is fine for public use, but for private use it is tedious to set up an HTTPS server with some authentication scheme, while SSH is so simple to use…

cpanm can use curl as a backend, and curl can handle scp/sftp if linked with libssh2. That would be OK, but too often in binary distros curl is not linked with libssh2… and cpanm filters URIs with the /^(ftp|https?|file):/ regexp… :( Too bad…

So, as a quick hack and to avoid depending on a specially compiled curl, we modified cpanminus to make it support the scp protocol using the scp command. The quick hack is visible on github.

On a server, create a darkpan user with an scponly shell. Using CPAN-Site, set up a private CPAN at the root of the account, as explained in its excellent documentation.

Don’t forget to add the SSH public keys of all future clients of your DarkPAN into ~darkpan/.ssh/authorized_keys to avoid any password prompt.

Once your module archives are copied to ~darkpan/authors/id/M/MY/MYID and indexed via cpansite --site ~darkpan index, you can use the patched cpanminus to access them:

cpanm -i --mirror scp://darkpan@example.com:. PrivateModule

Your private modules won't be found on the official CPAN index, so cpanm will fall back to your mirror…

Easy, no?

If you want to test it, check out the cpanminus distribution, then run perl Makefile.PL --author; it will create the fat cpanm executable.

Technical note: to do this properly, it would be better to make cpanminus dispatch its mirror and get actions based on the protocol of the passed URI rather than on the availability of LWP, wget, curl or another module. That would avoid patching each of these backends to detect file:// or scp://.

REFCOUNTers and AnyEvent…

Perl uses reference counting to destroy unused objects. Each time an object is referenced, its reference counter is increased by one. Conversely, when a reference to the object is destroyed, its reference counter is decreased by one. When the reference counter reaches 0, the object is destroyed.

This is a cool feature, but sometimes we have to be very careful, especially when we abuse closures as we like to do with AnyEvent, for example…

When an object references another one that itself references the first, the two objects will never be destroyed automatically… With closures this can happen very quickly…

Take that code:

use 5.010;
use strict;
use warnings;

use AnyEvent;

my $cv = AE::cv;

{
    # Instantiate an object ($obj.REFCOUNT=1)
    my $obj = Test->new;

    # Create a new reference on it ($obj.REFCOUNT=2)
    my $obj_dup = $obj;

    # Create a timer: every second, print a counter
    # ($obj.REFCOUNT=3 since it is used in the timer callback/closure)
    $obj->{watcher} = AE::timer(1, 1, sub { say ++$obj_dup->{count} });

    # Keep the main condvar reference so we will be able to stop the
    # event loop when the instance won't be in use by anyone (see Test
    # DESTROY method)...
    $obj->{condvar} = $cv;
}
# $obj.REFCOUNT=1 since $obj and $obj_dup no longer exist, but the
# callback is still active: a reference to the Test instance remains
# in the timer callback/closure.

# THE event loop...
$cv->recv;


package Test;

use Carp;

sub new
{
    return bless {}, shift;
}

# Called when perl automatically destroys the object instance
sub DESTROY
{
    my $self = shift;

    # Stop the event loop...
    $self->{condvar}->send;

    carp "DESTROYed!";
}

If you launch it, you will see that it never ends… printing 1, 2, 3… The timer callback never stops. Even when $obj and $obj_dup go out of scope, the Test instance remains alive…

The reason is that, after the creation of the first two references $obj and $obj_dup, the instance captured in the timer closure increased the reference counter by one, so it became 3. When $obj and $obj_dup went out of scope, the reference counter decreased by 2, so it became 1… Forever…

We have here a cyclic reference:

Test instance -> timer watcher -> closure -> Test instance -> etc.

So the Test instance can not be destroyed…

The solution: use a weak reference.

A weak reference is like a normal reference, but it does not increase the reference counter. And when the reference counter of an object drops to 0, perl automatically sets to undef all weak references that pointed to it…

To create a weak reference, use the weaken() function of the Scalar::Util module. The previous code becomes:

use 5.010;
use strict;
use warnings;

use Scalar::Util;

use AnyEvent;

my $cv = AE::cv;

# Yes, now we declare this variable out of the following scope, just
# to see the undef effect...
my $obj_dup;

{
    # Instantiate an object ($obj.REFCOUNT=1)
    my $obj = Test->new;

    # Create a new reference on it ($obj.REFCOUNT=2)
    $obj_dup = $obj;

    # Make $obj_dup a weak reference ($obj.REFCOUNT=1)
    Scalar::Util::weaken($obj_dup);

    # Create a timer: every second, print a counter
    # As we use a weak reference in the timer callback/closure, the
    # reference counter does not increase.
    $obj->{watcher} = AE::timer(1, 1, sub { say ++$obj_dup->{count} });

    # Keep the main condvar reference so we will be able to stop the
    # event loop when the instance won't be in use by anyone (see Test
    # DESTROY method)...
    $obj->{condvar} = $cv;
}
# $obj.REFCOUNT=0 since $obj no longer exists => Test::DESTROY is called.

# And $obj_dup becomes undef
unless (defined $obj_dup)
{
    warn '$obj_dup now undefined!';
}

# THE event loop...
$cv->recv;


package Test;

use Carp;

sub new
{
    return bless {}, shift;
}

# Called when perl automatically destroys the object instance
sub DESTROY
{
    my $self = shift;

    # Stop the event loop...
    $self->{condvar}->send;

    carp "DESTROYed!";
}

In this case the event loop stops immediately.

When $obj goes out of scope, the reference counter of the instance decreases from 1 to 0, so perl destroys the instance and calls Test::DESTROY(), which prints “DESTROYed! at - line 16”. Then it undefines $obj_dup, so we print “$obj_dup now undefined! at - line 39.”.

The job is done, but be careful!!! :)

the first weeks of mongodb @ijenko

In short, we need to be able to query tens of millions of rows of “real time data”. The system must also be easily scalable, since that requirement grows hand in hand with the number of end users.

Since MongoDB seems to comply with all our needs, we have spent a couple of weeks evaluating and studying it.

Our experience with MongoDB has been so far very positive.

We have been testing MongoDB 1.8.0 in two different environments: the first, a quite powerful testing server; the second, a “lab-cluster” built from tiny Proxmox instances.


On a single server

In our single-server setup we had Mongo running on a server with a Xeon E5405 (8 cores), 16G of memory and a RAID-1 SSD disk (OS: Debian Lenny). The goal of these first tests was to learn the basic usage of MongoDB and to check the theoretical limits of Mongo on a server where we already had some experience running other databases. We were also checking out some of Mongo's client libraries.

We were amazed how quickly and painlessly we managed to get everything up and running. At the end of the day we had some overall and simple “benchmark results”.

To put it short, our results were (with the PHP client):

Inserts:
- without index: 54K inserts/sec
- with 1 index: 37K inserts/sec
- with 2 indexes: 28K inserts/sec

FindOne (on a collection of 36 million documents, the search key being indexed):
- 1 client: 10K reads/sec
- 2 clients: 20K reads/sec
- 8 clients: 57K reads/sec <- server in pain

Mixed inserts and finds:
- insert without index + concurrent find: 37K inserts/sec, 1 read/sec
- insert with 1 index + concurrent find: 22K inserts/sec, 7K reads/sec
- insert with 2 indexes + concurrent find: 14K inserts/sec, 8K reads/sec

Managing 50K inserts/second is far more than we expected. Adding indexes and concurrent queries still gave some interesting raw numbers.

 

The power of MongoDB – easy sharding and replica sets

In our second, “lab-cluster” test, we concentrated on learning more about the administration of Mongo: stability, backups, recovery and horizontal scalability.

We set up an example configuration with a sharded cluster made of two replica sets (each set having 1 master, 1 slave and 1 arbiter). During every test run we added a third shard to the cluster at start + 2h.

The three Mongo config servers ran on their own dedicated nodes. Two mongos routers ran on the application servers.
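
As a sketch, adding shards goes through a mongos router with commands of this kind (Mongo 1.8 era syntax; the hostnames, replica-set names and the bench namespace here are made up for the example, not our actual setup):

```shell
# Connect to a mongos router and register both replica sets as shards,
# then enable sharding on an example database/collection
mongo mongos1:27017/admin --eval '
  db.runCommand({ addshard: "rs1/node1:27018,node2:27018" });
  db.runCommand({ addshard: "rs2/node4:27018,node5:27018" });
  db.runCommand({ enablesharding: "bench" });
  db.runCommand({ shardcollection: "bench.data", key: { _id: 1 } });
'
```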

We hacked a simple PHP application to insert 50 to 2000 documents per second (randomly) into the database. We kept the metadata of the inserts in a Redis instance and used it to verify whether any inserts were lost when the primary node of a replica set got killed, a machine got completely jammed, or the network went down.

During a test run we initialized the database with 4 million documents. After the initialization, the PHP application ran for the next 6 hours. During its execution, about every 30 minutes a master or a slave of a replica set was “kill -9”ed and recovered 15 minutes later. A separate PHP script querying the database was also running, creating an average of 500 “range” queries per second.

The results in short

After executing the tests 3 times, we had 0 missing inserts when running the clients in “safe mode”. When safe mode was off and an average of 1000 inserts were done while the primary of the replica set was shot down, an average of 13 inserts were lost. All administrative tasks, such as adding more shards and taking backups, were easy and did not require any downtime. 10 points!


A bit more detailed

MongoDB promises: Automatic balancing for changes in load and data distribution, Easy addition of new machines, Scaling out to one thousand nodes, No single points of failure, Automatic failover.

So far it has been fulfilling its promises.

Automatic balancing for changes in load and data distribution

During the tests all the shards seemed to be equally loaded. However, we did not record stats, so we have no hard data about it. We will do that next time. The data was nicely balanced across all the shards.

Easy addition of new machines

In our tests we added “shard-3” to the cluster every time after the test had been running for 2 hours. It was easy and did not need or cause any downtime, and we did not experience problems with this during any of the test runs. However, it sometimes took 2 hours before the third shard started to get populated. We are not experienced enough to explain why this happened.

Scaling out to one thousand nodes

Cannot comment, since we were only running on three nodes. But our baby steps look promising.

No single points of failure

Setting up sharding with replica sets was an easy task. Administration was also simpler than we expected.

We managed to take database dumps and backups of all the services without downtime. Recovery was simple as well. Even a complete wipe-out of a config server was not a problem, as long as there was a backup at hand. If you don't have backups of the config servers, it seems that you are in trouble.

We even tried to just wipe out one of the config servers, set it up again, copy the DB of another config server and start it with the --repair option. All seemed to work without any problems. Somehow I have a feeling it's not supposed to be done like that.

Automatic failover

Nothing to complain about this point either. The automatic failover in replica sets worked like a charm every single time.


Final words after the “first step”

As you might have noticed, so far we have been very happy with our MongoDB experience. It seems to fulfill all our requirements. However, a word of caution: we are still in the honeymoon phase with Mongo. Everything worked so nicely and without problems that we might have been blind to some obvious issues.

There is a lot more to write about backups, recovery of the data, admin tools and the available debug information. We will write more about those when we have more experience with some “real life” use cases and longer test runs.

Proxy dispatcher for HTTP/SSL *and* SSH

Peteris Krumins just told us how he helped one of his friends bypass a firewall by doing SSH through port 443 (the HTTPS one).

Last year, I did a proof of concept of a proxy that listens on port 443 and forwards the data to the internal HTTP or SSH server, based on the client's behavior, without decoding anything.

To achieve this, I used AnyEvent, a fantastic event loop manager…

One thing to know is that with HTTP or HTTP over SSL, it is the client that talks first to the server, sending something like:

GET /index.html HTTP/1.1
Host: www.ijenko.com
...

With SSH, the server announces itself, like that:

SSH-2.0-OpenSSH_5.4p1 FreeBSD-20100308

then waits for client data…

So our proxy just has to wait a little after accepting the client connection (here, 0.5 seconds) before deciding what to do.

If the client talks during this time, it probably wants to do HTTP; if not, it probably wants to do SSH.

The delay only impacts SSH connections, and only during the first step.

So reconfigure your HTTP server to listen only on localhost, then launch the proxy bound to the network-side address.

Note that you can change the proxy to connect to hosts other than the local one (here 127.1); it's up to you.

Enjoy… :-)

Just keep in mind that all connections to your internal HTTP and SSH servers will come from the proxy; you will not be able to know the real source, only the proxy knows it…

use strict;
use warnings;

use AnyEvent;
use AnyEvent::Socket;
use AnyEvent::Handle;

die "usage: $0 BIND_IP_ADDRESS\n" if @ARGV != 1;

my $ip_address = shift;

use constant DEBUG => 1;

use constant {
    BIND_PORT   => 443,

    SSL_PORT    => 443,
    SSH_PORT    => 22,
};

tcp_server($ip_address, BIND_PORT, sub
           {
               my($fh, $host, $port) = @_;

               my $cnx = Cnx->new;

               $cnx->client_handle(
                   AnyEvent::Handle->new(
                       fh          => $fh,
                       rtimeout    => 0.5,
                       on_error    => $cnx->on_error,
                       # Client didn't say anything after initial timeout => SSH
                       on_rtimeout => $cnx->on_init_action(SSH_PORT),
                        # Client talks immediately => SSL
                       on_read     => $cnx->on_init_action(SSL_PORT)));

               warn "$host:$port connected.\n" if DEBUG;
           });


package Cnx;

use Scalar::Util qw(refaddr);

use AnyEvent;
use AnyEvent::Socket;
use AnyEvent::Handle;

use Carp;

my %CONNECTIONS;

sub new
{
    my($class, %opt) = @_;

    my $self = bless \%opt, $class;

    $CONNECTIONS{refaddr $self} = $self;

    return $self;
}


sub DESTROY
{
    my $self = shift;

    delete $CONNECTIONS{refaddr $self};

    warn "$self DESTROYed\n" if main::DEBUG;
}

# Create two accessors/mutators for attributes...
foreach my $attribute (qw(client_handle serv_handle))
{
    no strict 'refs';

    *$attribute = sub
    {
        if (@_ == 1)
        {
            return $_[0]{$attribute};
        }

        if (@_ == 2)
        {
            return $_[0]{$attribute} = $_[1];
        }

        carp "$attribute miscalled...";
    };
}


sub close_all
{
    my $self = shift;

    if (defined(my $handle = $self->client_handle))
    {
        $handle->destroy;
        $self->client_handle(undef);
    }

    if (defined(my $handle = $self->serv_handle))
    {
        $handle->destroy;
        $self->serv_handle(undef);
    }

    delete $CONNECTIONS{refaddr $self};
}


sub on_error
{
    my $self = shift;

    return sub
    {
        $self // return;

        my($handle, undef, $message) = @_;

        warn "CLIENT got error $message\n" if main::DEBUG;

        $self->close_all;
    };
}


sub on_init_action
{
    my($self, $port) = @_;

    # Something happens during the probe period
    return sub
    {
        my($handle, undef, $message) = @_;

        warn "$self on_init_action(PORT=$port).\n" if main::DEBUG;

        unless (defined $self->serv_handle)
        {
            # We cancel the timeout and we connect to the internal service
            $self->client_handle->rtimeout(0);

            tcp_connect('127.1', $port, $self->on_serv_connected($port));
        }
    };
}


sub on_client_read
{
    my $self = shift;

    # Client talks after the connection to the internal service
    return sub
    {
        my $handle = shift;

        warn "CLIENT -> serv: " . length($handle->{rbuf}) . " bytes\n"
            if main::DEBUG;

        $self->serv_handle->push_write(delete $handle->{rbuf});
    };
}


sub on_serv_connected
{
    my($self, $port) = @_;

    # We just connected to the internal service (or failed to)
    return sub
    {
        my $fh = shift;

        unless (defined $fh)
        {
            warn "Can't connect to internal service on port $port: $!\n";
            $self->close_all;
            return;
        }

        my $serv_handle = AnyEvent::Handle->new(
            fh => $fh,
            on_error => $self->on_serv_error,
            on_read  => $self->on_serv_read);

        warn "$serv_handle serv_connected\n" if main::DEBUG;

        $self->serv_handle($serv_handle);

        $self->client_handle->on_read($self->on_client_read);
    };
}


sub on_serv_error
{
    my $self = shift;

    # Error from internal service side
    return sub
    {
        my($serv_handle, undef, $msg) = @_;

        warn "SERV got error $msg\n" if main::DEBUG;

        $self->close_all;
    };
}


sub on_serv_read
{
    my $self = shift;

    # Something to read from internal service
    return sub
    {
        my $handle = shift;

        warn "SERV -> client: " . length($handle->{rbuf}) . " bytes\n"
            if main::DEBUG;

        $self->client_handle->push_write(delete $handle->{rbuf});
    };
}


package main;

AnyEvent->condvar->recv;

MongoDB: PHP 1 / Perl 0

We just made some tests with MongoDB. As we use PHP and Perl as our main languages, we decided to benchmark both on a very simple case: inserting 10,000,000 entries.

The PHP script we used, mongo_pain.php:

<?php

$m = new Mongo();
$db = $m->db_bench;
$collection = $db->tphp;

$num = 10000000;
while ($num-- > 0)
{
    $collection->insert(array('idfox'           => $num,
                      'idxxxxxx'        => 'outchagaloup',
                      'idxxxxxxref'     => 0,
                      'value'           => 'no_rice_without_price',
                      'datetime'        => 'newDate()',
                      'date'            => '2011-03-20',
                      'time'            => '17:00:00'));
}

exit;

?>

And the Perl version, mongo_pain.pl:

#!/usr/bin/perl

use 5.010;

use strict;
use warnings;

use MongoDB;

$MongoDB::BSON::utf8_flag_on = 0;

my $MONGO_HOST = 'localhost';
my $MONGO_PORT = 27017;

my $mongo = MongoDB::Connection->new(host => $MONGO_HOST, port => $MONGO_PORT);

my $mongo_database   = $mongo->db_bench;
my $mongo_collection = $mongo_database->tprl;

my $num = 10_000_000;
while ($num-- > 0)
{
    $mongo_collection->insert(
        {
            idfox       => $num,
            idxxxxxx    => 'outchagaloup',
            idxxxxxxref => 0,
            value       => 'no_rice_without_price',
            datetime    => 'newDate()',
            date        => '2011-03-20',
            time        => '17:00:00',
        });
}

We were very surprised by the results…

> time php mongo_pain.php
php mongo_pain.php  101,45s user 9,33s system 43% cpu 4:16,44 total

and

> time ./mongo_pain.pl
./mongo_pain.pl  335,99s user 100,80s system 95% cpu 7:37,88 total

Big difference!!!

During running, a top for each launch gives for PHP:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30055 root      40   0 31.3g 6.2g 6.2g S  100 39.4  11:38.90 mongod
14787 root      40   0  116m 9320 5884 R   57  0.1   0:43.74 php

and for Perl:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17405 root      40   0 47332  10m 2456 R  100  0.1   0:42.64 mongo_pain.pl
30055 root      40   0 33.3g 6.1g 6.1g S   76 39.0  14:16.58 mongod

We used PHP 5.3.5 and Perl 5.10.1 under Debian Lenny on a Xeon E5405 (8 cores) with 16G of memory and an SSD disk.

To try to understand, we profiled the perl script with the excellent Devel::NYTProf, reducing the number of inserts from 10,000,000 to 1,000,000.

perl -d:NYTProf mongo_pain.pl && nytprofhtml

You can browse the results here.

Most of the time is spent in MongoDB::Collection::batch_insert, which calls MongoDB::write_insert and MongoDB::Connection::send. These functions are XS ones, too bad…

We have to investigate in C code…

Before doing that, we just strace'd each execution. The fact that the perl client was at 100% CPU while mongod stayed around 75% suggests that the client is CPU-bound rather than waiting on I/O.

strace -p $PID -ttt on PHP script execution:

1301070012.395152 sendto(3, "\320/!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395205 time(NULL)            = 1301070012
1301070012.395239 time(NULL)            = 1301070012
1301070012.395270 sendto(3, "\320000!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395324 time(NULL)            = 1301070012
1301070012.395357 time(NULL)            = 1301070012
1301070012.395388 sendto(3, "\320001!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395442 time(NULL)            = 1301070012
1301070012.395475 time(NULL)            = 1301070012
...

and on perl one:

1301069922.956067 sendto(3, "\330\351\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956184 time(NULL)            = 1301069922
1301069922.956230 sendto(3, "\330\352\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956343 time(NULL)            = 1301069922
1301069922.956389 sendto(3, "\330\353\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956502 time(NULL)            = 1301069922
...

Apart from the fact that PHP does an extra time() syscall on each loop iteration and that the Perl sendto() syscall sends 8 bytes more than the PHP one, nothing differs… Looking into the perl and PHP module sources gives us the reason for the 8 extra bytes: our perl is compiled in 64 bits, storing all integers as BSON_LONG (64-bit integers), while PHP is probably compiled with SIZEOF_LONG=4, storing all integers as BSON_INT (32-bit integers). As we store 2 integers (the idfox and idxxxxxxref values), this adds 2 x 4 bytes on the perl side. Perhaps in the next version of the MongoDB perl module (0.43) we will be able to choose how to store integers, see http://jira.mongodb.org/browse/PERL-127.

So, next step, the hammer to kill the fly… Let's try valgrind…

valgrind --tool=callgrind /usr/bin/perl mongo_pain.pl

See you next post…