Mojolicious, render_later and weaken transactions

When using the render_later method, have you ever encountered this Mojolicious error?

Can't call method "res" on an undefined value at /home/max/perl5/lib/perl5/Mojolicious/Controller.pm line 275.


DarkPAN over *anything_you_want* using cpanminus

Yesterday, I talked about a quick hack adding SSH support to cpanminus via an scp:// URI scheme. At the end of the article, I added a technical note explaining how I saw things to handle more protocols.

I just tried to do it.

With this version, cpanm can now handle scp:// and sftp:// URI schemes.

For scp://, it first tries to use the scp command then curl (hoping that curl is linked with libssh2).

For sftp://, it first tries to use the sftp command then curl (still hoping that curl is linked with libssh2).

It seems to work well.

I added the ability to develop external URI scheme backends. Imagine you want to add a new URI scheme, say foo://, without modifying the App::cpanminus package.

To achieve this goal, create a module in the namespace App::cpanminus::backend::foo that will return a hash ref with two keys: get and mirror.

For each key, you have to provide an anonymous function to handle the action, like this:

package App::cpanminus::backend::foo;

{
    get => sub
    {
        my($self, $uri) = @_;
        # retrieve the content of the resource pointed by $uri (foo://...)
        # then return it
        return $content;
    },
    mirror => sub
    {
        my($self, $uri, $path) = @_;
        # retrieve the content of the resource pointed by $uri (foo://...)
        # and copy it to the file $path
        return $whatever_you_want;
    },
};
__END__

That’s all…

UPDATE 2011/10/2: As Paweł Murias suggested in comments below, I changed the “API” to a more standard one:

package App::cpanminus::backend::foo;

sub get
{
    my($self, $uri) = @_;
    # retrieve the content of the resource pointed by $uri (foo://...)
    # then return it
    return $content;
}

sub mirror
{
    my($self, $uri, $path) = @_;
    # retrieve the content of the resource pointed by $uri (foo://...)
    # and copy it to the file $path
    return $whatever_you_want;
}

1;
__END__

The old hash ref API is no longer available.

DarkPAN over SSH using cpanminus

You want to set up a private CPAN, aka a DarkPAN, to hold your own private modules and deploy them as if they came from the official CPAN, using cpanminus and its fantastic cpanm command.

The problem is that cpanminus works only with file, HTTP/HTTPS and FTP URIs. That is fine for public use, but for private use it is tedious to set up an HTTPS server with some authentication scheme, while SSH is so simple to use…

cpanm can use curl as a backend, and curl can handle scp/sftp if it is linked with libssh2. That would be fine, but too often in binary distros curl is not linked with libssh2… and cpanm filters URIs with the /^(ftp|https?|file):/ regexp… 🙁 Too bad…

So, as a quick hack and to avoid depending on a specially compiled curl, we modified cpanminus to make it support the scp protocol using the scp command. The quick hack is visible on github.
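
The idea of the hack, in a nutshell (this is only a simplified sketch with made-up names, not the actual patch; see github for that): when the mirror URI uses the scp scheme, shell out to the scp command instead of going through the HTTP backends.

# Simplified sketch only: mirror_over_scp is an illustrative name,
# not a real cpanminus internal.
sub mirror_over_scp
{
    my($uri, $local_path) = @_;

    # scp://user@host:dir/file  ->  user@host:dir/file
    (my $remote = $uri) =~ s{^scp://}{};

    # Let the scp command do the transfer; SSH keys avoid any prompt.
    return system('scp', '-q', $remote, $local_path) == 0;
}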

On a server, create a darkpan user with an scponly shell. Using CPAN-Site, set up a private CPAN at the root of that account, as explained in its excellent documentation.

Don’t forget to add the SSH public keys of all future clients of your DarkPAN into ~darkpan/.ssh/authorized_keys to avoid any password prompt.

Once your module archives are copied to ~darkpan/authors/id/M/MY/MYID and indexed via cpansite --site ~darkpan index, you can use the patched cpanminus to access them:

cpanm -i --mirror scp://darkpan@example.com:. PrivateModule

Your private modules won’t be found in the official CPAN index, so cpanm will fall back to your mirror…

Easy, no?

If you want to test it, check out the cpanminus distribution, then run perl Makefile.PL --author: it will create the fat cpanm executable.

Technical note: to do a proper job, it would be better to make cpanminus dispatch its mirror and get actions based on the scheme of the passed URI rather than on the availability of LWP, wget, curl or another module. That would avoid patching each of these backends to detect file:// or scp://.
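
A rough sketch of what such a scheme-based dispatch could look like (hypothetical function names, not cpanminus code): pick the handler from the URI scheme first, and only fall back to the usual LWP/wget/curl logic for the schemes they already know.

my %mirror_for_scheme = (
    scp  => \&mirror_over_scp,    # e.g. the sketch above
    sftp => \&mirror_over_sftp,   # hypothetical sftp equivalent
);

sub mirror
{
    my($uri, $local_path) = @_;

    my($scheme) = $uri =~ m{^([a-z][a-z0-9+.-]*)://}i;
    my $handler = defined $scheme ? $mirror_for_scheme{lc $scheme} : undef;

    return $handler->($uri, $local_path) if $handler;

    # http/https/ftp/file: keep the existing LWP/wget/curl behavior
    return default_mirror($uri, $local_path);
}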

REFCOUNTers and AnyEvent…

Perl uses reference counting to destroy unused objects. Each time an object is referenced, its reference counter is increased by one. Conversely, when a reference to the object is destroyed, its reference counter is decreased by one. When the reference counter reaches 0, the object is destroyed.
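
A minimal illustration of this counting (the Counted class exists only for this example):

package Counted;

sub new     { return bless {}, shift }
sub DESTROY { warn "destroyed\n" }

package main;

my $first  = Counted->new;   # refcount = 1
my $second = $first;         # refcount = 2

undef $first;                # refcount = 1, nothing happens
undef $second;               # refcount = 0 => DESTROY is called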

This is a cool feature, but sometimes we have to be very careful, especially when we overuse closures as we like to do with AnyEvent, for example…

When an object references another one that itself references the first, the two objects will never be destroyed automatically… With closures this can happen very quickly…

Take this code:

use 5.010;
use strict;
use warnings;

use AnyEvent;

my $cv = AE::cv;

{
    # Instantiate an object ($obj.REFCOUNT=1)
    my $obj = Test->new;

    # Create a new reference on it ($obj.REFCOUNT=2)
    my $obj_dup = $obj;

    # Create a timer: every second, print a counter
    # ($obj.REFCOUNT=3 since it is used in the timer callback/closure)
    $obj->{watcher} = AE::timer(1, 1, sub { say ++$obj_dup->{count} });

    # Keep the main condvar reference so we will be able to stop the
    # event loop when the instance is no longer used by anyone (see the
    # Test DESTROY method)...
    $obj->{condvar} = $cv;
}
# $obj.REFCOUNT=1 since $obj and $obj_dup no longer exist, but the
# callback is still active because a reference to the Test instance
# remains in the timer callback/closure.

# THE event loop...
$cv->recv;


package Test;

use Carp;

sub new
{
    return bless {}, shift;
}

# Called when perl automatically destroys the object instance
sub DESTROY
{
    my $self = shift;

    # Stop the event loop...
    $self->{condvar}->send;

    carp "DESTROYed!";
}

If you run it, you will see that it never ends… printing 1, 2, 3… The timer callback never stops. Even when $obj and $obj_dup go out of scope, the Test instance remains alive…

The reason is that, after the creation of the first two references $obj and $obj_dup, capturing the instance in the timer closure increased the reference counter by one, so it became 3. When the two references $obj and $obj_dup went out of scope, the reference counter decreased by 2, so it became 1… Forever…

We have here a cyclic reference:

Test instance -> timer watcher -> closure -> Test instance -> etc.

So the Test instance can not be destroyed…

The solution: use a weak reference.

A weak reference is like a normal reference, but it does not increase the reference counter. And when the reference counter of an object drops to 0, perl automatically sets all weak references that pointed to it to undef…

To create a weak reference, you can use the weaken() function of the Scalar::Util module. So the previous code becomes:

use 5.010;
use strict;
use warnings;

use Scalar::Util;

use AnyEvent;

my $cv = AE::cv;

# Yes, now we declare this variable out of the following scope, just
# to see the undef effect...
my $obj_dup;

{
    # Instantiate an object ($obj.REFCOUNT=1)
    my $obj = Test->new;

    # Create a new reference on it ($obj.REFCOUNT=2)
    $obj_dup = $obj;

    # Make $obj_dup a weak reference ($obj.REFCOUNT=1)
    Scalar::Util::weaken($obj_dup);

    # Create a timer: every second, print a counter
    # As we use a weak reference in the timer callback/closure, the
    # reference counter does not increase.
    $obj->{watcher} = AE::timer(1, 1, sub { say ++$obj_dup->{count} });

    # Keep the main condvar reference so we will be able to stop the
    # event loop when the instance is no longer used by anyone (see the
    # Test DESTROY method)...
    $obj->{condvar} = $cv;
}
# $obj.REFCOUNT=0 since $obj no longer exists => Test::DESTROY is called.

# And $obj_dup becomes undef
unless (defined $obj_dup)
{
    warn '$obj_dup now undefined!';
}

# THE event loop...
$cv->recv;


package Test;

use Carp;

sub new
{
    return bless {}, shift;
}

# Called when perl automatically destroys the object instance
sub DESTROY
{
    my $self = shift;

    # Stop the event loop...
    $self->{condvar}->send;

    carp "DESTROYed!";
}

In this case the event loop stops immediately.

When $obj goes out of scope, the reference counter of the instance decreases from 1 to 0, so perl destroys the instance and calls Test::DESTROY(), which prints “DESTROYed! at – line 16”. Then $obj_dup is set to undef, so we print “$obj_dup now undefined! at – line 39.”.

The job is done, but be careful!!! 🙂

Proxy dispatcher for HTTP/SSL *and* SSH

Peteris Krumins just told us how he helped one of his friends bypass a firewall by doing SSH through port 443 (the HTTP/SSL port).

Last year, I did a proof of concept of a proxy that listens on port 443 and forwards the data to the internal HTTP server or SSH server, based on the client’s behavior, without decoding anything.

To achieve this, I used AnyEvent, a fantastic event loop manager…

One thing to know is that with HTTP or HTTP over SSL, it is the client that talks first, sending something like:

GET /index.html HTTP/1.1
Host: www.ijenko.com
...

With SSH, the server announces itself first, like this:

SSH-2.0-OpenSSH_5.4p1 FreeBSD-20100308

and then waits for client data…

So our proxy just has to wait a little bit after accepting the client connection (here we wait 0.5 seconds) before deciding what to do.

If the client talks during this time, it probably wants to do HTTP; if not, it probably wants to do SSH.

The delay only impacts SSH connections, and only during the first step.

So reconfigure your HTTP server to listen only on localhost, then launch the proxy with the network-side address.

Note that you can change the proxy to connect to hosts other than the local one (here 127.1); it’s up to you.

Enjoy… 🙂

Just keep in mind that all connections to your internal HTTP and SSH servers will come from the proxy; you will not be able to know the real source, only the proxy knows it…

use strict;
use warnings;

use AnyEvent;
use AnyEvent::Socket;
use AnyEvent::Handle;

die "usage: $0 BIND_IP_ADDRESS\n" if @ARGV != 1;

my $ip_address = shift;

use constant DEBUG => 1;

use constant {
    BIND_PORT   => 443,

    SSL_PORT    => 443,
    SSH_PORT    => 22,
};

tcp_server($ip_address, BIND_PORT, sub
           {
               my($fh, $host, $port) = @_;

               my $cnx = Cnx->new;

               $cnx->client_handle(
                   AnyEvent::Handle->new(
                       fh          => $fh,
                       rtimeout    => 0.5,
                       on_error    => $cnx->on_error,
                       # Client didn't say anything after initial timeout => SSH
                       on_rtimeout => $cnx->on_init_action(SSH_PORT),
                       # Client talks immediately => SSL
                       on_read     => $cnx->on_init_action(SSL_PORT)));

               warn "$host:$port connected.\n" if DEBUG;
           });


package Cnx;

use Scalar::Util qw(refaddr);

use AnyEvent;
use AnyEvent::Socket;
use AnyEvent::Handle;

use Carp;

my %CONNECTIONS;

sub new
{
    my($class, %opt) = @_;

    my $self = bless \%opt, $class;

    $CONNECTIONS{refaddr $self} = $self;

    return $self;
}


sub DESTROY
{
    my $self = shift;

    delete $CONNECTIONS{refaddr $self};

    warn "$self DESTROYed\n" if main::DEBUG;
}

# Create two accessors/mutators for attributes...
foreach my $attribute (qw(client_handle serv_handle))
{
    no strict 'refs';

    *$attribute = sub
    {
        if (@_ == 1)
        {
            return $_[0]{$attribute};
        }

        if (@_ == 2)
        {
            return $_[0]{$attribute} = $_[1];
        }

        carp "$attribute miscalled...";
    };
}


sub close_all
{
    my $self = shift;

    if (defined(my $handle = $self->client_handle))
    {
        $handle->destroy;
        $self->client_handle(undef);
    }

    if (defined(my $handle = $self->serv_handle))
    {
        $handle->destroy;
        $self->serv_handle(undef);
    }

    delete $CONNECTIONS{refaddr $self};
}


sub on_error
{
    my $self = shift;

    return sub
    {
        $self // return;

        my($handle, undef, $message) = @_;

        warn "CLIENT got error $message\n" if main::DEBUG;

        $self->close_all;
    };
}


sub on_init_action
{
    my($self, $port) = @_;

    # Something happens during the probe period
    return sub
    {
        my($handle, undef, $message) = @_;

        warn "$self on_init_action(PORT=$port).\n" if main::DEBUG;

        unless (defined $self->serv_handle)
        {
            # We cancel the timeout and we connect to the internal service
            $self->client_handle->rtimeout(0);

            tcp_connect('127.1', $port, $self->on_serv_connected($port));
        }
    };
}


sub on_client_read
{
    my $self = shift;

    # Client talks after the connection to the internal service
    return sub
    {
        my $handle = shift;

        warn "CLIENT -> serv: " . length($handle->{rbuf}) . " bytes\n"
            if main::DEBUG;

        $self->serv_handle->push_write(delete $handle->{rbuf});
    };
}


sub on_serv_connected
{
    my($self, $port) = @_;

    # We just connected to the internal service (or failed to)
    return sub
    {
        my $fh = shift;

        unless (defined $fh)
        {
            warn "Can't connect to internal service on port $port: $!\n";
            $self->close_all;
            return;
        }

        my $serv_handle = AnyEvent::Handle->new(
            fh => $fh,
            on_error => $self->on_serv_error,
            on_read  => $self->on_serv_read);

        warn "$serv_handle serv_connected\n" if main::DEBUG;

        $self->serv_handle($serv_handle);

        $self->client_handle->on_read($self->on_client_read);
    };
}


sub on_serv_error
{
    my $self = shift;

    # Error from internal service side
    return sub
    {
        my($serv_handle, undef, $msg) = @_;

        warn "SERV got error $msg\n" if main::DEBUG;

        $self->close_all;
    };
}


sub on_serv_read
{
    my $self = shift;

    # Something to read from internal service
    return sub
    {
        my $handle = shift;

        warn "SERV -> client: " . length($handle->{rbuf}) . " bytes\n"
            if main::DEBUG;

        $self->client_handle->push_write(delete $handle->{rbuf});
    };
}


package main;

AnyEvent->condvar->recv;

MongoDB: PHP 1 / Perl 0

We just made some tests with MongoDB. As we use PHP and Perl as our main languages, we decided to benchmark both on a very simple case: inserting 10,000,000 entries.

The PHP script we use, mongo_pain.php:

<?php

$m = new Mongo();
$db = $m->db_bench;
$collection = $db->tphp;

$num = 10000000;
while ($num-- > 0)
{
    $collection->insert(array('idfox'       => $num,
                              'idxxxxxx'    => 'outchagaloup',
                              'idxxxxxxref' => 0,
                              'value'       => 'no_rice_without_price',
                              'datetime'    => 'newDate()',
                              'date'        => '2011-03-20',
                              'time'        => '17:00:00'));
}

exit;

?>

And the Perl version, mongo_pain.pl:

#!/usr/bin/perl

use 5.010;

use strict;
use warnings;

use MongoDB;

$MongoDB::BSON::utf8_flag_on = 0;

my $MONGO_HOST = 'localhost';
my $MONGO_PORT = 27017;

my $mongo = MongoDB::Connection->new(host => $MONGO_HOST, port => $MONGO_PORT);

my $mongo_database   = $mongo->db_bench;
my $mongo_collection = $mongo_database->tprl;

my $num = 10_000_000;
while ($num-- > 0)
{
    $mongo_collection->insert(
        {
            idfox       => $num,
            idxxxxxx    => 'outchagaloup',
            idxxxxxxref => 0,
            value       => 'no_rice_without_price',
            datetime    => 'newDate()',
            date        => '2011-03-20',
            time        => '17:00:00',
        });
}

We were very surprised by the results…

> time php mongo_pain.php
php mongo_pain.php  101,45s user 9,33s system 43% cpu 4:16,44 total

and

> time ./mongo_pain.pl
./mongo_pain.pl  335,99s user 100,80s system 95% cpu 7:37,88 total

Big difference!!!

During the runs, top shows, for PHP:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30055 root      40   0 31.3g 6.2g 6.2g S  100 39.4  11:38.90 mongod
14787 root      40   0  116m 9320 5884 R   57  0.1   0:43.74 php

and for Perl:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17405 root      40   0 47332  10m 2456 R  100  0.1   0:42.64 mongo_pain.pl
30055 root      40   0 33.3g 6.1g 6.1g S   76 39.0  14:16.58 mongod

We use PHP 5.3.5 and Perl 5.10.1 under Debian lenny on a Xeon E5405 (8 cores) with 16 GB of memory and an SSD disk.

To try to understand, we profiled the Perl script with the excellent Devel::NYTProf, reducing the number of inserts from 10,000,000 to 1,000,000.

perl -d:NYTProf mongo_pain.pl && nytprofhtml

You can browse the results here.

Most of the time is spent in MongoDB::Collection::batch_insert, which calls MongoDB::write_insert and MongoDB::Connection::send. These are XS functions, too bad…

We will have to dig into the C code…

Before doing that, we just strace‘d each execution. The fact that the Perl client was at 100% CPU while mongod stayed around 75% suggests that it does not wait properly for I/O.

strace -p $PID -ttt during the PHP script execution:

1301070012.395152 sendto(3, "\320/!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395205 time(NULL)            = 1301070012
1301070012.395239 time(NULL)            = 1301070012
1301070012.395270 sendto(3, "\320000!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395324 time(NULL)            = 1301070012
1301070012.395357 time(NULL)            = 1301070012
1301070012.395388 sendto(3, "\320001!\n\322\7db_bench.tph"..., 208, 0, NULL, 0) = 208
1301070012.395442 time(NULL)            = 1301070012
1301070012.395475 time(NULL)            = 1301070012
...

and during the Perl one:

1301069922.956067 sendto(3, "\330\351\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956184 time(NULL)            = 1301069922
1301069922.956230 sendto(3, "\330\352\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956343 time(NULL)            = 1301069922
1301069922.956389 sendto(3, "\330\353\t\23\322\7db_bench.tpr"..., 216, 0, NULL, 0) = 216
1301069922.956502 time(NULL)            = 1301069922
...

Apart from the fact that PHP does an extra time() syscall in each loop and that the Perl sendto() syscall sends 8 bytes more than the PHP one, nothing differs… Looking into the Perl and PHP module sources gives us the reason for the 8 extra bytes: our perl is compiled in 64 bits, storing all integers as BSON_LONG (64-bit integers), while PHP is probably compiled with SIZEOF_LONG=4, storing all integers as BSON_INT (32-bit integers). As we store 2 integers (the idfox and idxxxxxxref values), that adds 2 x 4 bytes on the Perl side. Perhaps in the next version of the MongoDB Perl module (0.43) we will be able to choose how integers are stored, see http://jira.mongodb.org/browse/PERL-127.
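
To check which side of that difference a given perl sits on, looking at its integer size is a quick hint (this only shows how the perl was built, not how the driver encodes things):

# On the 64-bit perl used here, ivsize is 8, which is why plain
# integers end up as 64-bit BSON longs; a perl built with ivsize 4
# would match the PHP behavior.
use Config;
print "ivsize: $Config{ivsize}\n";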

So next step, the hammer to kill the fly… Let’s try valgrind…

valgrind --tool=callgrind /usr/bin/perl mongo_pain.pl

See you next post…

AnyEvent and POE

AnyEvent and POE are two Perl modules that transparently handle event loop concerns and abstract away the underlying operating system.

They provide a uniform interface whatever the underlying event loop. The event loops usable under POE live in the POE::Loop:: namespace, while those usable under AnyEvent live in the AnyEvent::Impl:: namespace. You can even see that AnyEvent can use POE via AnyEvent::Impl::POE, but that is another story…

While both modules recognize I/O events, timers and signals, they take different approaches to managing them:

  • AnyEvent lets you attach a callback (strictly a code reference) to each event. Period;
  • POE, on the other hand, adds a layer in which events are named before being dispatched. A callback (a code reference or an instance/method pair) is associated with a state (a name), then each event is associated with a state. When an event arrives, it triggers a state, and the callback associated with that state is called. A kind of indirection, really. These states are grouped into sessions. Each session is independent, but can communicate with another through events we could call “logical”, since they do not originate from the system (see POE::Kernel’s post() and call() methods). This compartmentalization isolates the program’s components from one another and keeps state names from all getting mixed up, with the name collisions that could entail. A short comparison sketch follows this list.
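
To make the difference concrete, here is a minimal sketch of the same periodic “tick” written both ways, using nothing more than the public APIs of the two modules. First AnyEvent, where the callback is attached directly to the event:

use AnyEvent;

my $cv = AE::cv;
my $w  = AE::timer(1, 1, sub { print "tick\n" });
$cv->recv;

And with POE, where the timer posts a named event that the session maps to a callback:

use POE;

POE::Session->create(
    inline_states => {
        _start => sub { $_[KERNEL]->delay(tick => 1) },
        tick   => sub {
            print "tick\n";
            $_[KERNEL]->delay(tick => 1);   # re-arm the timer
        },
    },
);

POE::Kernel->run;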

Of course, both AnyEvent and POE can be used to write any client/server application. For line-oriented protocols, this poses no problem at all.

Where it hurts is when the dialogue uses serialized data. The problem is not so much how the data are serialized, but rather how the serialized data are sent. Should the length of the serialized data be sent as a header? If so, in what form? If not, is there an end-of-data marker? So many options that interoperability is sometimes impossible…

AnyEvent behaves differently depending on the serialization method used, on the assumption that some deserialization methods can detect the end of the stream on their own (JSON or Data::MessagePack, for example). For methods without that ability (like Storable), the length is transmitted as a header, with a pack("w", LENGTH) for example.

POE handles things generically. With the POE::Filter::Reference module, whatever the serialization method, the length of the serialized data is sent as a header in a “human-readable” form followed by a \0, like "128\0..." for a length of 128 bytes, for example.
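
To make the incompatibility concrete, here is roughly what the two framings look like for the same Storable payload (a sketch based on the descriptions above, not code taken from either module):

use Storable qw(nfreeze);

my $frozen = nfreeze({ answer => 42 });

# AnyEvent-style framing for Storable: a BER-compressed length
# prefix (pack "w") glued to the serialized data.
my $anyevent_frame = pack('w/a*', $frozen);

# POE::Filter::Reference-style framing: the length as decimal text,
# a NUL byte, then the serialized data, e.g. "128\0...".
my $poe_frame = length($frozen) . "\0" . $frozen;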

So, it is not compatible 🙁

This is where the AnyEvent::POE_Reference module comes in. It lets you serialize and deserialize data the POE way from AnyEvent, without requiring the POE module to be installed.

Thank you Perl, thank you Ijenko!