Technical Diary - With Andrii Grytsenko

Perl performance optimization

I’d like to describe my own experience with Perl optimization. There are several common ways to make your code run faster.

I used the Perl module Benchmark to take the measurements. Rules:

1. If you can use single quotes instead of double quotes, do it. In most cases the code will run faster, because a single-quoted string is not interpolated.
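A minimal sketch of the difference between the two quoting styles, with a small Benchmark harness attached. The variable names, the sample strings and the iteration count are assumptions made for this illustration, and the timings depend on the Perl version and the machine, so measure on your own system before relying on the gain.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $name = 'world';

# Single quotes keep the text literally; double quotes interpolate variables.
my $single = 'Hello, $name';
my $double = "Hello, $name";

print "$single\n";    # prints: Hello, $name
print "$double\n";    # prints: Hello, world

# Compare a literal single-quoted string with an interpolated double-quoted one.
# The iteration count is an arbitrary choice for this sketch.
cmpthese(1_000_000, {
    'Single' => sub { my $s = 'Hello, world' },
    'Double' => sub { my $s = "Hello, $name" },
});
```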

2. Use compiled regexps (qr//) when you match inside loops. It can bring a huge performance gain:

use strict;
use warnings;
use Benchmark;

# Try every pattern in the list against a fixed line.
sub qwe {
    my $re = shift;
    my $line = "wer";
    foreach my $reg (@$re) {
        if ($line =~ /$reg/) {
            my $l = 0;    # dummy work so the branch is not empty
        }
    }
}

my @arr = qw(qwe sdf wer);    # plain regexps
my @c_re;
# Compile each pattern once and put it into @c_re.
# Note: the compiled patterns also add \b anchors and /i, unlike the plain strings.
foreach my $p_re (@arr) {
    my $re = qr/\b$p_re\b/i;
    push(@c_re, $re);
}

Benchmark::cmpthese(100000, {
    'Compiled'     => sub { qwe(\@c_re) },
    'Not_compiled' => sub { qwe(\@arr) },
});

The result:

                 Rate Not_compiled     Compiled
Not_compiled  36232/s           --         -79%
Compiled     169492/s         368%           --
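The same idea can be sketched outside a benchmark: build the qr// objects once, then reuse them for every line. The helper name count_matches and the sample lines are hypothetical, and the \Q...\E quoting is an addition for this sketch so that words containing regex metacharacters are matched literally.

```perl
use strict;
use warnings;

# Compile each word once, outside the hot loop.
my @words    = qw(qwe sdf wer);
my @compiled = map { qr/\b\Q$_\E\b/i } @words;

# Count how many of the precompiled patterns match a line.
sub count_matches {
    my ($line, $patterns) = @_;
    my $count = 0;
    for my $re (@$patterns) {
        $count++ if $line =~ $re;
    }
    return $count;
}

for my $line ('wer', 'sdf WER', 'sdfsdf') {
    printf "%-8s matches %d pattern(s)\n", $line, count_matches($line, \@compiled);
}
```

Because of the \b anchors, 'sdfsdf' matches none of the patterns, while 'sdf WER' matches two of them thanks to the /i flag.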

3. Extract values from a string with substr if you can. There are three well-known ways to do this kind of operation: regular expressions, split and substr. Let’s compare them:

sub mksplit {
    my ($line) = @_;
    my ($fst,$snd) = split(/-/,$line);
    return($fst,$snd);
}

sub mkregexp {
    my ($line,$re) = @_;
    $line =~ m/$re/;
    return($1,$2);
}

sub mkstr {
    my ($line) = @_;
    my $pos = index($line,'-');
    my $fst = substr($line,0,$pos);
    my $snd = substr($line,$pos+1);
    return($fst,$snd);
}

my $line = "12-34";
my $n_comp_re = '(\d+)-(\d+)';        # plain pattern string
my $comp_re = qr/\b$n_comp_re\b/i;    # precompiled; note the extra \b anchors and /i flag

Benchmark::cmpthese(1000000, {
    'Compiled' => sub { mkregexp($line,$comp_re) },
    'Not_compiled' => sub { mkregexp($line,$n_comp_re) },
    'STR' => sub { mkstr($line) },
    'Split' => sub { mksplit($line) },
});

Benchmark::timethese(1000000, {
    'Compiled' => sub { mkregexp($line,$comp_re) },
    'Not_compiled' => sub { mkregexp($line,$n_comp_re) },
    'STR' => sub { mkstr($line) },
    'Split' => sub { mksplit($line) },
});

Here Compiled and Not_compiled refer to the two regular expression variants, with and without qr//.

                 Rate        Split     Compiled Not_compiled          STR
Split        297619/s           --          -9%         -12%         -25%
Compiled     327869/s          10%           --          -4%         -17%
Not_compiled 340136/s          14%           4%           --         -14%
STR          396825/s          33%          21%          17%           --
Benchmark: timing 1000000 iterations of Compiled, Not_compiled, STR, Split...
  Compiled:  4 wallclock secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 318471.34/s (n=1000000)
Not_compiled:  3 wallclock secs ( 2.91 usr +  0.00 sys =  2.91 CPU) @ 343642.61/s (n=1000000)
       STR:  2 wallclock secs ( 2.53 usr +  0.00 sys =  2.53 CPU) @ 395256.92/s (n=1000000)
     Split:  3 wallclock secs ( 3.19 usr +  0.00 sys =  3.19 CPU) @ 313479.62/s (n=1000000)

As the output shows, the fastest way to extract values from a string is substr. The regular expressions come next, and split is the slowest. I don’t know why the compiled regexp is slower than the plain one in this case. My guess is that it is because each call performs only a single match.
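One thing to keep in mind with the index/substr approach is the case where the separator is missing: index then returns -1, and substr with a negative length silently drops the last character. A hedged sketch of a guarded version follows; the name split_once and its two-argument interface are assumptions made for this example, not part of the benchmark above.

```perl
use strict;
use warnings;

# Split a string at the first occurrence of a separator using index/substr.
# Returns an empty list when the separator is missing.
sub split_once {
    my ($line, $sep) = @_;
    my $pos = index($line, $sep);
    return if $pos < 0;    # index returns -1 when $sep is not found
    return (substr($line, 0, $pos), substr($line, $pos + length($sep)));
}

my ($fst, $snd) = split_once('12-34', '-');
print "$fst $snd\n";

my @none = split_once('1234', '-');
print scalar(@none), "\n";
```

Note that this version cuts only at the first separator, so 'a-b-c' yields 'a' and 'b-c', whereas the split version in mksplit would return 'a' and 'b'.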

4. Don’t use sort in hash loops unless you need it. The fastest way to iterate over a hash is with keys:

sub mkeach {
    my $hash = shift;
    my %hash2;
    while ( my($key,$value) = each(%$hash)){
        $hash2{$key} = $value;
    }
    return(\%hash2);
}

sub mkkeys {
    my $hash = shift;
    my %hash2;
    for my $key (keys(%$hash)){
        $hash2{$key} = $hash->{$key};
    }
    return(\%hash2);
}

sub mksort {
    my $hash = shift;
    my %hash2;
    foreach my $key (sort keys %$hash) {
        $hash2{$key} = $hash->{$key};
    }
    return(\%hash2);
}

my %hash = (name => 'name1', surname => 'surname1', address => 'address', var => 'var1', var2 => 'var2');

Benchmark::cmpthese(1000000, {
    'Keys' => sub { mkkeys(\%hash) },
    'Each' => sub { mkeach(\%hash) },
    'Sort' => sub { mksort(\%hash) },
});

Benchmark::timethese(1000000, {
    'Keys' => sub { mkkeys(\%hash) },
    'Each' => sub { mkeach(\%hash) },
    'Sort' => sub { mksort(\%hash) },
});

        Rate Each Sort Keys
Each 66800/s   --  -3% -14%
Sort 68681/s   3%   -- -11%
Keys 77459/s  16%  13%   --
Benchmark: timing 1000000 iterations of Each, Keys, Sort...
      Each: 16 wallclock secs (14.92 usr +  0.00 sys = 14.92 CPU) @ 67024.13/s (n=1000000)
      Keys: 12 wallclock secs (12.91 usr +  0.00 sys = 12.91 CPU) @ 77459.33/s (n=1000000)
      Sort: 15 wallclock secs (14.62 usr +  0.00 sys = 14.62 CPU) @ 68399.45/s (n=1000000)
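In practice this rule could look like the sketch below: iterate with plain keys when the order is irrelevant, and sort only at the point where ordered output is actually needed. The hash contents and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

my %hash = (name => 'name1', surname => 'surname1', address => 'address');

# Order does not matter for an aggregate, so plain keys is enough.
my $total_length = 0;
for my $key (keys %hash) {
    $total_length += length $hash{$key};
}

# Order matters for output that a person reads, so sort here and only here.
my @report = map { "$_=$hash{$_}" } sort keys %hash;

print "total length: $total_length\n";
print "$_\n" for @report;
```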

That’s all for now, but I’ll continue to optimize my code, so I expect there will be updates soon.
