a day in the pit my view from inside

27Jul/115

Two variations on Ruby array#flatten

Implement Ruby's flatten method for an Array (without taking any peaks at the source!). Probably not the hardest thing you'll ever have to do, but there are many ways to do it... Jon and I each take a blind stab at it. Here's our effort:

Mike's variation

def mikes_flatten arr, level = nil
  result = []
  arr.each do |elem|
    recursive_flatten(result, elem,level,0)
  end
  result
end
 
def recursive_flatten(result, elem,level,current_level)
  if elem.is_a?(Array)
    elem.each do |el|
      if level.nil? || current_level < level
        recursive_flatten(result, el,level,current_level+1)
      else
        result << el
      end
    end
  else
    result << elem
  end
  result
end

Jon's variation

def flatten values, level=-1
  flat = []
  values.each do |value|
    if level != 0 && value.kind_of?(Array)
      flat.concat(flatten(value, level-1))
    else
      flat << value
    end
  end
  flat
end

Perf different can be seen by Jon's benchmark code (flatten.rb):

#!/usr/bin/ruby
require 'benchmark'
 
# Insert flatten methods here
 
def random_value
  if rand(3) < 1
    (rand(5)+1).times.map { |i| random_value }
  else
    rand(10000)
  end
end
 
VALUE = 100.times.map { |i| random_value}
ITERATIONS = 1000
 
Benchmark.bm do |b|
  puts "Iterations: #{ITERATIONS}"
  b.report("Mikes version") do
    ITERATIONS.times { |i| mikes_flatten(VALUE)}
  end
  b.report("Jons version") do
    ITERATIONS.times { |i| flatten(VALUE)}
  end
end

Jon's is pretty succinct. When I first went about writing things out I was going for the "get it to work." Either way, I don't see a huge performance difference... They toggle back and forth by about a half a second.

[13:57:11 miker@laughwhat-lm ~/Downloads] $ ruby flatten.rb
      user     system      total        real
Iterations: 10000
Mikes version  2.920000   0.010000   2.930000 (  2.933274)
Jons version  2.990000   0.000000   2.990000 (  3.017929)
 
[13:57:25 miker@laughwhat-lm ~/Downloads] $ ruby flatten.rb
      user     system      total        real
Iterations: 10000
Mikes version  3.010000   0.010000   3.020000 (  3.030987)
Jons version  2.980000   0.010000   2.990000 (  2.999335)
 
[13:57:34 miker@laughwhat-lm ~/Downloads] $ ruby flatten.rb
      user     system      total        real
Iterations: 100000
Mikes version 26.860000   0.050000  26.910000 ( 26.973855)
Jons version 24.050000   0.050000  24.100000 ( 24.126557)
 
[13:58:34 miker@laughwhat-lm ~/Downloads] $ ruby flatten.rb
      user     system      total        real
Iterations: 100000
Mikes version 24.120000   0.050000  24.170000 ( 24.201227)
Jons version 25.080000   0.050000  25.130000 ( 25.196811)

Write your own and submit as a comment. Let's see some other ways to do this :)

Filed under: Benchmark, Code, Ruby 5 Comments
19Oct/100

PHP array_diff vs foreach: a battle for speed

1000 runs w/ 1000 data elements in the two arrays (php array diff): 2.7389E-5 seconds
1000 runs w/ 1000 data elements in the two arrays (php foreach): 1.085E-7 seconds
php array diff slower by 2.728E-5 seconds

There were two arrays for this test: $big_set, which had 3147 string elements and $to_diff, which had 1581 string elements. We needed the difference of the two arrays and found that array_diff is slower on average. The nicety to using a foreach(){} is that we can now do other computation within the loop, as opposed to having to loop a second time after the differential set is found. Code below...

<?php
print_r($argv);
$runs = $argv[2];
echo "$iterations\n";
$data_size = $argv[4];
echo $data_size;
$big_set = array();
for($i = 1; $i < $data_size; $i++){
  $big_set[] = "Hello there ".rand(500,5000);
}
$to_diff = array();
for($i = 1; $i < $data_size; $i++){
  $to_diff[] = "Hello there ".rand(500,5000);
}
 
for ($i=0; $i<$runs; $i++) {
   $time = 0;
   $start = microtime(true);
   $diff = array_diff ($big_set, $to_diff);
   $t = microtime(true) - $start;
   $time += $t;
}
 
echo $runs, ' runs (php array diff): ', sprintf('%2.9f',$time/$runs), ' secs', BR;
 
for ($i=0; $i<$runs; $i++) {
   $time2 = 0;
   $start = microtime(true);
 
   foreach($big_set as $p) {
        unset($to_diff[0]); // cost of an associative array lookup & delete
   }
   $t = microtime(true) - $start;
   $time2 += $t;
}
 
echo $runs, ' runs (forloop): ', sprintf('%2.9f',$time2/$runs), ' secs <br />';
 
if($time2 > $time) {
    $seconds = $time2 - $time;
    echo "forloop slower by ". $seconds/$runs ." seconds";
}
else {
    $seconds = $time - $time2;
    echo "php array diff slower by ". $seconds/$runs ." seconds";
}

Execute via command line :/>php file.php -runs -data_size <# of items in array>
Must be in that order -- I did not make it handle fancy arguments.

Update: Adding PHP Bench for other cool benchmarks

Tagged as: , , No Comments