Dynamic Programming - Fibonacci - Performance of List better than Array

Sergio Daniel Coronel Malvarez

I'm playing a little with Haskell and dynamic programming.

I have implemented a lot of problems, but in the Fibonacci case I'm getting some results that seem to depend on the machine, and I would like to confirm what's going on.

Assume the following implementations:

1)- List:

memoized_fib_list n = fibTab !! n
   where fibTab = map fibm [0 ..]
         fibm 0 = 0
         fibm 1 = 1
         fibm n = fibTab !! (n-2) + fibTab !! (n-1)

2)- Array:

import Data.Array

memoized_fib_array n = fibTab ! n
  where fibTab = listArray (0, n) [mfib x | x <- [0..n]]
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab ! (x - 1) + fibTab ! (x - 2)
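
Both versions compute the same sequence; a quick GHCi check (shown here just for illustration):

ghci> map memoized_fib_list [0..9]
[0,1,1,2,3,5,8,13,21,34]
ghci> memoized_fib_array 9
34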

Result (with Criterion):

  1. N = 15,000:
     List implementation: 171.5 μs
     Array implementation: 8.782 ms

  2. N = 100,000:
     List implementation: 2.289 ms
     Array implementation: 195.7 ms

  3. N = 130,000:
     List implementation: 3.708 ms
     Array implementation: 410.4 ms

The tests were run on a notebook with a Core i7 (Skylake), 8 GB DDR4 and an SSD, under Ubuntu.

I was expecting the array implementation to be much faster, yet this is the only problem where the list implementation comes out ahead.

Could it be due to the sequential access pattern? On some lower-spec hardware the list implementation performs worse.

Note: I'm using the latest version of GHC.

Thanks.


Edit:

import Criterion.Main

benchmark n = defaultMain [
    bgroup "fibonacci" [
                   bench "memoized_fib_list" $ whnf memoized_fib_list n
                 , bench "memoized_fib_array" $ whnf memoized_fib_array n
                 ]
                ]

main :: IO ()
main = do
    putStrLn "--------------RUNNING BENCHMARK N=40------------------"
    benchmark 40
    putStrLn "--------------RUNNING BENCHMARK N=15000---------------"
    benchmark 15000
    putStrLn "--------------RUNNING BENCHMARK N=50000---------------"
    benchmark 50000
    putStrLn "--------------RUNNING BENCHMARK N=100000--------------"
    benchmark 100000
    putStrLn "--------------RUNNING BENCHMARK N=130000--------------"
    benchmark 130000

Edit2: I installed Haskell Platform 8.2.2 on my Windows 10 PC and got very similar results.

Intel i5 6600K, 16 GB DDR4, SSD.

-------------------RUNNING BENCHMARK N=130000------------------------
benchmarking best algo/memoized_fib_list
time                 1.818 ms   (1.774 ms .. 1.855 ms)
                     0.993 R²   (0.985 R² .. 0.998 R²)
mean                 1.853 ms   (1.826 ms .. 1.904 ms)
std dev              119.2 μs   (84.15 μs .. 191.3 μs)
variance introduced by outliers: 48% (moderately inflated)

benchmarking best algo/memoized_fib_array
time                 139.8 ms   (63.05 ms .. 221.8 ms)
                     0.884 R²   (0.623 R² .. 1.000 R²)
mean                 287.0 ms   (221.4 ms .. 353.0 ms)
std dev              83.83 ms   (64.91 ms .. 101.6 ms)
variance introduced by outliers: 78% (severely inflated)

Edit3: Some additional information after running Criterion with linear regression. All values correspond to the run with N = 130000.

-Number of garbage collections:

List implementation:

numGcs:              NaN R²     (NaN R² .. NaN R²)
  iters              0.000      (0.000 .. 0.000)
  y                  0.000      (0.000 .. 0.000)

Array Implementation:

numGcs:              1.000 R²   (1.000 R² .. 1.000 R²)
  iters              739.000    (739.000 .. 739.000)
  y                  2.040e-12  (-3.841e-12 .. 2.130e-12)

-Bytes allocated:

List implementation:

allocated:           0.001 R²   (0.000 R² .. 0.089 R²)
  iters              1.285      (-9.751 .. 13.730)
  y                  2344.014   (1748.809 .. 2995.439)

Array Implementation:

allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              7.586e8    (7.586e8 .. 7.586e8)
  y                  1648.000   (1648.000 .. NaN)

-CPU cycles:

List implementation:

cycles:              0.992 R²   (0.984 R² .. 0.997 R²)
  iters              6759303.406 (6579945.392 .. 6962148.091)
  y                  -141047.582 (-4701325.840 .. 4674847.149)

Array Implementation:

cycles:              1.000 R²   (NaN R² .. 1.000 R²)
  iters              1.729e9    (1.680e9 .. 1.757e9)
  y                  -3311041.000 (NaN .. 6.513e7)

Answer by leftaroundabout:

What's happening here is quite simple: with -O2, GHC decides to make the memoisation-list in memoized_fib_list global.

$ ghc -fforce-recomp wtmpf-file4545.hs -O2 -ddump-prep

...

Main.memoizedFib_list :: GHC.Types.Int -> GHC.Integer.Type.Integer
[GblId, Arity=1, Str=<S(S),1*U(U)>, Unf=OtherCon []]
Main.memoizedFib_list
  = \ (n_sc61 [Occ=Once] :: GHC.Types.Int) ->
      GHC.List.!! @ GHC.Integer.Type.Integer Main.main_fibTab n_sc61

...

Main.main_fibTab :: [GHC.Integer.Type.Integer]
[GblId]
Main.main_fibTab
  = case Main.$wgo 0# of
    { (# ww1_sc5M [Occ=Once], ww2_sc5N [Occ=Once] #) ->
    GHC.Types.: @ GHC.Integer.Type.Integer ww1_sc5M ww2_sc5N
    }

...

That means your criterion benchmark doesn't actually evaluate the Fibonacci function repeatedly – it just performs repeated lookups in the same global list. Averaged over many evaluations, this gives a very good score, which is however not representative of how fast the calculation itself is.

GHC performs this optimisation in the list implementation because you don't need lists of different length – it's always the same infinite list of all Fibonacci numbers. That's not possible in the array implementation, whose bounds depend on n, so it can't keep up here.
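
Schematically, the optimised list version behaves as if the table had been written as a top-level CAF by hand – a sketch with illustrative names, not the actual Core output:

-- fibTabGlobal is shared across all calls (and all benchmark iterations),
-- so the Fibonacci numbers are only ever computed once per program run.
fibTabGlobal :: [Integer]
fibTabGlobal = map fibm [0..]
  where fibm 0 = 0
        fibm 1 = 1
        fibm n = fibTabGlobal !! (n-2) + fibTabGlobal !! (n-1)

memoizedFib_list' :: Int -> Integer
memoizedFib_list' n = fibTabGlobal !! n   -- later calls only pay for this lookup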

The simplest way to prevent this globalisation is to make the memo table explicitly dependent on n, by trimming it to the needed finite length, just like the array is:

memoizedFib_list :: Int -> Integer
memoizedFib_list n = fibTab !! n
   where fibTab = map fibm [0..n]  -- finite and n-dependent, so it can't be floated out to a global CAF
         fibm 0 = 0
         fibm 1 = 1
         fibm n = fibTab !! (n-2) + fibTab !! (n-1)

With this, the list implementation becomes much slower than the array one, as one would expect given that memo lookup is O(n) for lists:

$ ghc -fforce-recomp wtmpf-file4545.hs -O2 && ./wtmpf-file4545 
[1 of 1] Compiling Main             ( wtmpf-file4545.hs, wtmpf-file4545.o )
Linking wtmpf-file4545 ...
--------------RUNNING BENCHMARK N=40------------------
benchmarking fibonacci/memoizedFib_list
time                 10.47 μs   (10.42 μs .. 10.51 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 10.40 μs   (10.35 μs .. 10.44 μs)
std dev              163.3 ns   (122.2 ns .. 225.8 ns)
variance introduced by outliers: 13% (moderately inflated)

benchmarking fibonacci/memoizedFib_array
time                 1.618 μs   (1.617 μs .. 1.620 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.620 μs   (1.618 μs .. 1.623 μs)
std dev              7.521 ns   (4.079 ns .. 12.48 ns)

benchmarking fibonacci/memoizedFib_vector
time                 1.573 μs   (1.572 μs .. 1.574 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.572 μs   (1.571 μs .. 1.573 μs)
std dev              2.351 ns   (1.417 ns .. 4.040 ns)

--------------RUNNING BENCHMARK N=1500----------------
benchmarking fibonacci/memoizedFib_list
time                 18.52 ms   (18.41 ms .. 18.68 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 18.65 ms   (18.53 ms .. 18.84 ms)
std dev              355.1 μs   (204.8 μs .. 592.1 μs)

benchmarking fibonacci/memoizedFib_array
time                 135.2 μs   (131.2 μs .. 140.1 μs)
                     0.996 R²   (0.991 R² .. 1.000 R²)
mean                 132.7 μs   (131.9 μs .. 135.0 μs)
std dev              4.463 μs   (2.024 μs .. 8.327 μs)
variance introduced by outliers: 32% (moderately inflated)

benchmarking fibonacci/memoizedFib_vector
time                 131.8 μs   (130.6 μs .. 133.2 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 132.5 μs   (131.4 μs .. 134.1 μs)
std dev              4.383 μs   (3.463 μs .. 5.952 μs)
variance introduced by outliers: 31% (moderately inflated)

Vector, which I also tested here, performs yet a bit faster, but not significantly so. I think as soon as you use a container with O(1) lookup, the performance is dominated by the additions of the pretty huge numbers involved, so you're really benchmarking GMP rather than anything that has to do with Haskell.

import qualified Data.Vector as V

memoizedFib_vector :: Int -> Integer
memoizedFib_vector n = fibTab V.! n
  where fibTab = V.generate (n+1) mfib
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab V.! (x - 1) + fibTab V.! (x - 2)
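
To get a rough feel for the sizes involved (an estimate, not a measurement from the post): the n-th Fibonacci number has about n · log₁₀ φ ≈ 0.209 · n decimal digits, so for n = 130000 the entries near the end of the table are Integers of roughly 27,000 digits, and every addition is a sizeable GMP operation.

-- Quick check of the operand sizes (helper name is mine, for illustration):
-- counts the decimal digits of the n-th Fibonacci number via the vector
-- implementation above; for n = 130000 this is on the order of 27,000.
fibDigits :: Int -> Int
fibDigits = length . show . memoizedFib_vector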
