In GHC, why is the lazy version of this small program so much faster than the loop-based variant?

jamshidh

These two programs do the same thing, but one runs 10x faster.

This takes approx. 10 seconds on my machine:

import Control.Monad
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

theValueOne=B.singleton 1

main = replicateM_ 100000000 $ B.putStr theValueOne

The second version uses lazy I/O for output. It finishes in about 1 second (as fast as C):

import qualified Data.ByteString.Lazy as BL

main = BL.putStr $ BL.pack $ replicate 100000000 1

Question: Why is the non-lazy version so slow? More importantly, how can I make it fast? (I've tried recursion, forM, and modifying the output buffer with hSetBuffering, but nothing has made a difference.)


Note: This is more than just an academic question. The non-lazy version is an extremely simplified version of an executable my company uses in production, which is also slow in the same way. It would be nearly impossible to re-architect the larger program around the analogous lazy solution.

K. A. Buhr

Update: added a possible source of the problem and a solution.

I don't think it has anything to do with lazy I/O. If you rewrite the strict I/O version to write two bytes at once:

theValueOne = B.singleton 1
main = replicateM_ 50000000 $ B.putStr (theValueOne <> theValueOne)

that halves the time. Write ten bytes at once:

theValueOne = B.singleton 1
main = replicateM_ 10000000 $ B.putStr (foldMap id (replicate 10 theValueOne))

and it's already faster than the lazy I/O version.
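Pushing that batching idea to its limit, the writes can be grouped into full 4 KB chunks. This is a sketch of my own, not from the original answer; the `chunkCounts` helper and the 4096-byte chunk size are illustrative choices:

```haskell
import Control.Monad (replicateM_)
import qualified Data.ByteString as B

-- Split a total byte count into (number of full chunks, leftover bytes).
chunkCounts :: Int -> Int -> (Int, Int)
chunkCounts total chunkSize = total `divMod` chunkSize

main :: IO ()
main = do
  let (full, rest) = chunkCounts 100000000 4096
  -- Write 4096-byte chunks instead of single bytes, amortizing the
  -- per-call overhead of B.putStr across many bytes.
  replicateM_ full $ B.putStr (B.replicate 4096 1)
  B.putStr (B.replicate rest 1)
```

Each `B.putStr` call now covers 4096 bytes, so the per-call cost is paid roughly 24 thousand times instead of 100 million.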

The issue is that a B.hPutStr call carries a fair amount of overhead, much more than a C fwrite call, and it's simply not an efficient way to write a single byte.

A good chunk of the overhead comes from the fact that Haskell I/O buffers have immutable metadata. Even though the buffer content itself is mutable, the pointers to valid data within the buffer are immutable, so writing a single byte requires a heap allocation of a new GHC.IO.Buffer.Buffer structure, which GHC can't optimize away.
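For completeness, the bytestring package's own answer to this kind of problem is Data.ByteString.Builder, which accumulates bytes into large chunks before any handle I/O happens. A minimal sketch (my addition, not from the original answer; whether it beats the hand-rolled buffer below would need benchmarking):

```haskell
import Data.ByteString.Builder (hPutBuilder, word8)
import System.IO

main :: IO ()
main = do
  hSetBinaryMode stdout True
  -- hPutBuilder works best with block buffering on the handle.
  hSetBuffering stdout (BlockBuffering Nothing)
  -- mconcat fuses the 100 million single-byte builders into large
  -- chunks, so the per-byte cost is a poke rather than a Handle call.
  hPutBuilder stdout (mconcat (replicate 100000000 (word8 1)))
```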

One solution is to use a hand-crafted buffering structure with a mutable pointer. The following works, and it's about twice as fast as the lazy I/O version in the original question.

{-# LANGUAGE RecordWildCards #-}
{-# OPTIONS_GHC -Wall #-}

import Control.Monad
import Data.IORef
import Data.Word
import Foreign.ForeignPtr
import Foreign.Ptr
import Foreign.Storable
import System.IO

data WriteBuffer = WriteBuffer
  { handle :: !Handle
  , capacity :: !Int
  , used :: !(IORef Int)
  , content :: !(ForeignPtr Word8)
  }

newBuffer :: Handle -> IO WriteBuffer
newBuffer h = do
  hSetBinaryMode h True
  hSetBuffering h NoBuffering
  WriteBuffer h cap <$> newIORef 0 <*> mallocForeignPtrBytes cap
  where cap = 4096

flushBuffer :: WriteBuffer -> IO ()
flushBuffer WriteBuffer{..} = do
  n <- readIORef used
  withForeignPtr content $ \p -> hPutBuf handle p n
  writeIORef used 0

writeByte :: Word8 -> WriteBuffer -> IO ()
writeByte w buf@(WriteBuffer{..}) = do
  n <- readIORef used
  withForeignPtr content $ \p -> poke (plusPtr p n) w
  let n' = n + 1
  writeIORef used n'
  when (n' == capacity) $
    flushBuffer buf

main :: IO ()
main = do
  b <- newBuffer stdout
  replicateM_ 100000000 (writeByte 1 b)
  flushBuffer b

Somewhat ironically, converting this to a version that uses an immutable counter and passes the WriteBuffer as state through foldM doubles the speed again, making it about 4 times as fast as the lazy I/O version in the original question:

{-# LANGUAGE RecordWildCards #-}
{-# OPTIONS_GHC -Wall #-}

import Control.Monad
import Data.Word
import Foreign.ForeignPtr
import Foreign.Ptr
import Foreign.Storable
import System.IO

data WriteBuffer = WriteBuffer
  { handle :: !Handle
  , capacity :: !Int
  , used :: !Int
  , content :: !(ForeignPtr Word8)
  }

newBuffer :: Handle -> IO WriteBuffer
newBuffer h = do
  hSetBinaryMode h True
  hSetBuffering h NoBuffering
  WriteBuffer h cap 0 <$> mallocForeignPtrBytes cap
  where cap = 4096

flushBuffer :: WriteBuffer -> IO WriteBuffer
flushBuffer buf@WriteBuffer{..} = do
  withForeignPtr content $ \p -> hPutBuf handle p used
  return $ buf { used = 0 }

writeByte :: Word8 -> WriteBuffer -> IO WriteBuffer
writeByte w buf@(WriteBuffer{..}) = do
  withForeignPtr content $ \p -> poke (plusPtr p used) w
  let used' = used + 1
      buf' = buf { used = used' }
  if (used' == capacity)
    then flushBuffer buf'
    else return buf'

main :: IO ()
main = do
  b <- newBuffer stdout
  b' <- foldM (\s _ -> writeByte 1 s) b [(1::Int)..100000000]
  void (flushBuffer b')

The reason this one is so fast seems to be that GHC is able to optimize away the WriteBuffer constructor entirely from the fold and just pass around unboxed pointers and integers in the loop. My guess is that if I modified the mutable version above to avoid boxing and unboxing the integer in the used IORef, it would be similarly fast.
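One way to act on that guess is to move the counter out of the IORef and into foreign memory, where it is stored as a plain unboxed Int via Storable. This is an untested sketch of my own, not from the original answer, and whether it actually matches the foldM version's speed would need measuring:

```haskell
{-# LANGUAGE RecordWildCards #-}

import Control.Monad
import Data.Word
import Foreign.ForeignPtr
import Foreign.Ptr
import Foreign.Storable
import System.IO

data WriteBuffer = WriteBuffer
  { handle   :: !Handle
  , capacity :: !Int
  , used     :: !(ForeignPtr Int)   -- counter kept unboxed in foreign memory
  , content  :: !(ForeignPtr Word8)
  }

newBuffer :: Handle -> IO WriteBuffer
newBuffer h = do
  hSetBinaryMode h True
  hSetBuffering h NoBuffering
  usedPtr <- mallocForeignPtr
  withForeignPtr usedPtr (`poke` 0)
  WriteBuffer h cap usedPtr <$> mallocForeignPtrBytes cap
  where cap = 4096

flushBuffer :: WriteBuffer -> IO ()
flushBuffer WriteBuffer{..} =
  withForeignPtr used $ \u -> do
    n <- peek u
    withForeignPtr content $ \p -> hPutBuf handle p n
    poke u 0

writeByte :: Word8 -> WriteBuffer -> IO ()
writeByte w buf@WriteBuffer{..} = do
  -- peek/poke on a Storable Int avoids the boxed Int inside an IORef
  n <- withForeignPtr used peek
  withForeignPtr content $ \p -> poke (plusPtr p n) w
  let n' = n + 1
  withForeignPtr used (`poke` n')
  when (n' == capacity) $ flushBuffer buf

main :: IO ()
main = do
  b <- newBuffer stdout
  replicateM_ 100000000 (writeByte 1 b)
  flushBuffer b
```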
