How to do memory mapped IO on custom data types?

fho

The Setup

I recently implemented mmap-based file reading and immediately ran into strange behavior. The relevant code is:

-- | map whole aedat file into memory and return it as a vector of events
-- TODO what are the finalizing semantics of this?
mmapAERData :: S.Storable a => FilePath -> IO (S.Vector (AER.Event a))
mmapAERData name = do
    -- mmap file into memory and find the offset behind the header
    bs <- dropHeader <$> mmapFileByteString name Nothing
    -- some conversion is necessary to get the 'ForeignPtr' from
    -- a 'ByteString'
    B.unsafeUseAsCString bs $ \ptr -> do
      fptr <- newForeignPtr_ ptr
      let count = B.length bs `div` 8 -- sizeof one event
      return $ S.unsafeFromForeignPtr0 (castForeignPtr fptr) count


Some explanation: the AEDat format is basically a long list of pairs of Word32s. One encodes the address, the other the timestamp. Before that come some lines of header text, which I drop in the dropHeader function. I could do this directly on a ForeignPtr if absolutely necessary, but I prefer to use the common function that works on ByteStrings.
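For reference, a header-stripping function of this kind could look like the sketch below. It is hypothetical, since the original dropHeader is not shown: it assumes every header line starts with `#` and ends with `\n` (as AEDat 2.0 headers are usually described). Because `BC.drop` and `BC.dropWhile` return slices that share the input buffer, the result still points into the mmap'd memory rather than being a copy.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical dropHeader. ASSUMPTION: every header line begins with '#'
-- and ends with '\n'; the binary payload starts right after the last one.
-- Caveat: a payload whose first byte is 0x23 ('#') would be misread as header.
dropHeader :: BC.ByteString -> BC.ByteString
dropHeader bs
  | BC.pack "#" `BC.isPrefixOf` bs =
      -- skip up to and including the next newline, then check again
      dropHeader (BC.drop 1 (BC.dropWhile (/= '\n') bs))
  | otherwise = bs -- a slice of the input; no bytes are copied
```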

The Storable instances can be found here and here. I am not sure about the alignment, but I suspect that an alignment of 8 should be correct.
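For comparison, a minimal Storable instance for such a record might look like this. The `RawEvent` type and `roundTrip` helper are hypothetical (the real instances live in the linked code), and the sketch assumes the two fields sit back to back in host byte order. Since both fields are Word32, taking the alignment from Word32 (4 bytes on common GHC platforms) is the usual choice; 8 is stricter than the layout requires.

```haskell
import Data.Word (Word32)
import Foreign.Marshal.Utils (with)
import Foreign.Storable (Storable (..))

-- Hypothetical event layout: address word, then timestamp word,
-- 8 bytes in total with no padding, in host byte order.
data RawEvent = RawEvent
  { rawAddress   :: !Word32
  , rawTimestamp :: !Word32
  } deriving (Show, Eq)

instance Storable RawEvent where
  sizeOf _ = 8
  -- Both fields are Word32, so Word32's alignment is sufficient.
  alignment _ = alignment (undefined :: Word32)
  peek p = RawEvent <$> peekByteOff p 0 <*> peekByteOff p 4
  poke p (RawEvent a t) = pokeByteOff p 0 a >> pokeByteOff p 4 t

-- Write an event to temporary memory and read it back.
roundTrip :: RawEvent -> IO RawEvent
roundTrip e = with e peek
```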

The Problem

Reading the data works quite well, but after some time the memory seems to get corrupted somehow:

>>> es <- DVS.mmapDVSData "dataset.aedat" 
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 6, posY = 50}, timestamp = 74.771407s}
>>> :type es
es :: S.Vector (DVS.Event DVS.Address)
>>> _ <- evaluate (V.convert es :: V.Vector (DVS.Event DVS.Address))
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 0, posY = 44}, timestamp = 0s}

Apparently accessing all elements of es somehow corrupts my memory. Or does the garbage collector recycle it? Either way, this is strange. What can I do about it?

NovaDenizen

mmapFileByteString performs an mmap, which creates a ForeignPtr, and stores that ForeignPtr in a ByteString. unsafeUseAsCString coerces the ForeignPtr into a Ptr, from which you then create a new ForeignPtr. Then you take that second ForeignPtr and use it with S.unsafeFromForeignPtr0 to create a vector.

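Having two ForeignPtrs pointing at the same memory is a no-no. The GHC runtime treats them as two separate objects: the one created by newForeignPtr_ has no finalizer and holds no reference to the original. Once all references to the ByteString are gone, the finalizer of its ForeignPtr will be called, unmapping the file and reclaiming the underlying memory. This leaves the second ForeignPtr pointing at an invalid region. Any allocation-heavy operation, such as the V.convert call, can trigger the garbage collection that runs this finalizer.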

The solution here is to use Data.ByteString.Internal.toForeignPtr to extract and re-use the ForeignPtr from the ByteString. Replace the unsafeUseAsCString block with this:

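let (fptr, offset, len) = Data.ByteString.Internal.toForeignPtr bs
    count = len `div` 8 -- sizeof one event
-- offset can be non-zero because dropHeader slices the ByteString;
-- plusForeignPtr (from Foreign.ForeignPtr) skips it while keeping the
-- original finalizer, so no castForeignPtr is needed
return $ S.unsafeFromForeignPtr0 (plusForeignPtr fptr offset) count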
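The same idea, reusing the ByteString's own ForeignPtr instead of wrapping a raw Ptr in a fresh one, can be tried without mmap or vector. In the sketch below, `payloadForeignPtr` and `firstWord32` are hypothetical names. The offset returned by `toForeignPtr` is added with `plusForeignPtr`, so the result keeps the original finalizer. With bytestring 0.11 and later `toForeignPtr` reports an offset of 0 and returns an already advanced pointer; with older versions a slice such as the result of `B.drop` has a non-zero offset, so adding it covers both cases.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Data.Word (Word32)
import Foreign.ForeignPtr (ForeignPtr, plusForeignPtr, withForeignPtr)
import Foreign.Storable (peek)

-- Hypothetical helper: a pointer to the first payload byte that reuses the
-- ByteString's own ForeignPtr (and therefore its finalizer), plus the length.
payloadForeignPtr :: B.ByteString -> (ForeignPtr b, Int)
payloadForeignPtr bs =
  let (fptr, offset, len) = BI.toForeignPtr bs
  in (plusForeignPtr fptr offset, len)

-- Read the first Word32 of the payload in host byte order.
-- withForeignPtr keeps the buffer alive for the duration of the read.
firstWord32 :: B.ByteString -> IO (Maybe Word32)
firstWord32 bs
  | len < 4 = pure Nothing
  | otherwise = Just <$> withForeignPtr fptr peek
  where
    (fptr, len) = payloadForeignPtr bs
```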

IMHO, the real solution here is not to fiddle with all this stuff at all. Just conventionally read the file into a ByteString, pull out 8-byte substrings from it and manually convert them into Events. All this mmap and ForeignPtr stuff is dangerous, and not a whole lot faster than doing things safely and correctly. If you want the absolute fastest performance without regard to safety, program in C.
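A sketch of that conventional decoding step follows. `RawEvent`, `word32BE` and `decodeEvents` are hypothetical names, and the byte order is an assumption: the sketch decodes big-endian words (the order AEDat 2.0 files are usually described as using), so reverse the byte order if your files differ. Converting `RawEvent` into `AER.Event` is left out.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- Hypothetical raw record: address and timestamp exactly as stored.
data RawEvent = RawEvent { rawAddress :: !Word32, rawTimestamp :: !Word32 }
  deriving (Show, Eq)

-- Decode the 4 bytes starting at offset i as a big-endian Word32.
-- ASSUMPTION: big-endian on disk; adjust if your files are little-endian.
word32BE :: B.ByteString -> Int -> Word32
word32BE bs i =
  foldl (\acc k -> (acc `shiftL` 8) .|. fromIntegral (B.index bs (i + k))) 0 [0 .. 3]

-- Split the payload (header already dropped) into 8-byte records.
-- Trailing bytes that do not form a full record are ignored.
decodeEvents :: B.ByteString -> [RawEvent]
decodeEvents bs =
  [ RawEvent (word32BE bs off) (word32BE bs (off + 4))
  | n <- [0 .. B.length bs `div` 8 - 1]
  , let off = n * 8
  ]
```

In the question's setting this would be used roughly as `decodeEvents . dropHeader <$> B.readFile name`. Nothing here touches raw pointers, so the garbage collector can free or move the data without invalidating anything.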
