4 dimensional array C++

Ritwin Narra

I'm using a 4d vector, but the initialization is very long using vectors:

vec = std::vector<std::vector<std::vector<std::vector<T>>>>(A, std::vector<std::vector<std::vector<T>>>(B, std::vector<std::vector<T>>(C, std::vector<T>(D, def))));

Is there any way I can shorten this easily? I don't need to use a vector, any 4D array will do, as long as I can declare it in one place and initialize it in another. I also don't need to extend it, but I don't know the dimensions beforehand.

I could use typedefs to have something like vec1d<T> = std::vector<T>, vec2d<T> = std::vector<vec1d<T>> = std::vector<std::vector<T>>, etc., but that seems unnecessary. I think something can be done with variadic templates (but I don't know them well enough :P).

I can of course create another class, but if there's a better way, I'd rather use that.

Kuba hasn't forgotten Monica

A vector of vectors even 2 levels deep is generally a bad idea because of poor cache locality and the latency of the intermediate pointer dereferences needed to get to the last layer of the onion.

So, most generally: you want a single vector only. No need to allocate multiple times. The dimensions are fixed, after all, so one allocation will do just fine.

The next concern is the best scheme of arranging data within it for cache-friendliness. A simple "row"-major order (i.e. adjacent elements are in the highest dimension) may be a good start - as shown in the trivial example below.

If there are any access patterns that read consecutive elements along any dimension but last, there'd need to be a better indexing scheme that would ensure, for example, that the adjacent elements in all four dimensions are kept "together": [i,0,0,0], [i,0,0,1], [i,0,1,0], [i,0,1,1], [i,1,0,0], [i,1,0,1], [i,1,1,0], [i,1,1,1], [i+1,0,0,0], and so on.

Cache is not everything, of course. The prefetcher on modern CPUs is able to figure out most access patterns, but the cache misses will still limit bandwidth, since in the worst case each element access will need to fetch an entire cache line, with most of the line being of no use before it gets evicted. That may or may not be a problem. If the elements are a cacheline in size (64 bytes including alignment padding), that's not a concern. Then, if the array is large (spanning many memory pages), you'd be incurring page table TLB misses, and page misses in general.

If the elements are small, it may make sense to use compiler intrinsics to leverage the vector operations of the CPU, and e.g. operate on 512 bits of data at the same time (perfect for a 2x2x2x2 cube of floats or ints).

There are many better arrangements available and you might chose one based on the statistical properties of the access patterns and/or on the algorithms used to process the data. Say, multi-dimensional FFT can do well with a tailored indexing scheme.

The idea is to get as many consecutively accessed elements into the cache line as possible. If the array is small but accessed consecutively along different dimensions all the time, it may make sense to keep 4 copies, each with adjacent elements indexed by a different dimension. If it's even smaller - so that it only partly fills the L1 D-cache (say <1kb in size in total) - such copies would be detrimental. Only you'd know exactly what's needed here, though. Please add that information to the question :)

#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

template <class T>
class array_4d {
    std::vector<T> _data;
    const std::array<size_t, 4> _n;
    const std::array<size_t, 4> _nn;
    using const_value_type =
        std::conditional_t<std::is_scalar_v<T>, const T, const T &>;
public:
    using value_type = T;
    using reference = T&;
    array_4d(size_t n1, size_t n2, size_t n3, size_t n4, T &&_default= {}) :
        _data(n1*n2*n3*n4, std::move(_default)), 
        _n{n1, n2, n3, n4},
        _nn{n2*n3*n4, n3*n4, n4, 1}
    {}
    const_value_type operator()(size_t i, size_t j, size_t k, size_t l) const {
        return _data[_nn[0]*i + _nn[1]*j + _nn[2]*k + _nn[3]*l];
    }
    reference operator() (size_t i, size_t j, size_t k, size_t l) noexcept {
        return _data[_nn[0]*i + _nn[1]*j + _nn[2]*k + _nn[3]*l];
    }
    size_t size() const noexcept { return _data.size(); }
    size_t size(int dim) const noexcept { return _n[dim]; }  

    struct _jkl {
        T* const arr; size_t const nj, nk;
        struct _kl {
            T* const arr; size_t const nk;
            struct _l {
                T* const arr;
                reference operator[](size_t l) noexcept { return arr[l]; }
            };
            _l operator[](size_t k) noexcept { return { arr + nk*k }; }
        };
        _kl operator[](size_t j) noexcept { return { arr + nj*j, nk }; }
    };
    _jkl operator[](size_t i) { return { &_data[_nn[0]*i], _nn[1], _nn[2] }; }
};

int main()
{
    // usage example
    array_4d<int> A(4,4,4,4);
    A(0,1,2,3) = 42;
    assert(A[0][1][2][3] == 42);
}

You can play with this code on godbolt: https://godbolt.org/z/joG6x15r9

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related