How to assign scipy.sparse matrix to NumPy array via indexing?

Matthias

When I try to assign a scipy.sparse matrix s (any of the available sparse types) to a NumPy array a like this:

a[:] = s

I get a TypeError:

TypeError: float() argument must be a string or a number
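A minimal way to reproduce the failure might look like the sketch below. The shape, the choice of csr_matrix, and the variable name err are arbitrary assumptions for illustration; the exact exception type and message may vary between NumPy and SciPy versions, so both TypeError and ValueError are caught.

```python
import numpy as np
import scipy.sparse as sp

a = np.zeros((5, 1))       # dense floating-point target
s = sp.csr_matrix((5, 1))  # empty sparse matrix; the format is chosen arbitrarily

err = None
try:
    a[:] = s               # the assignment from the question
except (TypeError, ValueError) as exc:
    # exact type and wording depend on the installed NumPy/SciPy versions
    err = exc

print(type(err).__name__, err)
```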

Is there a way to get around this?

I know about the todense() and toarray() methods, but I'd really like to avoid the unnecessary copy and I'd prefer to use the same code for both NumPy arrays and SciPy sparse matrices. For now, I'm not concerned with getting the values from the sparse matrix being inefficient.

Is there perhaps some kind of wrapper around sparse matrices that works with NumPy indexing assignment?

If not, any advice on how I could build such a thing myself?

Is there a different sparse array library that cooperates with NumPy in this situation?

UPDATE:

I poked around in the NumPy sources and, searching for the error message string, I think I found the section where the indexing assignment happens in numpy/core/src/multiarray/arraytypes.c.src around line 187 in the function @TYPE@_setitem().

I still don't really get it, but at some point, the float() function seems to be called (if a is a floating-point array). So I tried to monkey-patch one of the SciPy sparse matrix classes to allow this function to be called:

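import numpy as np
import scipy.sparse

a = np.zeros((5, 1))  # floating-point target array
s = scipy.sparse.dok_matrix((5, 1))

def myfloat(self):
    # only meaningful for a single-element matrix
    assert self.shape == (1, 1)
    return self[0, 0]

scipy.sparse.dok_matrix.__float__ = myfloat
a[:] = s  # still fails, see below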

Sadly, this doesn't work because float() is called on the whole sparse matrix and not on the individual items thereof.

So I guess my new question is: how can I further change the sparse matrix class to make NumPy iterate over all the items and call float() on each of them?

ANOTHER UPDATE:

I found a sparse array module on GitHub (https://github.com/FRidh/sparse) which allows assignment to a NumPy array. Sadly, the features of the module are quite limited (e.g. slicing doesn't really work yet), but it might help to understand how assigning to NumPy arrays can be achieved. I'll investigate that further ...

YET ANOTHER UPDATE:

I did some more digging and found that a more interesting source file is probably numpy/core/src/multiarray/ctors.c. I suspect that the function PySequence_Check() is called at some point during the assignment. The simple sparse array class from https://github.com/FRidh/sparse passes the test, but it looks like the sparse matrix classes from SciPy don't (although in my opinion they are sequences).

They get checked for __array_struct__, __array_interface__ and __array__, and then it's somehow decided that they are not sequences. The attributes __getitem__ and __len__ (which all the sparse array classes have!) are not checked.

This leads me to yet another question: How can I manipulate the sparse matrix classes (or objects thereof) in a way that they pass PySequence_Check()?

I think as soon as they are recognized as sequences, assignment should work, because __getitem__() and __len__() should be sufficient for that.

Matthias

As mentioned in a comment to my question, the sequence interface won't work for sparse matrices, because they don't lose a dimension when indexed with a single number. To try it anyway, I created a very limited quick-and-dirty sparse array class in pure Python, which, when indexed with a single number, returns a "row" class (which holds a view to the original data), which again can be indexed with a single number to yield the actual value at this index. Using an instance s of my class, assigning to a NumPy array a works exactly as requested:

a[:] = s
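The class itself is not shown here, so the following is only a rough sketch of what such a two-level sequence could look like. The names SeqSparse and Row and the dict-of-keys storage are assumptions made for illustration, not the code actually used. Both levels raise IndexError for out-of-range indices so that iteration over them terminates.

```python
import numpy as np

class Row:
    """View onto one row of the parent; indexing it yields a scalar."""
    def __init__(self, parent, i):
        self.parent, self.i = parent, i

    def __len__(self):
        return self.parent.shape[1]

    def __getitem__(self, j):
        if not 0 <= j < len(self):
            raise IndexError(j)  # lets iteration over the row stop
        return self.parent.data.get((self.i, j), 0.0)

class SeqSparse:
    """Hypothetical 2-D sparse container that behaves like a sequence of rows."""
    def __init__(self, shape):
        self.shape = shape
        self.data = {}  # {(row, col): value}, missing keys mean 0.0

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)  # lets iteration over the rows stop
        return Row(self, i)      # one dimension is dropped here

s = SeqSparse((3, 2))
s.data[(0, 1)] = 2.0
s.data[(2, 0)] = -1.0

a = np.empty((3, 2))
a[:] = s  # NumPy walks the rows and items through the sequence protocol
```

Every element goes through two Python-level __getitem__ calls, which is consistent with the poor performance described next.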

I expected this to be somewhat inefficient, but it is really, really, really, extremely slow. Assigning a 500,000 x 100 sparse array took several minutes! The good news, though, is that no full-sized temporary array is created during the assignment. The memory usage stays about constant during the assignment (while one of the CPUs maxes out).

So this is basically one solution to the original question.

To make the assignment more efficient and still use no temporary copy of the dense array data, NumPy would have to internally do something similar to

s.toarray(out=a)

As far as I know, there is currently no way to get NumPy to do that.
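Calling that method explicitly, outside of indexing assignment, could look like the sketch below. The shapes and values are made up for illustration. As far as I understand the SciPy documentation, the out array has to match the sparse matrix in shape and dtype and should be memory contiguous.

```python
import numpy as np
import scipy.sparse as sp

s = sp.dok_matrix((4, 3), dtype=np.float64)
s[1, 2] = 5.0
s[3, 0] = -2.0

a = np.full((4, 3), 9.0)  # existing contents are overwritten, not added to

# writes the dense representation of s directly into a's buffer
s.toarray(out=a)

print(a)
```

The drawback is that this is no longer the same code path as for a plain NumPy array on the right-hand side.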

However, there is a way to do something very similar, by providing an __array__() method that returns a NumPy array. Incidentally, SciPy sparse matrices already have such a method, just with a different name: toarray(). So I simply made it available under that name as well:

scipy.sparse.dok_matrix.__array__ = scipy.sparse.dok_matrix.toarray
a[:] = s

This works like a charm (also with the other sparse matrix classes) and is totally fast!

According to my limited understanding of the situation, this should create a temporary NumPy array with the same size as a which holds all the values from s (and many zeros) and which is then assigned to a. But strangely, even when I use a very large a that occupies nearly all my available RAM, the assignment still happens very quickly and no additional RAM is used.

So I guess this is another, much better solution to my original question.

Which leaves another question: why does this work without a temporary array?
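For anyone who would rather not patch SciPy's classes globally, the same idea can presumably be expressed as a small wrapper object. This is only a sketch: the class name AsArray is hypothetical, and the dtype/copy parameters are there because NumPy may pass them to __array__() depending on its version. It makes no claim about whether a temporary dense array is created.

```python
import numpy as np
import scipy.sparse as sp

class AsArray:
    """Hypothetical wrapper exposing a sparse matrix through __array__()."""
    def __init__(self, sparse):
        self.sparse = sparse

    def __array__(self, dtype=None, copy=None):
        # delegate to SciPy's own dense conversion
        dense = self.sparse.toarray()
        return dense if dtype is None else dense.astype(dtype, copy=False)

s = sp.csr_matrix(np.array([[0.0, 1.5], [0.0, 0.0], [3.0, 0.0]]))
a = np.empty((3, 2))
a[:] = AsArray(s)  # NumPy obtains the values via __array__()
```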
