How to assign scipy.sparse matrix to NumPy array via indexing?

Matthias Published at Dev

When I try to assign a scipy.sparse matrix s (any of the available sparse types) to a NumPy array a like this:

a[:] = s

I get a TypeError:

TypeError: float() argument must be a string or a number

Is there a way to get around this?
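For reference, a minimal reproduction of the failure might look like the sketch below. The array shape and the choice of dok_matrix are assumptions made for illustration. The exact exception type and message differ between NumPy versions, so the sketch catches both TypeError and ValueError.

```python
import numpy as np
import scipy.sparse

# Assumed setup: a small dense target and a sparse source of the same shape
a = np.zeros((5, 1))
s = scipy.sparse.dok_matrix((5, 1))
s[2, 0] = 1.0

try:
    a[:] = s
    failed = False
except (TypeError, ValueError) as exc:
    # NumPy treats the whole matrix as one scalar and tries float(s)
    failed = True
    print(type(exc).__name__, exc)
```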

I know about the todense() and toarray() methods, but I'd really like to avoid the unnecessary copy, and I'd prefer to use the same code for both NumPy arrays and SciPy sparse matrices. For now, I don't mind if reading the values out of the sparse matrix is inefficient.

Is there perhaps some kind of wrapper around sparse matrices that works with NumPy indexing assignment?

If not, any advice on how I could build such a thing myself?

Is there a different sparse array library that cooperates with NumPy in this situation?

UPDATE:

I poked around in the NumPy sources and, searching for the error message string, I think I found the section where the indexing assignment happens in numpy/core/src/multiarray/arraytypes.c.src around line 187 in the function @TYPE@_setitem().

I still don't really get it, but at some point, the float() function seems to be called (if a is a floating-point array). So I tried to monkey-patch one of the SciPy sparse matrix classes to allow this function to be called:

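import numpy as np
import scipy.sparse

a = np.zeros((5, 1))
s = scipy.sparse.dok_matrix((5, 1))

def myfloat(self):
    # Only meaningful for a 1x1 matrix that holds a single value
    assert self.shape == (1, 1)
    return self[0, 0]

# Monkey-patch float() support onto the sparse matrix class
scipy.sparse.dok_matrix.__float__ = myfloat
a[:] = s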

Sadly, this doesn't work because float() is called on the whole sparse matrix and not on the individual items thereof.

So I guess my new question is: how can I further change the sparse matrix class to make NumPy iterate over all the items and call float() on each of them?

ANOTHER UPDATE:

I found a sparse array module on GitHub (https://github.com/FRidh/sparse), which allows assignment to a NumPy array. Sadly, the features of the module are quite limited (e.g. slicing doesn't really work yet), but it might help to understand how assigning to NumPy arrays can be achieved. I'll investigate that further ...

YET ANOTHER UPDATE:

I did some more digging and found that a more interesting source file is probably numpy/core/src/multiarray/ctors.c. I suspect that the function PySequence_Check() is called at some point during the assignment. The simple sparse array class from https://github.com/FRidh/sparse passes the test, but it looks like the sparse matrix classes from SciPy don't (although in my opinion they are sequences).

They get checked for __array_struct__, __array_interface__ and __array__, and then it's somehow decided that they are not sequences. The attributes __getitem__ and __len__ (which all the sparse array classes have!) are not checked.

This leads me to yet another question: How can I manipulate the sparse matrix classes (or objects thereof) in a way that they pass PySequence_Check()?

I think as soon as they are recognized as sequences, assignment should work, because __getitem__() and __len__() should be sufficient for that.
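A quick probe of the two methods mentioned above shows how the SciPy classes respond when used like a sequence. This is only a sketch, assuming csr_matrix and an arbitrary 5 x 3 shape. In the SciPy versions I am aware of, len() raises a TypeError because the length is considered ambiguous, and indexing with a single number returns a 2-D result.

```python
import scipy.sparse

s = scipy.sparse.csr_matrix((5, 3))  # empty 5 x 3 sparse matrix

# __len__ exists, but it refuses to return a length
try:
    length = len(s)
except TypeError as exc:
    length = None
    print("len() failed:", exc)

# __getitem__ with a single number yields a 1 x 3 matrix, not a 1-D row
row = s[0]
print(row.shape)
```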

Matthias

As mentioned in a comment on my question, the sequence interface won't work for sparse matrices, because they don't lose a dimension when indexed with a single number. To try it anyway, I wrote a very limited quick-and-dirty sparse array class in pure Python. When indexed with a single number, it returns a "row" object (which holds a view of the original data), and that object can in turn be indexed with a single number to yield the actual value at that position. Using an instance s of my class, assigning to a NumPy array a works exactly as requested:

a[:] = s

I expected this to be somewhat inefficient, but it is extremely slow. Assigning a 500,000 x 100 sparse array took several minutes! The good news, though, is that no full-sized temporary array is created during the assignment. The memory usage stays about constant during the assignment (while one of the CPUs maxes out).

So this is basically one solution to the original question.
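The original class is not shown, so the following is only a rough sketch of what such a nested-sequence class could look like. The names DictSparse and _Row and the dictionary-based storage are hypothetical. The key points are that both levels provide __len__() and __getitem__(), and that __getitem__() raises IndexError past the end so iteration terminates.

```python
import numpy as np

class _Row:
    """View onto one row of a DictSparse instance (no data is copied)."""
    def __init__(self, parent, i):
        self._parent = parent
        self._i = i

    def __len__(self):
        return self._parent.shape[1]

    def __getitem__(self, j):
        if not 0 <= j < len(self):
            raise IndexError(j)
        # Missing entries are implicit zeros
        return self._parent.data.get((self._i, j), 0.0)

class DictSparse:
    """Toy 2-D sparse container storing only non-zero entries in a dict."""
    def __init__(self, shape):
        self.shape = shape
        self.data = {}

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return _Row(self, i)  # indexing drops one dimension

s = DictSparse((3, 4))
s.data[(0, 1)] = 2.5
s.data[(2, 3)] = -1.0

a = np.empty(s.shape)
a[:] = s  # NumPy walks the nested sequences element by element
```

Every element goes through two Python-level calls, which is consistent with the slowness described above.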

To make the assignment more efficient and still use no temporary copy of the dense array data, NumPy would have to internally do something similar to

s.toarray(out=a)

As far as I know, there is currently no way to get NumPy to do that.
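Called explicitly, toarray(out=...) does write into an existing array, so a small helper can at least give one code path for both dense and sparse inputs. The sketch below assumes a hypothetical helper name, assign_into, and example data chosen for illustration. The out array has to match the sparse matrix in shape and dtype, and as far as I know it also has to be contiguous.

```python
import numpy as np
import scipy.sparse

def assign_into(a, s):
    """Hypothetical helper: copy s into the existing array a in place."""
    if scipy.sparse.issparse(s):
        # Fills a directly; requires matching shape and dtype
        s.toarray(out=a)
    else:
        a[...] = s
    return a

dense = np.array([[0.0, 1.5, 0.0],
                  [2.0, 0.0, 0.0]])

a = np.full(dense.shape, -1.0)
assign_into(a, scipy.sparse.csr_matrix(dense))  # sparse source

b = np.full(dense.shape, -1.0)
assign_into(b, dense)                           # dense source
```

This does not make the a[:] = s syntax itself work, but it avoids the full-sized temporary array.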

However, there is a way to do something very similar: providing an __array__() method that returns a NumPy array. Incidentally, SciPy sparse matrices already have such a method, just under a different name: toarray(). So I simply made it available as __array__():

scipy.sparse.dok_matrix.__array__ = scipy.sparse.dok_matrix.toarray
a[:] = s

This works like a charm (also with the other sparse matrix classes) and is totally fast!
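One caveat with using toarray() directly as __array__(): its first positional parameter is order, whereas NumPy may call __array__() with a dtype (and, in NumPy 2, a copy keyword). A slightly more defensive variant is sketched below. The wrapper name sparse_as_array is hypothetical and not part of SciPy, and ignoring the copy argument is a simplification.

```python
import numpy as np
import scipy.sparse

def sparse_as_array(self, dtype=None, copy=None):
    # Hypothetical wrapper. toarray() always builds a fresh dense array,
    # so the copy argument is accepted but ignored here.
    dense = self.toarray()
    if dtype is None:
        return dense
    return dense.astype(dtype, copy=False)

# Monkey-patch the class so NumPy can convert instances on its own
scipy.sparse.csr_matrix.__array__ = sparse_as_array

dense = np.array([[0.0, 3.0],
                  [4.0, 0.0]])
s = scipy.sparse.csr_matrix(dense)

a = np.empty(s.shape)
a[:] = s                             # plain indexing assignment
b = np.asarray(s, dtype=np.float32)  # conversion with an explicit dtype
```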

According to my limited understanding of the situation, this should create a temporary NumPy array with the same size as a which holds all the values from s (and many zeros) and which is then assigned to a. But strangely, even when I use a very large a that occupies nearly all my available RAM, the assignment still happens very quickly and no additional RAM is used.

So I guess this is another, much better solution to my original question.

Which leaves another question: why does this work without a temporary array?

Edited at 2021-03-23
