How do I convert a pandas DataFrame into a NumPy structured array?

ssteele

Below is a snippet that converts data into a NumPy array. It is then converted to a Pandas DataFrame where I intend to process it. I'm attempting to convert it back to a NumPy array. I'm failing at this. Badly.

import pandas as pd
import numpy as np
from pprint import pprint

data = [
    ('2020-11-01 00:00:00', 1.0),
    ('2020-11-02 00:00:00', 2.0)
]
coordinatesType = [('timestamp', 'datetime64[s]'), ('value', '<f8')]

npArray = np.asarray(data, coordinatesType)
df = pd.DataFrame(data = npArray)

# do some pandas processing, then convert back to a numpy array

mutatedNpArray = df.to_numpy(coordinatesType)
pprint(mutatedNpArray)

# don't supply dtype for kicks
pprint(df.to_numpy())

This yields crazytown:

array([[('2020-11-01T00:00:00', 1.6041888e+18),
        ('1970-01-01T00:00:01', 1.0000000e+00)],
       [('2020-11-02T00:00:00', 1.6042752e+18),
        ('1970-01-01T00:00:02', 2.0000000e+00)]],
      dtype=[('timestamp', '<M8[s]'), ('value', '<f8')])
array([[Timestamp('2020-11-01 00:00:00'), 1.0],
       [Timestamp('2020-11-02 00:00:00'), 2.0]], dtype=object)

I realize a DataFrame is really a fancy NumPy array under the hood, but I'm passing back to a function that accepts a simple NumPy array. Clearly I'm not handling dtypes correctly and/or I don't understand the data structure inside my DataFrame. Below is what the function I'm calling expects:

array([('2020-11-01T00:00:00', 1.),
       ('2020-11-02T00:00:00', 2.)],
      dtype=[('timestamp', '<M8[s]'), ('value', '<f8')])

I'm really lost on how to do this. Or what I should be doing instead.

Help!


As @hpaul suggested, I tried the following:

# ...
df = df.set_index('timestamp')

# do some pandas processing, then convert back to a numpy array

# the first positional parameter of to_records is `index`, so pass it by name
mutatedNpArray = df.to_records(index=True)
# ...

All good!
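
For reference, here is a minimal sketch of the to_records route that also casts the result back to the original structured dtype. It assumes the same data and coordinatesType as in the question; skipping set_index and using index=False is my own simplification, not something from the comments.

```python
import numpy as np
import pandas as pd

data = [
    ('2020-11-01 00:00:00', 1.0),
    ('2020-11-02 00:00:00', 2.0),
]
coordinatesType = [('timestamp', 'datetime64[s]'), ('value', '<f8')]

npArray = np.asarray(data, coordinatesType)
df = pd.DataFrame(data=npArray)

# index=False keeps only the columns, so no set_index call is needed
records = df.to_records(index=False)

# view the record array as a plain ndarray, then cast each field
# to the dtype the downstream function expects
mutatedNpArray = np.asarray(records).astype(coordinatesType)

print(mutatedNpArray)
print(mutatedNpArray.dtype)
```

The explicit astype is there because the datetime resolution that pandas stores may differ from the datetime64[s] the function expects.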

Cainã Max Couto-Silva

Besides the to_records approach mentioned in comments, you can do:

df.apply(tuple, axis=1).to_numpy(coordinatesType)

Output:

array([('2020-11-01T00:00:00', 1.), ('2020-11-02T00:00:00', 2.)],
      dtype=[('timestamp', '<M8[s]'), ('value', '<f8')])
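
Another option, offered only as a sketch and not part of the original answer, is to allocate the structured array first and fill it one field at a time. It assumes the DataFrame column names match the field names in coordinatesType.

```python
import numpy as np
import pandas as pd

data = [
    ('2020-11-01 00:00:00', 1.0),
    ('2020-11-02 00:00:00', 2.0),
]
coordinatesType = [('timestamp', 'datetime64[s]'), ('value', '<f8')]
df = pd.DataFrame(data=np.asarray(data, coordinatesType))

# allocate one structured element per DataFrame row
out = np.empty(len(df), dtype=coordinatesType)

# copy each column into the field with the same name;
# NumPy casts the values to the field's dtype on assignment
for name in out.dtype.names:
    out[name] = df[name].to_numpy()

print(out)
```

This avoids building a Python tuple per row, which may matter for large frames.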

Considerations:

I believe the issue here is related to the difference between the original array and the dataframe.

The shape of your original NumPy array is (2,), where each element is a structured record (displayed as a tuple). Once the DataFrame is created, both df.shape and df.to_numpy().shape are (2, 2), so the structured dtype is applied to every cell individually instead of to each row, which is where the unexpected values come from. Converting each row to a tuple yields a pd.Series of shape (2,), matching the original array.
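
The shape difference can be checked directly. This is a small sketch that reuses the data and coordinatesType from the question; the variable name rows is only illustrative.

```python
import numpy as np
import pandas as pd

data = [
    ('2020-11-01 00:00:00', 1.0),
    ('2020-11-02 00:00:00', 2.0),
]
coordinatesType = [('timestamp', 'datetime64[s]'), ('value', '<f8')]

npArray = np.asarray(data, coordinatesType)
df = pd.DataFrame(data=npArray)

# one structured element per row
print(npArray.shape)

# one object element per cell, so rows and columns are separate axes
print(df.to_numpy().shape)

# one tuple per row again, which lines up with the structured dtype
rows = df.apply(tuple, axis=1)
print(rows.shape)
```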
