Boolean indexing worked fine on a small sample, but as I increased the size of the data, the computing time grew far faster than linearly (example below). Does anyone know a way to speed up this particular boolean indexer?
import pandas as pd
import numpy as np
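# One year of 1-minute timestamps, with random 'price' values around 50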
a = pd.date_range('2019-01-01', '2019-12-31', freq='1T')
b = np.random.normal(size=len(a), loc=50)
c = pd.DataFrame(index=a, data=b, columns=['price'])
1500 rows:
z = c.head(1500)
z[z.index.map(lambda x: 8 <= x.hour <= 16) & z.index.map(lambda x: x.weekday() < 5)]
CPU times: user 149 ms, sys: 8.71 ms, total: 158 ms
Wall time: 157 ms
5000 rows:
z = c.head(5000)
z[z.index.map(lambda x: 8 <= x.hour <= 16) & z.index.map(lambda x: x.weekday() < 5)]
CPU times: user 14.1 s, sys: 9.07 s, total: 23.2 s
Wall time: 23.2 s
I tried z = c.head(10000), but it was taking more than 15 minutes to compute, so I stopped. The data I actually want to apply this indexer to has about 30,000 rows.
This is slow because the mapping is done with a lambda expression: a Python function call is made for every single item. That is typically a bad idea when you want to process data in bulk. You can speed this up with vectorized operations on the index:
hour = z.index.hour
z[(8 <= hour) & (hour <= 16) & (z.index.weekday < 5)]
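Since the index is a DatetimeIndex at minute resolution, an alternative sketch is to use DataFrame.between_time for the hour filter and a vectorized mask for the weekday (between_time is inclusive of both endpoints by default, so '16:59' matches 8 <= hour <= 16 for minute-frequency data; the variable w is just an illustrative name):
w = z.between_time('08:00', '16:59')  # keep 08:00:00 through 16:59:00
w = w[w.index.weekday < 5]            # keep Monday (0) through Friday (4)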
With z = c (so all 524,161 rows: 364 days × 1,440 minutes per day, plus one endpoint), we get the following timings:
>>> from timeit import timeit
>>> z = c
>>> timeit(lambda: z[(8 <= z.index.hour) & (z.index.hour <= 16) & (z.index.weekday < 5)], number=100)
11.825318349001464
So 100 runs take ~11.8 seconds in total, i.e. ~118 milliseconds per run.
When we use the first 5,000 rows, we get:
>>> z = c.head(5000)
>>> timeit(lambda: z[(8 <= z.index.hour) & (z.index.hour <= 16) & (z.index.weekday < 5)], number=100)
0.1542488380218856
So this runs in ~1.5 milliseconds per run.
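As a quick sanity check that the vectorized mask and the between_time variant select exactly the same rows, here is a minimal sketch; pd.testing.assert_frame_equal raises an AssertionError if the two results differ:
hour = c.index.hour
masked = c[(8 <= hour) & (hour <= 16) & (c.index.weekday < 5)]

by_time = c.between_time('08:00', '16:59')
by_time = by_time[by_time.index.weekday < 5]

pd.testing.assert_frame_equal(masked, by_time)  # raises if the selections differ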