Find gaps in pandas time series dataframe sampled at 1 minute intervals and fill the gaps with new rows

Arran Duff

Problem

I have a data frame containing financial data sampled at 1 minute intervals. Occasionally a row or two of data might be missing.

  • I'm looking for a good (simple and efficient) way to insert new rows into the dataframe at the points in which there is missing data.
  • The new rows can be empty except for the index, which contains the timestamp.

For example:

 #Example Input---------------------------------------------
                      open     high     low      close
 2019-02-07 16:01:00  124.624  124.627  124.647  124.617  
 2019-02-07 16:04:00  124.646  124.655  124.664  124.645  

 # Desired Ouput--------------------------------------------
                      open     high     low      close
 2019-02-07 16:01:00  124.624  124.627  124.647  124.617  
 2019-02-07 16:02:00  NaN      NaN      NaN      NaN
 2019-02-07 16:03:00  NaN      NaN      NaN      NaN
 2019-02-07 16:04:00  124.646  124.655  124.664  124.645 

My current method is based off this post - Find missing minute data in time series data using pandas - which is advises only how to identify the gaps. Not how to fill them.

What I'm doing is creating a DateTimeIndex of 1min intervals. Then using this index, I create an entirely new dataframe, which can then be merged into my original dataframe thus filling the gaps. Code is shown below. It seems quite a round about way of doing this. I would like to know if there is a better way. Maybe with resampling the data?

import pandas as pd
from datetime import datetime

# Initialise prices dataframe with missing data
prices = pd.DataFrame([[datetime(2019,2,7,16,0),  124.634,  124.624, 124.65,   124.62],[datetime(2019,2,7,16,4), 124.624,  124.627,  124.647,  124.617]])
prices.columns = ['datetime','open','high','low','close']
prices = prices.set_index('datetime')
print(prices)

# Create a new dataframe with complete set of time intervals
idx_ref = pd.DatetimeIndex(start=datetime(2019,2,7,16,0), end=datetime(2019,2,7,16,4),freq='min')
df = pd.DataFrame(index=idx_ref)

# Merge the two dataframes 
prices = pd.merge(df, prices, how='outer', left_index=True, 
right_index=True)
print(prices)
jezrael

Use DataFrame.asfreq working with Datetimeindex:

prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
                        open     high      low    close
datetime                                               
2019-02-07 16:00:00  124.634  124.624  124.650  124.620
2019-02-07 16:01:00      NaN      NaN      NaN      NaN
2019-02-07 16:02:00      NaN      NaN      NaN      NaN
2019-02-07 16:03:00      NaN      NaN      NaN      NaN
2019-02-07 16:04:00  124.624  124.627  124.647  124.617

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Fill Gaps in time series pandas dataframe

Add new rows in R dataframe to fill time gaps

Pandas: Fill gaps in a series with mean

Fill gaps from a parent in other rows in a DataFrame

Pandas: create rows to fill numeric gaps

Group rows based on gaps in time series

How to split pandas dataframe based on time gaps

Assign last value within 1 minute intervals to rows of a pandas DataFrame

Fill series with irregular gaps in between

Pandas groupby time gaps

Pandas Time series: find gaps in sessions, and name each session/gap with separate ID

fill time gaps with power query

fill the gaps in pandas data frame

Fill gaps in DataFrame MultiIndex level 1, differently for each level 0

How to find a number +1 to the largest in a list OR fill any numerical gaps

Order rows in a sequence and fill gaps for missing rows

Is there a function in pandas that look for "gaps" of some data in dataframe rows?

Gaps and Islands Python dataframe pandas

How do I find 5 minutes gaps in a Pandas dataframe?

Time Series fill NAs for big time gaps (more than a month) based on related data from other years

pandas fill missing time intervals as given in a dataframe

How to fill time interval gaps in SELECT query?

How to fill in gaps of duplicate indices in dataframe?

Using pandas to fill gaps only, and not NaNs on the ends

Fill date and id gaps, then lag in pandas

SQL Server filling gaps in time series

Filling gaps in time-series data

Filling gaps in time series Spark for different entities

Islands and gaps problem on time-series data