How to create a 5-minute interval datetime index from an incomplete dataset (Python)


First time posting on here, so hopefully it will be fruitful :)

Basically, I have a CSV file which contains timeseries data. I'm using pandas.read_csv to create a dataframe from the CSV. However, it is organised in an annoying way: the first column (Date) holds dates only, and the second column (Time) holds the time of day as minutes past midnight in 5-minute steps (e.g. 0, 5, 10, 15, up to 1435). To further complicate the problem, some days have incomplete data, i.e. rows are missing. So for a particular date, the time intervals may go something like 5, 10, 60, 505, etc. This means that I can't simply create a datetime index from scratch with 5-minute intervals.

I therefore need some way of combining the minutes in the Time column with the date in the Date column to give me the timeseries I want. I then have to attach the resulting column to the same dataframe and use it as the index, so that I can produce hourly means of my data.

After much blood, sweat and tears, I came up with this so far, which generates a new list of times. The first part is correct as far as I can tell.

import pandas as pd

df = pd.read_csv(myfile)

newtime = []

# Turn minutes past midnight into a zero-padded "HH:MM:00" string
for r in df['Time']:
    if r // 60 < 10:
        if r % 60 < 10:
            r = "0" + str(r // 60) + ":0" + str(r % 60) + ":00"
        else:
            r = "0" + str(r // 60) + ":" + str(r % 60) + ":00"
    else:
        if r % 60 < 10:
            r = str(r // 60) + ":0" + str(r % 60) + ":00"
        else:
            r = str(r // 60) + ":" + str(r % 60) + ":00"
    newtime.append(r)


datetimes = []

# Join each date with its time string
for r in range(len(df['Date'])+1):
    v = str(df['Date'][r]) + newtime[r]
    datetimes.append(v)

print(datetimes)

However, when I try concatenating it with the dates (the last bit of code), I get an error I can't explain (KeyError: 203591L). Weirdly, it works fine if I replace the v = with just print and remove the line with the append statement.

What I'm hoping is that someone can either help me develop the existing code into a solution (to the point where it's ready for df.resample('1H', how={columnX: np.mean}) ), or tell me how I can do the whole thing a different way.

Thanks very much in advance!


R. Max

It looks like you want to convert the Time column to a timedelta in minutes and add it to the parsed dates:

In [17]: df
         Date  Time
0  2014-05-01    60
1  2014-05-02   505

In [18]: pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'], unit='m')
0   2014-05-01 01:00:00
1   2014-05-02 08:25:00
dtype: datetime64[ns]
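
To get from there to hourly means, something along these lines should work. This is only a sketch: the sample rows and the 'Value' column are made up for illustration (they stand in for whatever data columns your CSV really has), and I use "60min" as the resample frequency because the spelling of the hourly alias and the old how= argument have changed between pandas versions.

```python
import pandas as pd

# Hypothetical sample data; 'Value' is a placeholder for a real data column.
df = pd.DataFrame({
    "Date": ["2014-05-01", "2014-05-01", "2014-05-01", "2014-05-02"],
    "Time": [5, 10, 60, 505],   # minutes past midnight, with gaps
    "Value": [1.0, 3.0, 5.0, 7.0],
})

# Midnight of each date plus the minute offset gives the full timestamp.
df.index = pd.to_datetime(df["Date"]) + pd.to_timedelta(df["Time"], unit="m")

# Hourly means; hours that contain no rows come out as NaN.
hourly = df["Value"].resample("60min").mean()
print(hourly.dropna())
```

Missing rows are not a problem here, because each timestamp is computed from its own row rather than generated as a regular range. As for the KeyError in your loop: range(len(df['Date'])+1) runs one step past the last row, so the final df['Date'][r] lookup asks for a label that does not exist. Dropping the +1 should fix that, although the vectorised version above avoids the loop entirely.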

