I have 10 years of hourly water level data that I'm trying to line up in separate columns based on year. The current format of the data is in two columns, one for the date and time (ex. 06/04/1989 06:00:00) of the reading and one for the water level. I'd like separate the data into individual columns based on year. I thought it was a straightforward task but with my limited experience in Pandas, I'm find it challenging. Any advice would be appreciated.
Input:
Obs_date SLEV(metres)
31/12/1990 20:00 0.15
31/12/1990 21:00 0.14
31/12/1990 22:00 0.13
31/12/1990 23:00 0.16
...
31/12/1991 20:00 0.12
31/12/1991 21:00 0.13
31/12/1991 22:00 0.09
31/12/1991 23:00 0.08
Output:
Obs_date 1990 1991
31-Dec 20:00:00 0.15 0.12
31-Dec 21:00:00 0.14 0.13
31-Dec 22:00:00 0.13 0.09
31-Dec 23:00:00 0.16 0.08
First convert Obs_date
to datetimes by to_datetime
, then create new column by Series.dt.year
and custom format by Series.dt.strftime
and last pivoting by DataFrame.pivot
with convert index to column by DataFrame.reset_index
and DataFrame.rename_axis
is used for remove column name:
df['Obs_date'] = pd.to_datetime(df['Obs_date'], format='%d/%m/%Y %H:%M')
df['year'] = df['Obs_date'].dt.year
df['Obs_date'] = df['Obs_date'].dt.strftime('%d-%b %H:%M:%S')
df = df.pivot('Obs_date', 'year','SLEV(metres)').reset_index().rename_axis(None, axis=1)
print (df)
Obs_date 1990 1991
0 31-Dec 20:00:00 0.15 0.12
1 31-Dec 21:00:00 0.14 0.13
2 31-Dec 22:00:00 0.13 0.09
3 31-Dec 23:00:00 0.16 0.08
Or is possible create Series y
and d
and reshape by DataFrame.set_index
with Series.unstack
:
df['Obs_date'] = pd.to_datetime(df['Obs_date'], format='%d/%m/%Y %H:%M')
y = df['Obs_date'].dt.year
d = df['Obs_date'].dt.strftime('%d-%b %H:%M:%S')
df = df.set_index([d, y])['SLEV(metres)'].unstack().reset_index().rename_axis(None, axis=1)
print (df)
Obs_date 1990 1991
0 31-Dec 20:00:00 0.15 0.12
1 31-Dec 21:00:00 0.14 0.13
2 31-Dec 22:00:00 0.13 0.09
3 31-Dec 23:00:00 0.16 0.08
If need processing data later and need correct order better is convert datetime to DatetimeIndex
with some default year with 29.February
, e.g 2020
:
df['Obs_date'] = pd.to_datetime(df['Obs_date'], format='%d/%m/%Y %H:%M')
y = df['Obs_date'].dt.year
d = pd.to_datetime(df['Obs_date'].dt.strftime('2020-%m-%d %H:%M:%S'))
df = df.set_index([d, y])['SLEV(metres)'].unstack().rename_axis(None, axis=1)
print (df)
1990 1991
Obs_date
2020-12-31 20:00:00 0.15 0.12
2020-12-31 21:00:00 0.14 0.13
2020-12-31 22:00:00 0.13 0.09
2020-12-31 23:00:00 0.16 0.08
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments