How to create multiple columns in Pandas dataframe based on year

DroningVarlot

I have 10 years of hourly water level data that I'm trying to line up in separate columns based on year. The data is currently in two columns: one for the date and time of the reading (e.g. 06/04/1989 06:00:00) and one for the water level. I'd like to separate the data into individual columns based on year. I thought it was a straightforward task, but with my limited experience in Pandas I'm finding it challenging. Any advice would be appreciated.

Input:

Obs_date         SLEV(metres)

31/12/1990 20:00    0.15
31/12/1990 21:00    0.14
31/12/1990 22:00    0.13
31/12/1990 23:00    0.16
...
31/12/1991 20:00    0.12
31/12/1991 21:00    0.13
31/12/1991 22:00    0.09
31/12/1991 23:00    0.08

Output:

Obs_date          1990   1991   
31-Dec 20:00:00   0.15   0.12
31-Dec 21:00:00   0.14   0.13
31-Dec 22:00:00   0.13   0.09
31-Dec 23:00:00   0.16   0.08

jezrael

First convert Obs_date to datetimes with to_datetime, then add a year column with Series.dt.year and build the day and time label with Series.dt.strftime. Finally, reshape with DataFrame.pivot. DataFrame.reset_index turns the index back into a column, and DataFrame.rename_axis removes the name of the columns axis:

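import pandas as pd

# Parse the day-first timestamps, then split them into a year and a day/time label
df['Obs_date'] = pd.to_datetime(df['Obs_date'], format='%d/%m/%Y %H:%M')
df['year'] = df['Obs_date'].dt.year
df['Obs_date'] = df['Obs_date'].dt.strftime('%d-%b %H:%M:%S')

# pandas 2.0 and later require keyword arguments for DataFrame.pivot
df = df.pivot(index='Obs_date', columns='year', values='SLEV(metres)').reset_index().rename_axis(None, axis=1)
print(df)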
          Obs_date  1990  1991
0  31-Dec 20:00:00  0.15  0.12
1  31-Dec 21:00:00  0.14  0.13
2  31-Dec 22:00:00  0.13  0.09
3  31-Dec 23:00:00  0.16  0.08
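
For anyone who wants to run the pivot approach end to end, here is a minimal sketch. It assumes the eight sample rows from the question typed in by hand (the real data would come from a file), and the name wide is only illustrative. It passes index, columns and values to DataFrame.pivot as keywords.

```python
import pandas as pd

# Sample rows copied from the question; real data would be loaded from a file.
df = pd.DataFrame({
    "Obs_date": [
        "31/12/1990 20:00", "31/12/1990 21:00", "31/12/1990 22:00", "31/12/1990 23:00",
        "31/12/1991 20:00", "31/12/1991 21:00", "31/12/1991 22:00", "31/12/1991 23:00",
    ],
    "SLEV(metres)": [0.15, 0.14, 0.13, 0.16, 0.12, 0.13, 0.09, 0.08],
})

# Parse the day-first timestamps, then derive the year and a year-free label.
df["Obs_date"] = pd.to_datetime(df["Obs_date"], format="%d/%m/%Y %H:%M")
df["year"] = df["Obs_date"].dt.year
df["Obs_date"] = df["Obs_date"].dt.strftime("%d-%b %H:%M:%S")

# One row per day/time label, one column per year.
wide = (
    df.pivot(index="Obs_date", columns="year", values="SLEV(metres)")
      .reset_index()
      .rename_axis(None, axis=1)
)
print(wide)
```

Note that DataFrame.pivot raises an error if the same label and year combination appears more than once, so duplicate readings would need to be resolved first.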

Alternatively, create the Series y and d and reshape with DataFrame.set_index and Series.unstack:

df['Obs_date'] = pd.to_datetime(df['Obs_date'], format='%d/%m/%Y %H:%M')
y = df['Obs_date'].dt.year
d = df['Obs_date'].dt.strftime('%d-%b %H:%M:%S')

df = df.set_index([d, y])['SLEV(metres)'].unstack().reset_index().rename_axis(None, axis=1)
print (df)
          Obs_date  1990  1991
0  31-Dec 20:00:00  0.15  0.12
1  31-Dec 21:00:00  0.14  0.13
2  31-Dec 22:00:00  0.13  0.09
3  31-Dec 23:00:00  0.16  0.08
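
As a quick sanity check, the set_index plus unstack route can be compared against DataFrame.pivot on the same data. This is only a sketch using the sample rows from the question; the names ts, via_unstack and via_pivot are illustrative, and y is renamed to year here purely so the two index levels have distinct names.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Obs_date": [
        "31/12/1990 20:00", "31/12/1990 21:00", "31/12/1990 22:00", "31/12/1990 23:00",
        "31/12/1991 20:00", "31/12/1991 21:00", "31/12/1991 22:00", "31/12/1991 23:00",
    ],
    "SLEV(metres)": [0.15, 0.14, 0.13, 0.16, 0.12, 0.13, 0.09, 0.08],
})

ts = pd.to_datetime(df["Obs_date"], format="%d/%m/%Y %H:%M")
y = ts.dt.year.rename("year")           # becomes the column labels
d = ts.dt.strftime("%d-%b %H:%M:%S")    # becomes the row labels

# Route 1: build a two-level index, then move the year level into the columns.
via_unstack = df.set_index([d, y])["SLEV(metres)"].unstack()

# Route 2: the same reshape expressed with pivot.
via_pivot = df.assign(Obs_date=d, year=y).pivot(
    index="Obs_date", columns="year", values="SLEV(metres)"
)

print(via_unstack)
print(via_pivot)
```

Both frames should hold the same values in the same layout, so the choice between the two is mostly a matter of taste.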

If you need to process the data later and want the rows in chronological order, it is better to convert the day and time to a DatetimeIndex that uses a placeholder leap year, e.g. 2020, so that 29 February remains a valid date:

df['Obs_date'] = pd.to_datetime(df['Obs_date'], format='%d/%m/%Y %H:%M')
y = df['Obs_date'].dt.year
d = pd.to_datetime(df['Obs_date'].dt.strftime('2020-%m-%d %H:%M:%S'))

df = df.set_index([d, y])['SLEV(metres)'].unstack().rename_axis(None, axis=1)
print (df)
                     1990  1991
Obs_date                       
2020-12-31 20:00:00  0.15  0.12
2020-12-31 21:00:00  0.14  0.13
2020-12-31 22:00:00  0.13  0.09
2020-12-31 23:00:00  0.16  0.08
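
To see why the placeholder year helps, here is a small sketch with made-up readings (the four rows below are hypothetical, not from the question) that include 29 February and two different months. It assumes English month abbreviations from strftime; the names as_text, as_time, text_wide and time_wide are illustrative.

```python
import pandas as pd

# Hypothetical readings, deliberately out of order and including 29 February.
df = pd.DataFrame({
    "Obs_date": ["01/02/1991 00:00", "01/01/1991 00:00",
                 "29/02/1992 00:00", "01/01/1992 00:00"],
    "SLEV(metres)": [0.20, 0.10, 0.30, 0.11],
})

ts = pd.to_datetime(df["Obs_date"], format="%d/%m/%Y %H:%M")
y = ts.dt.year.rename("year")

# Text labels sort alphabetically, so "01-Feb" comes before "01-Jan".
as_text = ts.dt.strftime("%d-%b %H:%M:%S")

# Datetime labels in a leap year sort chronologically and can hold 29 February.
as_time = pd.to_datetime(ts.dt.strftime("2020-%m-%d %H:%M:%S"))

text_wide = df.set_index([as_text, y])["SLEV(metres)"].unstack()
time_wide = df.set_index([as_time, y])["SLEV(metres)"].unstack()

print(text_wide)
print(time_wide)
```

Years that have no 29 February simply get NaN in that row. With a non-leap placeholder year, parsing a 29 February label would fail instead.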
