I have a big dataframe (more than 900000 rows) and want to add some columns depending on the first column (Timestamp with date and time). My code works, but I guess it's far too complicated and slow. I'm a beginner so help would be appreciated! Thanks!
import datetime as dt

# Pre-create the three new columns (they end up at positions 24, 25 and 26)
df['seconds_midnight'] = 0
df['weekday'] = 0
df['month'] = 0

def date_to_new_columns(date_var, i):
    sec_after_midnight = dt.timedelta(hours=date_var.hour, minutes=date_var.minute, seconds=date_var.second).total_seconds()
    weekday = dt.date.isoweekday(date_var)
    month1 = date_var.month
    df.iloc[i, 24] = sec_after_midnight
    df.iloc[i, 25] = weekday
    df.iloc[i, 26] = month1
    return

# One function call and three cell assignments per row
for i in range(0, 903308):
    date_to_new_columns(df.timestamp.iloc[i], i)
This is slow because the for loop processes each row individually, with a separate function call and three single-cell assignments per row. One thing that makes pandas so nice is that you can process whole columns or dataframes in a single vectorized operation.
So compute each new column for all rows at once:
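def date_to_new_columns(df):
    # Timestamp minus midnight of the same day, expressed in whole seconds
    df['sec_after_midnight'] = (df.timestamp - df.timestamp.dt.normalize()).dt.seconds
    # day_name is a method, so it has to be called
    df['weekday'] = df.timestamp.dt.day_name()
    df['month1'] = df.timestamp.dt.month
    return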
Note that the dt.day_name() method was added in pandas version 0.23.0; in earlier versions, use the dt.weekday_name attribute instead.
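If you want the new columns to hold the same values as the loop in the question (an integer weekday from 1 to 7 rather than a day name), something like the following should work. This is only a sketch: the function name add_time_columns and the ts_col parameter are made up for illustration, and it assumes the timestamp column is, or can be parsed as, datetime64.

```python
import pandas as pd

def add_time_columns(df, ts_col="timestamp"):
    """Add seconds-after-midnight, ISO weekday and month columns in place."""
    # Assumption: the column is datetime-like; to_datetime leaves datetime64 data unchanged
    ts = pd.to_datetime(df[ts_col])
    # Seconds elapsed since 00:00:00 of the same day
    df["seconds_midnight"] = ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
    # dt.dayofweek runs from 0 (Monday) to 6 (Sunday); adding 1 matches isoweekday()
    df["weekday"] = ts.dt.dayofweek + 1
    df["month"] = ts.dt.month
    return df

# Tiny made-up example frame
df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2018-01-01 00:00:10",  # a Monday
    "2018-06-17 13:30:00",  # a Sunday
])})
add_time_columns(df)
print(df)
```

Each assignment works on the whole column at once, so there is no Python-level loop over the rows.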