I have some values in a pandas df that are positive and negative ints, and I want to convert them to timedeltas so I can put them into a DurationField in a Django model.
date dep_time dep_delay arr_time arr_delay cancelled carrier \
103992 2014-05-11 10:13:00 -2 12:47:00 -13 0 B6
103993 2014-05-11 19:29:00 -1 22:15:00 -24 0 B6
103994 2014-05-11 11:17:00 5 13:55:00 9 0 B6
103995 2014-05-11 07:36:00 -10 09:24:00 -18 0 B6
103996 2014-05-11 13:40:00 0 16:47:00 10 0 B6
tailnum flight origin dest air_time distance duration
103992 N630JB 925 JFK TPA 137 1005 1013
103993 N632JB 225 JFK TPA 137 1005 1929
103994 N635JB 127 EWR MCO 126 937 1117
103995 N637JB 1273 JFK CHS 92 636 0736
103996 N637JB 213 JFK LGB 352 2465 1340
With this data, I want to express dep_delay, arr_delay, air_time and duration as timedeltas, but I keep getting zeroed-out values? I'm using
data['air_time'] = pd.to_timedelta(data['air_time'], errors='coerce')
If you are getting all 00:00:00.000000
values, then your air_time
values might be strings. (You can check the data type of the air_time
column by inspecting data.info()
. If the dtype says object
then the values are Python objects (such as str
s) instead of a NumPy integer data type. You can then confirm they are strings by inspecting set(map(type, data['air_time']))
.)
If they are strings, you can convert them to ints first by using:
data['air_time'] = data['air_time'].astype(int)
If 137 means 137 minutes then use
data['air_time'] = pd.to_timedelta(data['air_time'], unit='m', errors='coerce')
If, on the other hand, 137 means 1 hour and 37 minutes, then use
data['air_time'] = pd.to_timedelta(
(data['air_time']//100)*60 + (data['air_time'] % 100), unit='m',
errors='coerce')
The unit='m'
argument tells pd.to_timedelta
to interpret the values as minutes.
For example,
import pandas as pd
data = pd.DataFrame({'air_time':['137','137','126','92','352']})
data['air_time'] = data['air_time'].astype(int)
data['air_time'] = pd.to_timedelta(data['air_time'], unit='m', errors='coerce')
yields
air_time
0 02:17:00
1 02:17:00
2 02:06:00
3 01:32:00
4 05:52:00
Note that pd.to_timedelta
can also accepts strings as input if the strings contain the desired units. For example,
import pandas as pd
data = pd.DataFrame({'air_time':['137','137','126','92','352']})
data['air_time'] = data['air_time'] + ' minutes'
# air_time
# 0 137 minutes
# 1 137 minutes
# 2 126 minutes
# 3 92 minutes
# 4 352 minutes
data['air_time'] = pd.to_timedelta(data['air_time'], errors='coerce')
yields the same result.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments