I want to create a new column, loop, as shown below. The first column is the index of the household. The second column is the index of the person within that household. The third column is the index of each of that person's trips during the day. ZoneOfHome is the zone of the household's home. start_zone is the zone a person starts a trip from, and end_zone is the zone the person travels to. The last column, purpose, is an indicator that is 1 when the person goes back home. A loop is a sequence of trips which starts from home and ends at home. I want a new column 'loop' which identifies the loop each trip of a household member belongs to.
DT <- tibble::tribble(
~Household, ~person, ~trip, ~ZoneOfHome, ~start_zone, ~end_zone, ~purpose,
1L, 1L, 1L, 22L, 22L, 13L, 0,
1L, 1L, 2L, 22L, 13L, 22L, 1,
1L, 1L, 3L, 22L, 22L, 34L, 0,
1L, 1L, 4L, 22L, 34L, 22L, 1,
1L, 2L, 1L, 22L, 22L, 13L, 0,
1L, 2L, 2L, 22L, 13L, 22L, 1,
2L, 1L, 1L, 15L, 15L, 15L, 0,
2L, 1L, 2L, 15L, 15L, 15L, 1,
2L, 1L, 3L, 15L, 15L, 45L, 0,
2L, 1L, 4L, 15L, 45L, 15L, 1,
3L, 1L, 1L, 17L, 6L, 17L, 1,
3L, 1L, 2L, 17L, 17L, 10L, 0,
3L, 1L, 3L, 17L, 10L, 17L, 1
)
For each person, a loop starts when start_zone equals ZoneOfHome and continues until the indicator (purpose) is 1.
Household person trip ZoneOfHome start_zone end_zone loop
1 1 1 22 22 13 1
1 1 2 22 13 22 1
1 1 3 22 22 34 2
1 1 4 22 34 22 2
1 2 1 22 22 13 1
1 2 2 22 13 22 1
2 1 1 15 15 15 1
2 1 2 15 15 15 1
2 1 3 15 15 45 2
2 1 4 15 45 15 2
3 1 1 17 6 17 -
3 1 2 17 17 10 1
3 1 3 17 10 17 1
The additional purpose column adds information which makes it much easier to identify loops than in the original question. If I understand correctly, purpose is 1 if the destination of a trip is the home of the person.
library(data.table)
setDT(DT)[, loop := cumsum(start_zone == ZoneOfHome & purpose != 1), by = .(Household, person)][]
    Household person trip ZoneOfHome start_zone end_zone purpose loop
 1:         1      1    1         22         22       13       0    1
 2:         1      1    2         22         13       22       1    1
 3:         1      1    3         22         22       34       0    2
 4:         1      1    4         22         34       22       1    2
 5:         1      2    1         22         22       13       0    1
 6:         1      2    2         22         13       22       1    1
 7:         2      1    1         15         15       15       0    1
 8:         2      1    2         15         15       15       1    1
 9:         2      1    3         15         15       45       0    2
10:         2      1    4         15         45       15       1    2
11:         3      1    1         17          6       17       1    0
12:         3      1    2         17         17       10       0    1
13:         3      1    3         17         10       17       1    1
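Note that the first trip of household 3 gets loop 0, whereas the expected output in the question shows "-" for it. If a missing value is preferred for trips that do not belong to any loop, the zeros could be replaced afterwards. The following is only a sketch under that assumption; DT3 is a hypothetical name for a copy of the household 3 rows from the sample data:

```r
library(data.table)

# Household 3 from the sample data (DT3 is a hypothetical name)
DT3 <- data.table(
  Household  = 3L,
  person     = 1L,
  trip       = 1:3,
  ZoneOfHome = 17L,
  start_zone = c(6L, 17L, 10L),
  end_zone   = c(17L, 10L, 17L),
  purpose    = c(1, 0, 1)
)

# Same loop counter as above
DT3[, loop := cumsum(start_zone == ZoneOfHome & purpose != 1),
    by = .(Household, person)]

# Trips before the first start from home have counter 0; mark them as NA
DT3[loop == 0L, loop := NA_integer_]
DT3[]
```

Here the first trip ends up with loop NA and the two remaining trips with loop 1.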
The OP has defined: "A loop is a sequence of trips which starts from home and ends at home."
So, we need to identify which trips start at home. This information is not explicitly given but can be derived from the condition start_zone == ZoneOfHome & purpose != 1, i.e., the trip starts in the home zone but is not heading home. The expression cumsum(start_zone == ZoneOfHome & purpose != 1) is advanced by one whenever a new loop starts (for each person in a household).
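To see how the counter advances, the condition can be evaluated step by step in base R. This is just an illustration using the four trips of household 1, person 1 from the sample data; starts_loop is a hypothetical helper name:

```r
# Trips of household 1, person 1 from the sample data
start_zone <- c(22L, 13L, 22L, 34L)
ZoneOfHome <- 22L
purpose    <- c(0, 1, 0, 1)

# TRUE where a trip starts in the home zone and is not heading home
starts_loop <- start_zone == ZoneOfHome & purpose != 1
starts_loop          # TRUE FALSE TRUE FALSE

# cumsum() treats TRUE as 1 and FALSE as 0, so each TRUE opens a new loop
cumsum(starts_loop)  # 1 1 2 2
```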
Note that we do not explicitly test for the end of a loop, i.e., purpose == 1. It is assumed that a trip which starts in the home zone and is not heading home starts at home and that the previous trip has ended at home. This might not be true for more complex trip patterns, e.g., when a person visits another place within the home zone before returning home.
So, it might be safer to include a test that the previous trip of a person has ended at home indeed:
setDT(DT)[, loop := cumsum(start_zone == ZoneOfHome & shift(purpose, fill = 1) == 1),
by = .(Household, person)][]
The result is the same as above for the given sample dataset.
So, a new loop starts from home if the starting point is in the home zone and the previous trip has ended at home.
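To illustrate where the two variants can differ, here is a minimal sketch with made-up trips that are not part of the OP's data. It assumes a person who, on trip 2, travels to some place in the home zone without going home (purpose = 0). DT2, loop_simple and loop_safe are hypothetical names:

```r
library(data.table)

# Hypothetical trips: trip 2 ends in the home zone (22) but not at home,
# so trip 3 starts in the home zone in the middle of a loop.
DT2 <- data.table(
  Household  = 1L,
  person     = 1L,
  trip       = 1:4,
  ZoneOfHome = 22L,
  start_zone = c(22L, 13L, 22L, 34L),
  end_zone   = c(13L, 22L, 34L, 22L),
  purpose    = c(0, 0, 0, 1)
)

# Variant 1: only looks at the current trip
DT2[, loop_simple := cumsum(start_zone == ZoneOfHome & purpose != 1),
    by = .(Household, person)]

# Variant 2: additionally requires that the previous trip ended at home
DT2[, loop_safe := cumsum(start_zone == ZoneOfHome & shift(purpose, fill = 1) == 1),
    by = .(Household, person)]

DT2[]
```

With these made-up trips, loop_simple counts 1, 1, 2, 2 because trip 3 starts in the home zone, while loop_safe stays at 1 for all four trips because the person has not been home in between.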