Overlapping join with dplyr when reference tibble contains NA / missing values

Nova

I'd like to join two dataframes: reference data, refdata, containing data describing when a person was wearing a collar; and movedata, containing data describing where a collar was at a given time. Collars can be moved from person to person, and if a collar is still on, there is no timeend defined for that collar.

Here's some example data and one of the joins I've tried:

library(dplyr)

refdata <- tibble(person = c("Jess", "Andy", "Cody"),
                  collar = c("a", "b", "a"),
                  timestart = c(1, 2, 5),
                  timeend = c(3, 4, NA))

movedata <- tibble(collar = c("a", "a", "a", "a", "a", "a", "a", 
                              "b", "b", "b", "b", "b", "b", "b", "c", "c", "c"), 
                   time = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3), 
          location = c("skatepark", "home", "library", "street", "crib", "crib", "bassinet", "street", "work", "work", "home", "street", "street", "street", "street", "street", "street"))

by <- join_by(collar,
              between(time, timestart, timeend))
check <- full_join(movedata, refdata, by)

# View input data
refdata
# A tibble: 3 × 4
  person collar timestart timeend
  <chr>  <chr>      <dbl>   <dbl>
1 Jess   a              1       3
2 Andy   b              2       4
3 Cody   a              5      NA

movedata
# A tibble: 17 × 3
   collar  time location 
   <chr>  <dbl> <chr>    
 1 a          1 skatepark
 2 a          2 home     
 3 a          3 library  
 4 a          4 street   
 5 a          5 crib     
 6 a          6 crib     
 7 a          7 bassinet 
 8 b          1 street   
 9 b          2 work     
10 b          3 work     
11 b          4 home     
12 b          5 street   
13 b          6 street   
14 b          7 street   
15 c          1 street   
16 c          2 street   
17 c          3 street

# result of join (not desired output):
check 
# A tibble: 18 × 6
   collar  time location  person timestart timeend
   <chr>  <dbl> <chr>     <chr>      <dbl>   <dbl>
 1 a          1 skatepark Jess           1       3
 2 a          2 home      Jess           1       3
 3 a          3 library   Jess           1       3
 4 a          4 street    NA            NA      NA
 5 a          5 crib      NA            NA      NA
 6 a          6 crib      NA            NA      NA
 7 a          7 bassinet  NA            NA      NA
 8 b          1 street    NA            NA      NA
 9 b          2 work      Andy           2       4
10 b          3 work      Andy           2       4
11 b          4 home      Andy           2       4
12 b          5 street    NA            NA      NA
13 b          6 street    NA            NA      NA
14 b          7 street    NA            NA      NA
15 c          1 street    NA            NA      NA
16 c          2 street    NA            NA      NA
17 c          3 street    NA            NA      NA
18 a         NA NA        Cody           5      NA

but my desired result would show that Cody has worn, and is still wearing, the collar:

# Expected result
# A tibble: 17 × 6
   collar  time location  person timestart timeend
   <chr>  <dbl> <chr>     <chr>      <dbl>   <dbl>
 1 a          1 skatepark Jess           1       3
 2 a          2 home      Jess           1       3
 3 a          3 library   Jess           1       3
 4 a          4 street    NA             5      NA
 5 a          5 crib      Cody           5      NA
 6 a          6 crib      Cody           5      NA
 7 a          7 bassinet  Cody           5      NA
 8 b          1 street    NA            NA      NA
 9 b          2 work      Andy           2       4
10 b          3 work      Andy           2       4
11 b          4 home      Andy           2       4
12 b          5 street    NA            NA      NA
13 b          6 street    NA            NA      NA
14 b          7 street    NA            NA      NA
15 c          1 street    NA            NA      NA
16 c          2 street    NA            NA      NA
17 c          3 street    NA            NA      NA

I'd be interested in two solutions: one that creates the expected table above, and one that removes any rows that are not associated with a person (so rows 8 & 12-17, above).

Maël

One way would be to replace the NAs in timeend with Inf:

refdata$timeend <- replace(refdata$timeend, is.na(refdata$timeend), Inf)
by <- join_by(collar,
              between(time, timestart, timeend))
full_join(movedata, refdata, by)

# # A tibble: 17 × 6
#    collar  time location  person timestart timeend
#    <chr>  <dbl> <chr>     <chr>      <dbl>   <dbl>
#  1 a          1 skatepark Jess           1       3
#  2 a          2 home      Jess           1       3
#  3 a          3 library   Jess           1       3
#  4 a          4 street    NA            NA      NA
#  5 a          5 crib      Cody           5     Inf
#  6 a          6 crib      Cody           5     Inf
#  7 a          7 bassinet  Cody           5     Inf
#  8 b          1 street    NA            NA      NA
#  9 b          2 work      Andy           2       4
# 10 b          3 work      Andy           2       4
# 11 b          4 home      Andy           2       4
# 12 b          5 street    NA            NA      NA
# 13 b          6 street    NA            NA      NA
# 14 b          7 street    NA            NA      NA
# 15 c          1 street    NA            NA      NA
# 16 c          2 street    NA            NA      NA
# 17 c          3 street    NA            NA      NA

For your second expected output, check inner_join:

inner_join(movedata, refdata, by)

# # A tibble: 9 × 6
#   collar  time location  person timestart timeend
#   <chr>  <dbl> <chr>     <chr>      <dbl>   <dbl>
# 1 a          1 skatepark Jess           1       3
# 2 a          2 home      Jess           1       3
# 3 a          3 library   Jess           1       3
# 4 a          5 crib      Cody           5     Inf
# 5 a          6 crib      Cody           5     Inf
# 6 a          7 bassinet  Cody           5     Inf
# 7 b          2 work      Andy           2       4
# 8 b          3 work      Andy           2       4
# 9 b          4 home      Andy           2       4

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Set certain values to NA with dplyr

dplyr join define NA values

dplyr summary of values returns NA

Dplyr join: NA match to any

Power BI: Make a line chart continuous when source contains null values (handle missing values)

Using dplyr to fill in missing values (through a join?)

na not picking missing values?

R - Is it possible to unnest a list-column that contains missing (NA) values?

Replace NA values in a tibble with a list

How to remove missing values (NA) when uniting columns?

dplyr: summarize by multiple overlapping group structures and join

Converting the values in all columns to a certain value in tibble/dplyr

dplyr mutate() displaying NA values when matched from dataframe

Setting multiple values to NA with dplyr

Cross-reference multiple lists with master list to produce merged output with missing values as NA

Using rollmean when there are missing values (NA)

Join, filling in missing key values

Inputting NA where there are missing values when scraping with rvest

Sum() in dplyr and aggregate: NA values

Ignore NA values in filtering with dplyr

tibble full_join and conditionally replace NA

Join by overlapping periods while operating for some of the values

SQL join for filling missing values

Concatenate string field using dplyr when some values are NA

Replace missing values from a reference dataframe in a pyspark join

R dplyr full_join(x, y): fill NA with values from x

R dplyr full_join(x, y): fill NA with values from x

How to stop overwriting values with NA when referenced index missing?

How to calculate partial correlations when the data frame contains missing values

TOP Ranking

HotTag

Archive