Sequential count of values within factor level, ignoring NAs

ghaines Published at Dev

ghaines

I have mark-recapture data in long form, and I want a column counting the number of times each individual has been seen at the time of each observation.

here is an example of the sort of data that I have:

dat<-tibble(ID=c("A","A","A","A","A","B","B","B","B","B"),
           period=c("Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014",
                    "Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014"),
           length=c(12,NA,NA,15,19, NA,3,6,10,NA))
dat$sample.event<-rep(1:5,dim(dat)[1]/5)

# A tibble: 10 × 4
   ID    period   length sample.event
   <chr> <chr>     <dbl>        <int>
 1 A     Aug.2012     12            1
 2 A     Jun.2013     NA            2
 3 A     Aug.2013     NA            3
 4 A     Jun.2014     15            4
 5 A     Aug.2014     19            5
 6 B     Aug.2012     NA            1
 7 B     Jun.2013      3            2
 8 B     Aug.2013      6            3
 9 B     Jun.2014     10            4
10 B     Aug.2014     NA            5

so I want a new column called ind.obs counting each time an individual has been seen, like this, but I want to keep the rows of the dataframe where the individual was not seen:

   ID   period length sample.event ind.obs
1   A Aug.2012     12            1       1
2   A Jun.2013     NA            2      NA
3   A Aug.2013     NA            3      NA
4   A Jun.2014     15            4       2
5   A Aug.2014     19            5       3
6   B Aug.2012     NA            1      NA
7   B Jun.2013      3            2       1
8   B Aug.2013      6            3       2
9   B Jun.2014     10            4       3
10  B Aug.2014     NA            5      NA

This seems like it should be possible using dplyr, but I can't figure it out.

I have tried:

dat%>%group_by(ID)%>%
  drop_na(length) %>%
  mutate(ind.obs=sequence(n()))

  ID    period   length sample.event ind.obs
  <chr> <chr>     <dbl>        <int>   <int>
1 A     Aug.2012     12            1       1
2 A     Jun.2014     15            4       2
3 A     Aug.2014     19            5       3
4 B     Jun.2013      3            2       1
5 B     Aug.2013      6            3       2
6 B     Jun.2014     10            4       3

But as you can see, this completely removes the rows without observations.

I've also tried this, but get an error:

dat%>%group_by(ID) %>%mutate(ind.obs=sequence(n(na.rm=T)))

Error in `mutate()`:
ℹ In argument: `ind.obs = sequence(n(na.rm = T))`.
ℹ In group 1: `ID = "A"`.
Caused by error in `n()`:
! unused argument (na.rm = T)

Would appreciate any tips for resolving this, thanks

Jon Spring

I bet there's something sharper, but:

dat %>%   
  mutate(ind.obs = if_else(is.na(length), length, 
                           cumsum(!is.na(length))), .by=ID)
# Thanks @TarJae for the helpful improvement

Result

# A tibble: 10 × 5
   ID    period   length sample.event ind.obs
   <chr> <chr>     <dbl>        <int>   <dbl>
 1 A     Aug.2012     12            1       1
 2 A     Jun.2013     NA            2      NA
 3 A     Aug.2013     NA            3      NA
 4 A     Jun.2014     15            4       2
 5 A     Aug.2014     19            5       3
 6 B     Aug.2012     NA            1      NA
 7 B     Jun.2013      3            2       1
 8 B     Aug.2013      6            3       2
 9 B     Jun.2014     10            4       3
10 B     Aug.2014     NA            5      NA

Collected from the Internet

Please contact [email protected] to delete if infringement.