Sequential count of values within factor level, ignoring NAs

ghaines

I have mark-recapture data in long form, and I want a column counting the number of times each individual has been seen at the time of each observation.

Here is an example of the sort of data that I have:

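library(dplyr)  # provides %>%, group_by(), mutate(); also re-exports tibble()
library(tidyr)  # provides drop_na()

dat <- tibble(ID = c("A","A","A","A","A","B","B","B","B","B"),
              period = c("Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014",
                         "Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014"),
              length = c(12, NA, NA, 15, 19, NA, 3, 6, 10, NA))
# Number the sampling events 1-5 within each individual
dat$sample.event <- rep(1:5, nrow(dat) / 5)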

# A tibble: 10 × 4
   ID    period   length sample.event
   <chr> <chr>     <dbl>        <int>
 1 A     Aug.2012     12            1
 2 A     Jun.2013     NA            2
 3 A     Aug.2013     NA            3
 4 A     Jun.2014     15            4
 5 A     Aug.2014     19            5
 6 B     Aug.2012     NA            1
 7 B     Jun.2013      3            2
 8 B     Aug.2013      6            3
 9 B     Jun.2014     10            4
10 B     Aug.2014     NA            5

I want a new column called ind.obs that counts each time an individual has been seen, like this, while keeping the rows of the data frame where the individual was not seen (with NA in the new column):

   ID   period length sample.event ind.obs
1   A Aug.2012     12            1       1
2   A Jun.2013     NA            2      NA
3   A Aug.2013     NA            3      NA
4   A Jun.2014     15            4       2
5   A Aug.2014     19            5       3
6   B Aug.2012     NA            1      NA
7   B Jun.2013      3            2       1
8   B Aug.2013      6            3       2
9   B Jun.2014     10            4       3
10  B Aug.2014     NA            5      NA

This seems like it should be possible using dplyr, but I can't figure it out.

I have tried:

dat %>%
  group_by(ID) %>%
  drop_na(length) %>%
  mutate(ind.obs = sequence(n()))

  ID    period   length sample.event ind.obs
  <chr> <chr>     <dbl>        <int>   <int>
1 A     Aug.2012     12            1       1
2 A     Jun.2014     15            4       2
3 A     Aug.2014     19            5       3
4 B     Jun.2013      3            2       1
5 B     Aug.2013      6            3       2
6 B     Jun.2014     10            4       3

But as you can see, this completely removes the rows without observations.

I've also tried this, but get an error:

dat%>%group_by(ID) %>%mutate(ind.obs=sequence(n(na.rm=T)))

Error in `mutate()`:
ℹ In argument: `ind.obs = sequence(n(na.rm = T))`.
ℹ In group 1: `ID = "A"`.
Caused by error in `n()`:
! unused argument (na.rm = T)

I would appreciate any tips for resolving this. Thanks!

Jon Spring

I bet there's something sharper, but:

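# cumsum(!is.na(length)) is the running count of non-missing lengths within each ID;
# rows where length is NA keep NA. `.by` requires dplyr >= 1.1.0.
dat %>%
  mutate(ind.obs = if_else(is.na(length), length,
                           cumsum(!is.na(length))), .by = ID)
# Thanks @TarJae for the helpful improvement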

Result

# A tibble: 10 × 5
   ID    period   length sample.event ind.obs
   <chr> <chr>     <dbl>        <int>   <dbl>
 1 A     Aug.2012     12            1       1
 2 A     Jun.2013     NA            2      NA
 3 A     Aug.2013     NA            3      NA
 4 A     Jun.2014     15            4       2
 5 A     Aug.2014     19            5       3
 6 B     Aug.2012     NA            1      NA
 7 B     Jun.2013      3            2       1
 8 B     Aug.2013      6            3       2
 9 B     Jun.2014     10            4       3
10 B     Aug.2014     NA            5      NA
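
For anyone who cannot use `.by` (it needs dplyr 1.1.0 or later) or who wants to avoid packages altogether, the same idea can probably be sketched in base R with `ave()`. This is an alternative, not the approach from the answer above; the helper names `seen` and `running` are made up for illustration. It also returns an integer column, whereas the result above is `<dbl>` because the `NA`s are taken from `length`.

```r
# Rebuild the example data using base R only
dat <- data.frame(
  ID     = rep(c("A", "B"), each = 5),
  period = rep(c("Aug.2012", "Jun.2013", "Aug.2013", "Jun.2014", "Aug.2014"), 2),
  length = c(12, NA, NA, 15, 19, NA, 3, 6, 10, NA)
)

# TRUE where the individual was actually observed (illustrative name)
seen <- !is.na(dat$length)

# Running count of sightings within each ID (illustrative name);
# ave() applies cumsum() separately to each group and keeps row order
running <- ave(as.integer(seen), dat$ID, FUN = cumsum)

# Keep the running count only on rows with an observation, NA elsewhere
dat$ind.obs <- ifelse(seen, running, NA_integer_)

print(dat)
```

The count is computed on every row first and masked afterwards, so no rows are dropped, which is the part the `drop_na()` attempt in the question could not do.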
