Count unique occurrences of factor levels and numeric values with dplyr, on data in a long format

tcvdb1992 Published at Dev

I have data on repeated measurements of 8 patients, each with a varying number of repeated measurements of the same variables. The measured variables are sex, blood pressure (sys_bp), and the number of CT scans a person underwent (ct_amount):

library(dplyr)
library(magrittr)

questiondata <- structure(list(id = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 
4, 7, 7, 8, 8, 8, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 20, 
20, 20), time = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 4L, 5L, 1L, 6L, 1L, 2L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 1L, 2L, 4L), .Label = c("T0", "T1M0", "T1M6", 
"T1M12", "T2M0", "FU1"), class = "factor"), sys_bp = c(116, 125.8, 
NA, NA, NA, 113.2, NA, NA, NA, NA, 146, NA, NA, NA, NA, NA, NA, 
125, NA, NA, 164.5, NA, NA, NA, NA, 150.5, NA, NA, NA, NA, 158, 
NA), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("female", "male"), class = "factor"), 
    ct_amount = c(4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 3L, 3L, 3L)), row.names = c(NA, -32L), class = c("tbl_df", 
"tbl", "data.frame"))

questiondata

      id time  sys_bp sex    ct_amount
   <dbl> <fct>  <dbl> <fct>      <int>
 1     2 T0      116  female         4
 2     2 T1M0    126. female         4
 3     2 T1M6     NA  female         4
 4     2 T1M12    NA  female         4
 5     3 T0       NA  female         5
 6     3 T1M0    113. female         5
 7     3 T1M6     NA  female         5
 8     3 T1M12    NA  female         5
 9     3 T2M0     NA  female         5
10     4 T0       NA  male           5
11     4 T1M0    146  male           5
12     4 T1M6     NA  male           5
13     4 T1M12    NA  male           5
14     4 T2M0     NA  male           5
15     7 T0       NA  female         2
16     7 FU1      NA  female         2
17     8 T0       NA  female         3
18     8 T1M0    125  female         3
19     8 T2M0     NA  female         3
20    13 T0       NA  female         5
21    13 T1M0    164. female         5
22    13 T1M6     NA  female         5
23    13 T1M12    NA  female         5
24    13 T2M0     NA  female         5
25    14 T0       NA  male           5
26    14 T1M0    150. male           5
27    14 T1M6     NA  male           5
28    14 T1M12    NA  male           5
29    14 T2M0     NA  male           5
30    20 T0       NA  female         3
31    20 T1M0    158  female         3
32    20 T1M12    NA  female         3

I am trying to count the number of persons who (1) are male/female and (2) have 1/2/3/4/5 CT scans.

So the expected output is (1) 6 females and 2 males, and (2) 1 person with 2 CT scans, 2 persons with 3, 1 person with 4 and 4 persons with 5.

I've tried many combinations of group_by, summarise and count, but can't seem to get it right. Any help?
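For context, a plain count() on this long-format data counts rows (measurement occasions), not persons, which is why it does not give the numbers above. The sketch below is a compact rebuild of only the id, sex and ct_amount columns of questiondata using base rep() (time and sys_bp are left out because they play no role here); the object names patients and long are illustrative, not from the question.

```r
library(dplyr)

# Compact rebuild of the id / sex / ct_amount columns of `questiondata`:
# one entry per patient plus how many measurement rows that patient has.
patients <- tibble(
  id        = c(2, 3, 4, 7, 8, 13, 14, 20),
  n_rows    = c(4, 5, 5, 2, 3, 5, 5, 3),
  sex       = factor(c("female", "female", "male", "female",
                       "female", "female", "male", "female"),
                     levels = c("female", "male")),
  ct_amount = c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L)
)

# Expand to long format: repeat each patient's row n_rows times (32 rows).
long <- patients[rep(seq_len(nrow(patients)), times = patients$n_rows), ] %>%
  select(-n_rows)

# Counting rows tallies measurements, not persons: female 22, male 10.
row_counts <- long %>% count(sex)
row_counts
```

The totals 22 and 10 are just the number of rows per sex (for example, each male patient contributes 5 rows), so the data has to be reduced to one row per id, or ids counted directly, before tallying.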

Ronak Shah

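You can first keep only one row per id with distinct(id, .keep_all = TRUE). Because sex and ct_amount are the same on every row of a given patient, whichever row is kept is representative. Then use count on the de-duplicated data to get the output.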

library(dplyr)

unique_data <- questiondata %>% distinct(id, .keep_all = TRUE)

unique_data %>% count(sex)
# A tibble: 2 x 2
#  sex        n
#  <fct>  <int>
#1 female     6
#2 male       2

unique_data %>% count(ct_amount)
# A tibble: 4 x 2
#  ct_amount     n
#      <int> <int>
#1         2     1
#2         3     2
#3         4     1
#4         5     4
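distinct(id, .keep_all = TRUE) keeps the first row of each id, so it only gives correct counts if sex and ct_amount never change within a patient. The sketch below (again a compact rebuild of the relevant columns; patients, long, consistency, by_sex and by_ct are illustrative names) first checks that assumption with n_distinct(), then shows an equivalent group_by()/summarise() formulation that counts distinct ids per group instead of de-duplicating rows.

```r
library(dplyr)

# Compact rebuild of the id / sex / ct_amount columns of `questiondata`.
patients <- tibble(
  id        = c(2, 3, 4, 7, 8, 13, 14, 20),
  n_rows    = c(4, 5, 5, 2, 3, 5, 5, 3),
  sex       = factor(c("female", "female", "male", "female",
                       "female", "female", "male", "female"),
                     levels = c("female", "male")),
  ct_amount = c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L)
)
long <- patients[rep(seq_len(nrow(patients)), times = patients$n_rows), ] %>%
  select(-n_rows)

# Sanity check: each patient should have exactly one sex and one ct_amount.
consistency <- long %>%
  group_by(id) %>%
  summarise(n_sex = n_distinct(sex), n_ct = n_distinct(ct_amount))
stopifnot(all(consistency$n_sex == 1), all(consistency$n_ct == 1))

# Alternative without distinct(): count distinct ids within each group.
by_sex <- long %>%
  group_by(sex) %>%
  summarise(n = n_distinct(id))
by_ct <- long %>%
  group_by(ct_amount) %>%
  summarise(n = n_distinct(id))
by_sex
by_ct
```

The n_distinct() version does not depend on which row per patient is kept, but it would count a patient in two groups if their sex or CT count differed between rows; the consistency check guards against that for both approaches.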
