Count unique occurrences of factor levels and numeric values with dplyr, on data in a long format

tcvdb1992

I have data on repeated measurements of 8 patients, each with a varying number of repeated measurements of the same variables. The measured variables are sex, blood pressure (sys_bp), and the number of CT scans a person underwent (ct_amount):

library(dplyr)
library(magrittr)

questiondata <- structure(list(id = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 
4, 7, 7, 8, 8, 8, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 20, 
20, 20), time = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 4L, 5L, 1L, 6L, 1L, 2L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 1L, 2L, 4L), .Label = c("T0", "T1M0", "T1M6", 
"T1M12", "T2M0", "FU1"), class = "factor"), sys_bp = c(116, 125.8, 
NA, NA, NA, 113.2, NA, NA, NA, NA, 146, NA, NA, NA, NA, NA, NA, 
125, NA, NA, 164.5, NA, NA, NA, NA, 150.5, NA, NA, NA, NA, 158, 
NA), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("female", "male"), class = "factor"), 
    ct_amount = c(4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 3L, 3L, 3L)), row.names = c(NA, -32L), class = c("tbl_df", 
"tbl", "data.frame"))

questiondata

      id time  sys_bp sex    ct_amount
   <dbl> <fct>  <dbl> <fct>      <int>
 1     2 T0      116  female         4
 2     2 T1M0    126. female         4
 3     2 T1M6     NA  female         4
 4     2 T1M12    NA  female         4
 5     3 T0       NA  female         5
 6     3 T1M0    113. female         5
 7     3 T1M6     NA  female         5
 8     3 T1M12    NA  female         5
 9     3 T2M0     NA  female         5
10     4 T0       NA  male           5
11     4 T1M0    146  male           5
12     4 T1M6     NA  male           5
13     4 T1M12    NA  male           5
14     4 T2M0     NA  male           5
15     7 T0       NA  female         2
16     7 FU1      NA  female         2
17     8 T0       NA  female         3
18     8 T1M0    125  female         3
19     8 T2M0     NA  female         3
20    13 T0       NA  female         5
21    13 T1M0    164. female         5
22    13 T1M6     NA  female         5
23    13 T1M12    NA  female         5
24    13 T2M0     NA  female         5
25    14 T0       NA  male           5
26    14 T1M0    150. male           5
27    14 T1M6     NA  male           5
28    14 T1M12    NA  male           5
29    14 T2M0     NA  male           5
30    20 T0       NA  female         3
31    20 T1M0    158  female         3
32    20 T1M12    NA  female         3

I am trying to count the number of persons who (1) are male or female and (2) have 1, 2, 3, 4 or 5 CT scans.

So the expected output is (1) 6 females and 2 males, and (2) 1 person with 2 CTs, 2 persons with 3 CTs, 1 person with 4 CTs and 4 persons with 5 CTs.
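Counting persons rather than rows only works because sex and ct_amount take a single value within each id, as the printout above shows. A minimal dplyr sketch for confirming that assumption on larger data; to keep it self-contained it rebuilds just the id, sex and ct_amount columns of questiondata from the dput() output, and the names rows_per_id and per_id_check are illustrative:

```r
library(dplyr)

# Only the id, sex and ct_amount columns of questiondata, rebuilt compactly
# from the dput() output so this snippet runs on its own.
rows_per_id <- c(4, 5, 5, 2, 3, 5, 5, 3)
questiondata <- tibble(
  id        = rep(c(2, 3, 4, 7, 8, 13, 14, 20), times = rows_per_id),
  sex       = factor(rep(c("female", "female", "male", "female", "female",
                           "female", "male", "female"), times = rows_per_id),
                     levels = c("female", "male")),
  ct_amount = rep(c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L), times = rows_per_id)
)

# Number of distinct sex and ct_amount values recorded for each person.
per_id_check <- questiondata %>%
  group_by(id) %>%
  summarise(n_sex = n_distinct(sex), n_ct = n_distinct(ct_amount))

# TRUE when every id has exactly one value of each, so any single row
# per id can stand in for that person.
all(per_id_check$n_sex == 1 & per_id_check$n_ct == 1)
```

If this returns FALSE for some data, keeping an arbitrary row per id would silently pick one of the conflicting values.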

I've tried many combinations of group_by, summarise and count, but can't seem to get it right. Any help?

Ronak Shah

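You can first keep only one row per id with distinct(id, .keep_all = TRUE), which retains the first row of each person together with all its columns. Then use count() on that reduced data to get the output.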

library(dplyr)

unique_data <- questiondata %>% distinct(id, .keep_all = TRUE)

unique_data %>% count(sex)
# A tibble: 2 x 2
#  sex        n
#  <fct>  <int>
#1 female     6
#2 male       2

unique_data %>% count(ct_amount)
# A tibble: 4 x 2
#  ct_amount     n
#      <int> <int>
#1         2     1
#2         3     2
#3         4     1
#4         5     4
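A variant that skips the distinct() step counts distinct ids within each group using n_distinct(). The sketch below also lists every CT amount from 1 to 5, since the question asks about that range: converting ct_amount to a factor with levels 1:5 and grouping with .drop = FALSE (available in dplyr 0.8.0 and later) keeps the empty level 1 with a count of 0. As before, only the id, sex and ct_amount columns are rebuilt from the question's data, and the object names are illustrative:

```r
library(dplyr)

# Only the id, sex and ct_amount columns of questiondata, rebuilt compactly
# from the dput() output so this snippet runs on its own.
rows_per_id <- c(4, 5, 5, 2, 3, 5, 5, 3)
questiondata <- tibble(
  id        = rep(c(2, 3, 4, 7, 8, 13, 14, 20), times = rows_per_id),
  sex       = factor(rep(c("female", "female", "male", "female", "female",
                           "female", "male", "female"), times = rows_per_id),
                     levels = c("female", "male")),
  ct_amount = rep(c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L), times = rows_per_id)
)

# Persons per sex: count distinct ids in each group instead of rows.
sex_counts <- questiondata %>%
  group_by(sex) %>%
  summarise(n = n_distinct(id))

# Persons per number of CT scans, including amounts nobody has.
# The factor levels 1:5 plus .drop = FALSE keep empty groups (n = 0).
ct_counts <- questiondata %>%
  mutate(ct_amount = factor(ct_amount, levels = 1:5)) %>%
  group_by(ct_amount, .drop = FALSE) %>%
  summarise(n = n_distinct(id))

sex_counts
ct_counts
```

This gives 6 females and 2 males, and 0, 1, 2, 1 and 4 persons with 1 to 5 CT scans respectively. Because n_distinct(id) counts persons directly, it does not depend on which row distinct() happens to keep.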
