How do I select a column based on the value in another column with dplyr?

Igor Filippov

My data frame looks like this:

  id  A  T C  G ref var
1  1 10 15 7  0   A   C
2  2 11  9 2  3   A   G
3  3  2 31 1 12   T   C

I'd like to create two new columns, ref_count and var_count, which will have the following values:

  1. Value from A column and value from C column, since ref is A and var is C

  2. Value from A column and value from G column, since ref is A and var is G

etc.

So I'd like to select a column based on the value in another column for each row.

Thanks!

akrun

We can use pivot_longer to reshape the data into 'long' format, keep only the rows whose column name matches ref or var, and then reshape back to 'wide' format with pivot_wider:

library(dplyr)
library(tidyr)
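df1 %>%
   # reshape the count columns to long format (default columns: name, value)
   pivot_longer(cols = A:G) %>%
   # keep only the rows whose column name matches ref or var
   filter(name == ref | name == var) %>%
   # label each row by what it matched, not by its position
   mutate(nm1 = if_else(name == ref, 'ref_count', 'var_count')) %>%
   select(id, value, nm1) %>%
   pivot_wider(names_from = nm1, values_from = value) %>%
   left_join(df1, ., by = 'id')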
# A tibble: 3 x 9
#     id     A     T     C     G ref   var   ref_count var_count
#* <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>     <dbl>     <dbl>
#1     1    10    15     7     0 A     C            10         7
#2     2    11     9     2     3 A     G            11         3
#3     3     2    31     1    12 T     C            31         1
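
As a possible shorter variant of the reshape above, the matching values can be pulled out directly in summarise, which avoids the second reshape. This is only a sketch: it assumes every ref and var value names exactly one of the count columns, and the names counts and result are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df1 <- tibble(
  id = 1:3,
  A = c(10, 11, 2), T = c(15, 9, 31), C = c(7, 2, 1), G = c(0, 3, 12),
  ref = c("A", "A", "T"), var = c("C", "G", "C")
)

counts <- df1 %>%
  pivot_longer(cols = A:G) %>%        # long format: one row per id and base
  group_by(id) %>%
  summarise(
    ref_count = value[name == ref],   # value of the column named in ref
    var_count = value[name == var]    # value of the column named in var
  )

# Attach the two new columns to the original data
result <- left_join(df1, counts, by = "id")
```

Because each value is looked up by name, the result does not depend on the order of the count columns, and a row where ref equals var still gets both counts.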

Or, in base R, we can use vectorized row/column indexing: indexing a matrix with a two-column matrix of (row, column) positions picks one element per row.

df1$ref_count <- as.matrix(df1[2:5])[cbind(seq_len(nrow(df1)), match(df1$ref,  names(df1)[2:5]))]
df1$var_count <- as.matrix(df1[2:5])[cbind(seq_len(nrow(df1)), match(df1$var,  names(df1)[2:5]))]
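
The two lines above repeat the same lookup, so it could be wrapped in a small function. The sketch below is one way to do that; pick_by_name and bases are hypothetical names introduced here, not part of base R or dplyr, and the count columns are selected by name instead of by position 2:5.

```r
# Hypothetical helper: for each row, return the value from the column
# whose name is stored in the `key` column.
pick_by_name <- function(df, key, cols) {
  m <- as.matrix(df[cols])            # matrix of the candidate columns
  j <- match(df[[key]], cols)         # column position per row (NA if no match)
  m[cbind(seq_len(nrow(df)), j)]      # index by (row, column) pairs
}

# Same data as in the question
df1 <- data.frame(
  id = 1:3,
  A = c(10, 11, 2), T = c(15, 9, 31), C = c(7, 2, 1), G = c(0, 3, 12),
  ref = c("A", "A", "T"), var = c("C", "G", "C"),
  stringsAsFactors = FALSE
)

bases <- c("A", "T", "C", "G")
df1$ref_count <- pick_by_name(df1, "ref", bases)
df1$var_count <- pick_by_name(df1, "var", bases)
```

A key value that matches none of the listed columns yields NA for that row, since match returns NA and an NA row in an index matrix produces NA.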

data

df1 <- structure(list(id = 1:3, A = c(10, 11, 2), T = c(15, 9, 31), 
    C = c(7, 2, 1), G = c(0, 3, 12), ref = c("A", "A", "T"), 
    var = c("C", "G", "C")), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))
