Recode a numeric variable with many values including NAs

Giora Simchoni

How can I recode a numeric variable with many values, including missing values, to the numbers 0:(n-1), where n is the number of unique values including NA, in a tidy way?

Example:

library(tidyverse)

df <- tibble(x = c(1000, 1000, NA, 1001, 1002, 1003, NA, 1003))

Required output (any recoding scheme is fine, as long as the values are 0:(n-1)):

# A tibble: 8 x 2
      x     y
  <dbl> <dbl>
1  1000     0
2  1000     0
3    NA     4
4  1001     1
5  1002     2
6  1003     3
7    NA     4
8  1003     3

I can do this in a tidy way by first converting x to a factor, then using fct_recode() with a named vector (created automatically, since x has many values), then converting back to numeric:

df <- df %>% mutate(x_fct = factor(case_when(
    is.na(x) ~ "level_na",
    TRUE ~ str_c("level_", x)
  )))
x_levels <- levels(df$x_fct)
n_levels <- length(x_levels)
names(x_levels) <- as.character(0:(n_levels - 1))
df <- df %>%
  mutate(y = as.numeric(fct_recode(x_fct, !!!x_levels)) - 1)
df
# A tibble: 8 x 3
      x x_fct          y
  <dbl> <fct>      <dbl>
1  1000 level_1000     0
2  1000 level_1000     0
3    NA level_na       4
4  1001 level_1001     1
5  1002 level_1002     2
6  1003 level_1003     3
7    NA level_na       4
8  1003 level_1003     3

But this seems very cumbersome. Surely there's a simpler way, preferably in a single pipe.

Ronak Shah

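One way is to combine match() with unique(). match() treats NA as a value like any other, so NA gets its own code. Wrapping unique(x) in sample() shuffles the assignment, so the codes are random and the output shown is just one possible result.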

library(dplyr)

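df %>%
  mutate(level = paste('level', x, sep = '_'),  # label column, for display only
         # position of x among the shuffled unique values, shifted to start at 0
         y = match(x, sample(unique(x))) - 1)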

#      x level          y
#  <dbl> <chr>      <dbl>
#1  1000 level_1000     4
#2  1000 level_1000     4
#3    NA level_NA       2
#4  1001 level_1001     0
#5  1002 level_1002     1
#6  1003 level_1003     3
#7    NA level_NA       2
#8  1003 level_1003     3
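
If a reproducible coding is preferred over a random one, the same idea works without sample(). The following is a minimal sketch, not part of the original answer: the column names y_first, y_sorted and y_factor are made up for illustration, and the factor() variant is an alternative technique rather than something the answer proposes.

```r
library(dplyr)

df <- tibble(x = c(1000, 1000, NA, 1001, 1002, 1003, NA, 1003))

res <- df %>%
  mutate(
    # codes in order of first appearance; match() pairs NA with NA
    y_first  = match(x, unique(x)) - 1,
    # codes in sorted order, with NA kept and placed last
    y_sorted = match(x, sort(unique(x), na.last = TRUE)) - 1,
    # same sorted codes via a factor that keeps NA as a level
    y_factor = as.integer(factor(x, exclude = NULL)) - 1
  )
res
```

With the example data, y_sorted and y_factor both give 0, 0, 4, 1, 2, 3, 4, 3, which is the output requested in the question, while y_first gives 0, 0, 1, 2, 3, 4, 1, 4.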
