Sorting multiple columns by first letter and by numbers in R

melbez

I have created a dataframe that looks like the following:

item  mean
a_b   5
a_c   2
a_a   4
b_d   7
b_f   3
b_e   1

I would like to sort it so that it is first sorted by whether or not it begins with "a_" or "b_", and then have it sorted by mean. The final dataframe should look like this:

item  mean
a_c   2
a_a   4
a_b   5
b_e   1
b_f   3
b_d   7

Note that the item column is not sorted perfectly alphabetically. It is only sorted by the first letter.

I have tried:

arrange(df, item, mean) 

The problem with this is that it does not only sort by the "a_" and "b_" categories, but by the entire item name.

I am open to separating the original dataframe into separate dataframes using filter and then sorting the mean within these smaller subsets. I do not need everything to stay in the same dataframe. However, I am unsure how to use filter to only select rows that have items beginning with "a_" or "b_".

avid_useR

Another method using dplyr:

library(dplyr)
arrange(df, sub('_.+$', '', item), mean)

An alternative is to use str_extract from stringr to extract the first character and the underscore that follows it from item:

library(stringr)
arrange(df, str_extract(item, '^._'), mean)

Result:

  item mean
1  a_c    2
2  a_a    4
3  a_b    5
4  b_e    1
5  b_f    3
6  b_d    7

Data:

df <- structure(list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"
), mean = c(5L, 2L, 4L, 7L, 3L, 1L)), .Names = c("item", "mean"
), class = "data.frame", row.names = c(NA, -6L))
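To check what arrange is actually sorting on, the two temporary keys can be computed on their own. This is only a sketch: key_sub and key_extract are names made up here for illustration, and the base R order() call at the end is an equivalent I am adding for comparison, not part of the dplyr approach above.

```r
library(stringr)

# Same data as above, rebuilt with data.frame() so the block runs on its own
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Key from the sub() version: drop the underscore and everything after it
key_sub <- sub('_.+$', '', df$item)
key_sub      # "a" "a" "a" "b" "b" "b"

# Key from the str_extract() version: keep the first character plus underscore
key_extract <- str_extract(df$item, '^._')
key_extract  # "a_" "a_" "a_" "b_" "b_" "b_"

# Base R equivalent of the arrange() call: order by prefix, then by mean
sorted_base <- df[order(key_sub, df$mean), ]
sorted_base$item  # "a_c" "a_a" "a_b" "b_e" "b_f" "b_d"
```

Both keys group the rows the same way, so either one gives the same ordering once mean breaks the ties.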

Notes:

  • sub('_.+$', '', item) creates a temporary variable by removing the underscore and everything after it from item. The pattern _.+$ matches a literal underscore (_) followed by one or more of any character (.+), up to the end of the string ($).

  • str_extract(item, '^._') creates a temporary variable by extracting any single character (.) followed by a literal underscore (_) at the beginning of the string (^).

  • The neat thing about dplyr::arrange is that you can create a temporary sorting variable within the function and not have it included in the output.
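If you would rather split the data into separate dataframes with filter, as the question suggests, something like the following should work. It is a minimal sketch that assumes item is a character column (not a factor); a_items and b_items are hypothetical names, and startsWith is a base R function rather than part of dplyr.

```r
library(dplyr)

# Same data as above, rebuilt so the block runs on its own
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the rows whose item starts with "a_", then sort those by mean
a_items <- df %>%
  filter(startsWith(item, "a_")) %>%
  arrange(mean)

# Same for the rows whose item starts with "b_"
b_items <- df %>%
  filter(startsWith(item, "b_")) %>%
  arrange(mean)

a_items$item  # "a_c" "a_a" "a_b"
b_items$item  # "b_e" "b_f" "b_d"
```

grepl('^a_', item) inside filter would select the same rows if you prefer a regular expression over startsWith.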
