I have created a dataframe that looks like the following:
item mean
a_b 5
a_c 2
a_a 4
b_d 7
b_f 3
b_e 1
I would like to sort it so that it is first sorted by whether or not it begins with "a_" or "b_", and then have it sorted by mean. The final dataframe should look like this:
item mean
a_c 2
a_a 4
a_b 5
b_e 1
b_f 3
b_d 7
Note that the item column is not sorted perfectly alphabetically. It is only sorted by the first letter.
I have tried:
arrange(df, item, mean)
The problem with this is that it does not only sort by the "a_" and "b_" categories, but by the entire item name.
I am open to separating the original dataframe into separate dataframes using filter and then sorting the mean within these smaller subsets. I do not need everything to stay in the same dataframe. However, I am unsure how to use filter to only select rows that have items beginning with "a_" or "b_".
Another method using dplyr
:
library(dplyr)
arrange(df, sub('_.+$', '', item), mean)
an alternative would be to use str_extract
from stringr
to extract only the first letter from item
:
library(stringr)
arrange(df, str_extract(item, '^._'), mean)
Result:
item mean
1 a_c 2
2 a_a 4
3 a_b 5
4 b_e 1
5 b_f 3
6 b_d 7
Data:
df <- structure(list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"
), mean = c(5L, 2L, 4L, 7L, 3L, 1L)), .Names = c("item", "mean"
), class = "data.frame", row.names = c(NA, -6L))
Notes:
sub('_.+$', '', item)
creates a temporary variable by removing _
and everything after that from item
. _.+$
matches a literal underscore (_
) followed by any character one or more times (.+
) at the end of the string ($
).
str_extract(item, '^._')
creates a temporary variable by extracting any one character (.
) followed by a literal underscore (_
) in the beginning of the string (^
)
The neat thing about dplyr::arrange
is that you can create a temporary sorting variable within the function and not have it included in the output.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments