How do I get the median of multiple columns in R, conditional on another column?

Posted by Ayla on Dev

Ayla

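I'm a beginner in R and I'd like to know how to do the following:

I want to replace the missing values in my dataset with the median of each column. However, for each column I want the median within a particular category, determined by another column. My dataset looks like this: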

structure(list(Country = structure(1:5, .Label = c("Afghanistan", 
"Albania", "Algeria", "Andorra", "Angola"), class = "factor"), 
    CountryID = 1:5, Continent = c(1L, 2L, 3L, 2L, 3L), Adolescent.fertility.rate.... = c(151L, 
    27L, 6L, NA, 146L), Adult.literacy.rate.... = c(28, 98.7, 
    69.9, NA, 67.4)), class = "data.frame", row.names = c(NA, 
-5L))

So, for each column, I want to replace the missing values with that column's median within the corresponding continent.
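To make the requirement concrete: Andorra is the only row with missing values, and it shares Continent 2 with Albania, so its NAs should become Albania's values. The following is a minimal base R sketch that only computes the per-continent median of each measured column; the name `medians` and the explicit list of columns are illustrative choices, not part of the original post.

```r
# Same data as the dput() output above, rebuilt with data.frame()
df <- data.frame(
  Country = factor(c("Afghanistan", "Albania", "Algeria", "Andorra", "Angola")),
  CountryID = 1:5,
  Continent = c(1L, 2L, 3L, 2L, 3L),
  Adolescent.fertility.rate.... = c(151L, 27L, 6L, NA, 146L),
  Adult.literacy.rate.... = c(28, 98.7, 69.9, NA, 67.4)
)

# For each measured column, take the median within each Continent, ignoring NAs.
# sapply() binds the per-column results into a matrix: one row per continent.
medians <- sapply(
  df[c("Adolescent.fertility.rate....", "Adult.literacy.rate....")],
  function(col) tapply(col, df$Continent, median, na.rm = TRUE)
)
print(medians)
```

Row `"2"` of `medians` holds the values that Andorra's missing entries should receive.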

Dario

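We can use dplyr::mutate_at to replace the NAs in each column with the median of their Continent group, skipping the grouping column Continent and the non-numeric column Country.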

df <- structure(list(Country = structure(1:5, .Label = c("Afghanistan",  "Albania", "Algeria", "Andorra", "Angola"), class = "factor"), 
               CountryID = 1:5, Continent = c(1L, 2L, 3L, 2L, 3L),
               Adolescent.fertility.rate.... = c(151L, 27L, 6L, NA, 146L),
               Adult.literacy.rate.... = c(28, 98.7, 69.9, NA, 67.4)), class = "data.frame", row.names = c(NA, -5L))

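library(dplyr)
df %>%
  group_by(Continent) %>%   # medians below are computed within each continent
  # every column except the grouping column and the non-numeric Country
  mutate_at(vars(-group_cols(), -Country), ~ifelse(is.na(.), median(., na.rm = TRUE), .)) %>% 
  ungroup()                 # drop the grouping afterwards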

Output:

  # A tibble: 5 x 5
    Country     CountryID Continent Adolescent.fertility.rate.... Adult.literacy.rate....
    <fct>           <int>     <int>                         <int>                   <dbl>
  1 Afghanistan         1         1                           151                    28  
  2 Albania             2         2                            27                    98.7
  3 Algeria             3         3                             6                    69.9
  4 Andorra             4         2                            27                    98.7
  5 Angola              5         3                           146                    67.4

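Explanation: First, we group the data.frame df by Continent. Then we mutate every column except the grouping column and Country (which is not numeric) as follows: if is.na is TRUE, the value is replaced with the median, and because the data is grouped, that is the median of its Continent group; if the value is not NA, we replace it with itself. Finally, we call ungroup for good measure to get back a "normal" tibble.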
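In dplyr 1.0.0 and later, mutate_at() is superseded by across(). The following is a sketch of the same idea in that style, assuming a dplyr version that provides across(), where() and coalesce(); it is an alternative formulation, not the answer's original code. Here coalesce() keeps each non-missing value and falls back to the group median for the NAs, which plays the role of the ifelse() call above. The name `filled` is illustrative.

```r
library(dplyr)

df <- data.frame(
  Country = factor(c("Afghanistan", "Albania", "Algeria", "Andorra", "Angola")),
  CountryID = 1:5,
  Continent = c(1L, 2L, 3L, 2L, 3L),
  Adolescent.fertility.rate.... = c(151L, 27L, 6L, NA, 146L),
  Adult.literacy.rate.... = c(28, 98.7, 69.9, NA, 67.4)
)

filled <- df %>%
  group_by(Continent) %>%
  # across() skips the grouping column automatically; where(is.numeric)
  # also skips the factor column Country
  mutate(across(where(is.numeric), ~ coalesce(.x, median(.x, na.rm = TRUE)))) %>%
  ungroup()

print(filled)
```

Selecting columns by type with where(is.numeric) avoids listing Country explicitly, so any other non-numeric columns added later are skipped as well. Depending on the dplyr version, an integer column may come back as double when one of its group medians is fractional.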