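I tried to write a title that covers the whole question, but I had some trouble phrasing the problem, so I'll go straight to an example of what I'm trying to do. First, here is a subset of my data frame, which contains some sports data: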
dput(mydf)
structure(list(team.Abbreviation = c("ATL", "BOS", "BRO", "CHA",
"CHI", "ATL", "BOS", "BRO", "CHA", "CHI", "ATL", "BOS", "BRO",
"CHA", "CHI"), stat = c("GP", "GP", "GP", "GP", "GP", "PTS",
"PTS", "PTS", "PTS", "PTS", "REB", "REB", "REB", "REB", "REB"
), value = c(28, 30, 27, 27, 27, 103.5, 103.9, 108.2, 104.7,
97.6, 47.6, 53, 54.7, 56.8, 51.7), foragainst = c("for", "for",
"for", "for", "for", "for", "for", "for", "for", "for", "for",
"for", "for", "for", "for")), .Names = c("team.Abbreviation",
"stat", "value", "foragainst"), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
mydf
# A tibble: 15 x 4
team.Abbreviation stat value foragainst
<chr> <chr> <dbl> <chr>
1 ATL GP 28.0 for
2 BOS GP 30.0 for
3 BRO GP 27.0 for
4 CHA GP 27.0 for
5 CHI GP 27.0 for
6 ATL PTS 103.5 for
7 BOS PTS 103.9 for
8 BRO PTS 108.2 for
9 CHA PTS 104.7 for
10 CHI PTS 97.6 for
11 ATL REB 47.6 for
12 BOS REB 53.0 for
13 BRO REB 54.7 for
14 CHA REB 56.8 for
15 CHI REB 51.7 for
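For now, the foragainst column can be ignored. For each stat (GP, PTS and REB in this case), I want to compute each team's rank within that stat. There are 5 teams in this example. I'm fairly sure that what I want is a data frame with the same dimensions as mydf, looking like this: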
outputdf
# A tibble: 15 x 4
team.Abbreviation stat rank foragainst
<chr> <chr> <dbl> <chr>
1 ATL GP 2 for
2 BOS GP 1 for
3 BRO GP 3 for
4 CHA GP 3 for
5 CHI GP 3 for
6 ATL PTS 4 for
7 BOS PTS 3 for
8 BRO PTS 1 for
9 CHA PTS 2 for
10 CHI PTS 5 for
11 ATL REB 5 for
12 BOS REB 3 for
13 BRO REB 2 for
14 CHA REB 1 for
15 CHI REB 4 for
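Looking at the 5-row slice of this data where stat == PTS, note that team.Abbreviation == BRO has the highest PTS value, so its rank is 1. CHI has the lowest PTS value, so its rank is 5. I don't particularly care how ties are handled, so for stat == GP the ranks of BRO, CHA and CHI don't have to equal 3.
I could probably do this with a for loop in a fairly inefficient way, but I'd like to find a dplyr (or other good package) solution here. Thanks in advance!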
We can use min_rank after grouping by stat. Negating value makes the largest value in each group get rank 1:
library(dplyr)
mydf %>%
group_by(stat) %>%
mutate(rank = min_rank(-value)) %>%
select(team.Abbreviation, stat, rank, foragainst)
# A tibble: 15 x 4
# Groups: stat [3]
# team.Abbreviation stat rank foragainst
# <chr> <chr> <int> <chr>
# 1 ATL GP 2 for
# 2 BOS GP 1 for
# 3 BRO GP 3 for
# 4 CHA GP 3 for
# 5 CHI GP 3 for
# 6 ATL PTS 4 for
# 7 BOS PTS 3 for
# 8 BRO PTS 1 for
# 9 CHA PTS 2 for
#10 CHI PTS 5 for
#11 ATL REB 5 for
#12 BOS REB 3 for
#13 BRO REB 2 for
#14 CHA REB 1 for
#15 CHI REB 4 for
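Since the question says tie handling doesn't matter, here is a small sketch comparing three dplyr ranking helpers on the GP values copied from mydf. The variable names (gp, r_min, r_dense, r_first) are just illustrative, and desc() is used here as an equivalent of negating a numeric vector.

```r
library(dplyr)

# GP values for ATL, BOS, BRO, CHA, CHI, copied from mydf
gp <- c(28, 30, 27, 27, 27)

# Tied values share the smallest rank; later ranks would be skipped
r_min <- min_rank(desc(gp))

# Tied values share a rank and no ranks are skipped
r_dense <- dense_rank(desc(gp))

# Ties are broken by position, so every rank is unique
r_first <- row_number(desc(gp))

print(r_min)    # 2 1 3 3 3
print(r_dense)  # 2 1 3 3 3
print(r_first)  # 2 1 3 4 5
```

min_rank and dense_rank agree on this data only because the tied teams sit at the bottom of the ranking; they would differ if another team ranked below a tie.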
Or use ave from base R:
with(mydf, ave(-value, stat, FUN = function(x) rank(x, ties.method = "min")))
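The ave call returns a plain vector in the same row order as the input, so it can be assigned straight back as a column. A minimal sketch, assuming a small stand-in data frame (called demo here, not part of the original question) built from the PTS and REB rows of mydf:

```r
# Stand-in for mydf: the PTS and REB rows from the question
demo <- data.frame(
  team.Abbreviation = rep(c("ATL", "BOS", "BRO", "CHA", "CHI"), times = 2),
  stat = rep(c("PTS", "REB"), each = 5),
  value = c(103.5, 103.9, 108.2, 104.7, 97.6,
            47.6, 53, 54.7, 56.8, 51.7)
)

# Rank within each stat; -value puts the largest value at rank 1
demo$rank <- with(demo, ave(-value, stat,
                            FUN = function(x) rank(x, ties.method = "min")))

print(demo)
```

This keeps everything in base R, which may be convenient when adding a dplyr dependency is not wanted.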