Please help... I have a dataset that records degrees by year, for example:
Year1 Deg_Year1 Year2 Deg_Year2 Year3 Deg_Year3 Year4 Deg_Year4 Year5 Deg_Year5
2001 College 2004 Master NA NA NA NA NA NA
2004 College 2004 Master 2010 PHD NA NA NA NA
2006 Master 2006 College NA NA NA NA NA NA
2016 Master NA NA NA NA NA NA NA NA
2002 Master 2003 Master 2004 College 2004 Master NA NA
2002 Master 2002 College NA NA NA NA NA NA
I would like to get a data frame with the year and the highest degree obtained before 2015, like this:
YearX Highest_Degree
2004 Master
2010 PHD
2006 Master
NA NA
2004 Master
2002 Master
Can anyone help me?
Thanks!
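We can create a vector of degrees in rank order, then match it against the 'Deg_Year' columns, use max.col to get the column index of the highest-ranked degree in each row, and use that index to subset the degree and the corresponding 'Year' for each row.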
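# Degrees in ascending rank; anything not listed (e.g. 'College') counts as 0
v1 <- c('Master', 'PHD')
# Positions of the 'Deg_Year' columns; each 'Year' column sits one to the left
nm1 <- grep('Deg', names(df1))
# Rank matrix: one row per person, one column per degree slot
m1 <- sapply(df1[nm1], match, table = v1, nomatch = 0)
# Column of the highest rank per row (last one on ties); NA if the row has no ranked degree
i1 <- max.col(m1, ties.method = 'last') * NA^(!rowSums(m1!=0))
YearX <- df1[nm1-1][cbind(seq_len(nrow(df1)), i1)]
Highest_Degree <- df1[nm1][cbind(seq_len(nrow(df1)), i1)]
data.frame(YearX, Highest_Degree)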
# YearX Highest_Degree
#1 2004 Master
#2 2010 PHD
#3 2006 Master
#4 NA <NA>
#5 2004 Master
#6 2002 Master
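To see how the index step behaves, here is a minimal sketch on a made-up rank matrix (the matrix `m` and the names `idx` and `mask` are illustrative, not part of the answer). Note that max.col breaks ties at random by default, so the sketch passes ties.method = "last" to make the result reproducible.

```r
# Toy rank matrix: 0 means "no ranked degree in this slot".
m <- rbind(c(0, 1, 0),
           c(1, 1, 2),
           c(0, 0, 0),
           c(1, 0, 1))

# Column index of each row maximum; "last" picks the rightmost column on ties.
idx <- max.col(m, ties.method = "last")

# NA^TRUE is NA and NA^FALSE is 1, so rows without any ranked degree turn into NA.
mask <- NA^(!rowSums(m != 0))

result <- idx * mask
print(result)
```

Rows whose entries are all zero end up as NA, and a matrix index containing NA returns NA, which is where the NA row in the output comes from.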
Data:
df1 <- structure(list(Year1 = c(2001L, 2004L, 2006L, 2016L, 2002L, 2002L
), Deg_Year1 = c("College", "College", "Master", "College", "Master",
"Master"), Year2 = c(2004L, 2004L, 2006L, NA, 2003L, 2002L),
Deg_Year2 = c("Master", "Master", "College", NA, "Master",
"College"), Year3 = c(NA, 2010L, NA, NA, 2004L, NA), Deg_Year3 = c(NA,
"PHD", NA, NA, "College", NA), Year4 = c(NA, NA, NA, NA,
2004L, NA), Deg_Year4 = c(NA, NA, NA, NA, "Master", NA),
Year5 = c(NA, NA, NA, NA, NA, NA), Deg_Year5 = c(NA, NA,
NA, NA, NA, NA)), .Names = c("Year1", "Deg_Year1", "Year2",
"Deg_Year2", "Year3", "Deg_Year3", "Year4", "Deg_Year4", "Year5",
"Deg_Year5"), class = "data.frame", row.names = c(NA, -6L))
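The code above ranks degrees but never tests the year itself. If the cutoff has to be enforced explicitly, one possible variant is sketched below. The helper `highest_before`, its arguments, and the `toy` data are hypothetical; the sketch assumes "before 2015" means strictly earlier than 2015, that each 'Year' column sits immediately left of its 'Deg_Year' column, and that 'College' should be ranked below 'Master' (unlike the answer, which ignores it).

```r
# Hypothetical helper: highest degree earned strictly before `cutoff`.
highest_before <- function(df, cutoff = 2015,
                           ranks = c("College", "Master", "PHD")) {
  deg_cols  <- grep("^Deg_Year", names(df))
  year_cols <- deg_cols - 1L            # assumption: Year column is just left of Deg column

  yrs  <- as.matrix(df[year_cols])
  degs <- as.matrix(df[deg_cols])

  # Rank of every degree (0 if missing or not listed in `ranks`).
  rk <- matrix(match(degs, ranks, nomatch = 0L), nrow = nrow(df))

  # Ignore degrees with a missing year or a year at/after the cutoff.
  rk[is.na(yrs) | yrs >= cutoff] <- 0L

  # Column of the best remaining degree; the latest one wins ties.
  idx <- max.col(rk, ties.method = "last")
  idx[rowSums(rk) == 0] <- NA           # nothing qualifies in this row

  pick <- cbind(seq_len(nrow(df)), idx)
  data.frame(YearX = yrs[pick], Highest_Degree = degs[pick])
}

# Made-up data for illustration only.
toy <- data.frame(
  Year1 = c(2001, 2016, 2002), Deg_Year1 = c("College", "Master", "Master"),
  Year2 = c(2004, NA, 2003),   Deg_Year2 = c("Master", NA, "Master"),
  Year3 = c(2016, NA, 2004),   Deg_Year3 = c("PHD", NA, "College")
)

res <- highest_before(toy)
print(res)
```

In the toy data, the first row's PHD is dropped because it was earned in 2016, and the second row yields NA because its only degree falls after the cutoff.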