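I have a data frame (df) with nine categorical variables: the first is called student, followed by the names of eight school subjects.
I would like to create a new variable called overall that summarises which subjects each student studies (dfgoal).
The problem is that what I have doesn't work. Also, I'm not sure how best to skip the first column (student). Should I use a list of the variables I want to use (the eight subjects)?
Any help would be much appreciated.
Starting point (df):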
df <-
  data.frame(
    student = c(1, 2, 3, 4, 5),
    maths = c("y", "n", "n", "n", "n"),
    English = c("n", "y", "n", "n", "n"),
    geography = c("y", "n", "n", "n", "n"),
    history = c("n", "n", "n", "n", "n"),
    art = c("n", "n", "n", "n", "n"),
    Spanish = c("n", "n", "n", "n", "n"),
    physics = c("n", "n", "n", "n", "y"),
    chemistry = c("n", "n", "n", "n", "y"),
    stringsAsFactors = TRUE
  )
Expected result (dfgoal):
dfgoal <-
  data.frame(
    student = c(1, 2, 3, 4, 5),
    maths = c("y", "n", "n", "n", "n"),
    English = c("n", "y", "n", "n", "n"),
    geography = c("y", "n", "n", "n", "n"),
    history = c("n", "n", "n", "n", "n"),
    art = c("n", "n", "n", "n", "n"),
    Spanish = c("n", "n", "n", "n", "n"),
    physics = c("n", "n", "n", "n", "y"),
    chemistry = c("n", "n", "n", "n", "y"),
    overall = c("maths, geography,", "English", "n", "n", "physics,chemistry,"),
    stringsAsFactors = TRUE
  )
Current code:
sapply(df, function(x)
df$overall <- ifelse(df$x == y, paste0(names(df$x), ","), "n"))
In a single line:
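dfgoal <- cbind.data.frame(
  df,
  # For each row, keep the names of the subject columns whose value is "y";
  # df[-1] and x[2:length(x)] both skip the first column (student).
  overall = apply(df, 1, function(x)
    paste(colnames(df[-1])[x[2:length(x)] == "y"], collapse = ", ")))
dfgoal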
#  student maths English geography history art Spanish physics chemistry
#1       1     y       n         y       n   n       n       n         n
#2       2     n       y         n       n   n       n       n         n
#3       3     n       n         n       n   n       n       n         n
#4       4     n       n         n       n   n       n       n         n
#5       5     n       n         n       n   n       n       y         y
#             overall
#1   maths, geography
#2            English
#3
#4
#5 physics, chemistry
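The question also asks whether the subjects can be picked from a list of names instead of by skipping the first column by position. Below is a minimal sketch of that variant, not the only way to do it. It rebuilds the example `df` from the question so that it runs on its own; the names `subjects` and `taken` are introduced here purely for illustration.

```r
# Rebuild the example data from the question so the block is self-contained.
df <- data.frame(
  student = c(1, 2, 3, 4, 5),
  maths = c("y", "n", "n", "n", "n"),
  English = c("n", "y", "n", "n", "n"),
  geography = c("y", "n", "n", "n", "n"),
  history = c("n", "n", "n", "n", "n"),
  art = c("n", "n", "n", "n", "n"),
  Spanish = c("n", "n", "n", "n", "n"),
  physics = c("n", "n", "n", "n", "y"),
  chemistry = c("n", "n", "n", "n", "y"),
  stringsAsFactors = TRUE
)

# Illustrative vector (not from the original post): the subject columns, by name.
subjects <- c("maths", "English", "geography", "history",
              "art", "Spanish", "physics", "chemistry")

# Select only the subject columns and compare with "y" to get a logical matrix.
taken <- as.matrix(df[subjects]) == "y"

# For each row, paste together the names of the subjects flagged TRUE.
overall <- apply(taken, 1, function(row) paste(subjects[row], collapse = ", "))

# Students with no subject get "n" instead of an empty string.
overall[overall == ""] <- "n"

dfgoal <- cbind(df, overall = overall)
dfgoal
```

Selecting by name means the student column never enters the comparison, so the result does not depend on column order.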
If you also want to replace the empty strings with "n", you can do the following (this relies on overall being a factor):
levels(dfgoal$overall)[levels(dfgoal$overall) == ""] <- "n";
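The `levels()` replacement above only has an effect when `overall` is a factor. In R 4.0.0 and later, where strings are no longer converted to factors by default, `overall` is likely to be a plain character column instead. The sketch below shows both cases on a small stand-in vector; `overall_chr` and `overall_fct` are names made up for this example.

```r
# Stand-in for the overall column produced by the answer's one-liner.
overall <- c("maths, geography", "English", "", "", "physics, chemistry")

# Case 1: character vector, so replace the empty strings directly.
overall_chr <- overall
overall_chr[overall_chr == ""] <- "n"

# Case 2: factor, so rename the empty level as in the answer.
overall_fct <- factor(overall)
levels(overall_fct)[levels(overall_fct) == ""] <- "n"

overall_chr
as.character(overall_fct)
```

Checking `is.factor(dfgoal$overall)` first tells you which of the two forms applies to your data.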