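I have three large data frames, and I want to append some values from one to another based on several criteria. I looked at similar questions on Stack Overflow, but they don't seem to apply to my data frame format (or I'm not skilled enough to adapt them correctly).
What needs to happen:
maindf1 is the raw data, where each row is a person and the columns are survey responses or collected personal data
The lookup tables from the census website that I have to use come in an odd format, so the simplest fix for one of the problems was to first split the lookup table by sex.
I haven't had any luck writing working code, since I'm not yet experienced with coding in R. I tried some for and if loops, and I couldn't adapt fuzzy-join code to this task. I appreciate your help!
Example data: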
ZCTA <- c("12345", "NA", "NA", "44444", "99999", "11111")
sex <- c("female", "male", "male", "male", "female", "male")
agegrp <- c("pop_0to4", "pop_70to74", "pop_25to29", "pop_70to74", "pop_70to74", "pop_25to29")
maindf1 <- data.frame(ZCTA, sex, agegrp)

ZCTA <- c("12345", "23456", "12225", "44444", "99999", "11111")
pop_0to4 <- c("2000", "1300", "900", "737", "289", "120")
pop_70to74 <- c("25", "222", "52", "160", "100", "80")
pop_25to29 <- c("3000", "2500", "102", "1777", "3390", "2450")
maledflookup <- data.frame(ZCTA, pop_0to4, pop_25to29, pop_70to74)

ZCTA <- c("12345", "23456", "12225", "44444", "99999", "11111")
pop_0to4 <- c("1111", "2333", "999", "888", "222", "122")
pop_70to74 <- c("18", "333", "66", "300", "90", "99")
pop_25to29 <- c("3333", "2555", "111", "2777", "3311", "2121")
femaledflookup <- data.frame(ZCTA, pop_0to4, pop_25to29, pop_70to74)
The data and lookup tables look like this (2000 rows):
#maindf1
#ZCTA #sex #agegrp
12345 female pop_0to4
NA male pop_70to74
NA male pop_25to29
44444 male pop_70to74
99999 female pop_70to74
11111 male pop_25to29
#maledflookup
#ZCTA #pop_0to4 #pop_25to29 #pop_70to74
12345 2000 3000 25
23456 1300 2500 222
12225 900 102 52
44444 737 1777 160
99999 289 3390 100
11111 120 2450 80
#femaledflookup
#ZCTA #pop_0to4 #pop_25to29 #pop_70to74
12345 1111 3333 18
23456 2333 2555 333
12225 999 111 66
44444 888 2777 300
99999 222 3311 90
11111 122 2121 99
Desired result:
#maindf1
#ZCTA #sex #agegrp #censuspop
12345 female pop_0to4 1111
NA male pop_70to74 NA
NA male pop_25to29 NA
44444 male pop_70to74 160
99999 female pop_70to74 90
11111 male pop_25to29 2450
Use left_join from the tidyverse together with a properly formatted lookup table:
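library(tidyverse)

# Reshape each wide lookup table to long form (one row per ZCTA and age group)
# and tag it with the sex it describes
.maledflookup <- maledflookup %>%
  gather(-ZCTA, key = agegrp, value = censuspop) %>%
  mutate(sex = "male")
.femaledflookup <- femaledflookup %>%
  gather(-ZCTA, key = agegrp, value = censuspop) %>%
  mutate(sex = "female")

# Stack the two long tables into a single lookup
.lookup <- bind_rows(.maledflookup, .femaledflookup)

# Match each person to a population count; rows with no match get NA
left_join(maindf1, .lookup, by = c("sex", "ZCTA", "agegrp"))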
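First, reshape each lookup table with gather to obtain a data frame with the columns ZCTA, agegrp and censuspop, and add a new column for sex. Then stack the two lookup tables with bind_rows. Finally, use left_join to match on ZCTA, agegrp and sex.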