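I'm a bit confused here: either this isn't the right approach, or I'm missing something about left_join.
I want to join the "gdp" column by country and year, and have its value repeated across all three "Gender" categories, so that all three genders in the same year share the same gdp.
This is what I have so far: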
library(tidyverse)
table_1 <- tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",
"Central and Southern Asia", "Afghanistan", 2011, "female", 0.186,
"Central and Southern Asia","Afghanistan", 2011, "male", 0.454,
"Central and Southern Asia", "Afghanistan", 2011, "total", 0.274,
"Central and Southern Asia", "Afghanistan", 2018, "female", 0.221,
"Central and Southern Asia", "Afghanistan" , 2018, "male", 0.504,
"Central and Southern Asia", "Afghanistan", 2018, "total", 0.367)
table_2 <- tribble(~"Country", ~"gdp", ~"Year",
"Afghanistan", 551., 2010,
"Afghanistan", 599.,2011,
"Afghanistan", 649., 2012,
"Afghanistan", 648., 2013,
"Afghanistan", 625., 2014,
"Afghanistan", 590., 2015,
"Afghanistan", 550., 2016,
"Afghanistan", 550., 2017)
table_1 %>% left_join(table_2, by = "Country")
# A tibble: 48 x 7
Region Country Year.x Gender median_rate gdp Year.y
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 Central and Southern Asia Afghanistan 2011 female 0.186 551 2010
2 Central and Southern Asia Afghanistan 2011 female 0.186 599 2011
3 Central and Southern Asia Afghanistan 2011 female 0.186 649 2012
4 Central and Southern Asia Afghanistan 2011 female 0.186 648 2013
5 Central and Southern Asia Afghanistan 2011 female 0.186 625 2014
6 Central and Southern Asia Afghanistan 2011 female 0.186 590 2015
7 Central and Southern Asia Afghanistan 2011 female 0.186 550 2016
8 Central and Southern Asia Afghanistan 2011 female 0.186 550 2017
9 Central and Southern Asia Afghanistan 2011 male 0.454 551 2010
10 Central and Southern Asia Afghanistan 2011 male 0.454 599 2011
# ... with 38 more rows
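The desired output would look like this: the gdp column from table_2 is joined in, but only for the matching years (for example, table_1 only has data for 2011 and 2018, so only those years should be matched).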
tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",~"gdp",
"Central and Southern Asia", "Afghanistan", 2011, "female",0.186, 550,
"Central and Southern Asia","Afghanistan", 2011, "male",0.454,550,
"Central and Southern Asia", "Afghanistan", 2011, "total",0.274,550,
"Central and Southern Asia", "Afghanistan", 2018, "female", 0.221,590,
"Central and Southern Asia", "Afghanistan" , 2018, "male", 0.504, 590,
"Central and Southern Asia", "Afghanistan", 2018, "total", 0.367, 590)
Thanks for your help.
The by= argument of dplyr's join verbs can accept multiple columns:
table_1 <- tribble(~"Region",~"Country",~"Year", ~"Gender", ~"median_rate",
"Central and Southern Asia", "Afghanistan", 2011, "female", 0.186,
"Central and Southern Asia","Afghanistan", 2011, "male", 0.454,
"Central and Southern Asia", "Afghanistan", 2011, "total", 0.274,
"Central and Southern Asia", "Afghanistan", 2018, "female", 0.221,
"Central and Southern Asia", "Afghanistan" , 2018, "male", 0.504,
"Central and Southern Asia", "Afghanistan", 2018, "total", 0.367)
table_2 <- tribble(~"Country", ~"gdp", ~"Year",
"Afghanistan", 551., 2010,
"Afghanistan", 599.,2011,
"Afghanistan", 649., 2012,
"Afghanistan", 648., 2013,
"Afghanistan", 625., 2014,
"Afghanistan", 590., 2015,
"Afghanistan", 550., 2016,
"Afghanistan", 550., 2017)
table_1 %>% left_join(table_2, by = c("Country", "Year"))
# # A tibble: 6 x 6
# Region Country Year Gender median_rate gdp
# <chr> <chr> <dbl> <chr> <dbl> <dbl>
# 1 Central and Southern Asia Afghanistan 2011 female 0.186 599
# 2 Central and Southern Asia Afghanistan 2011 male 0.454 599
# 3 Central and Southern Asia Afghanistan 2011 total 0.274 599
# 4 Central and Southern Asia Afghanistan 2018 female 0.221 NA
# 5 Central and Southern Asia Afghanistan 2018 male 0.504 NA
# 6 Central and Southern Asia Afghanistan 2018 total 0.367 NA
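The NA values for 2018 appear because table_2 has no 2018 row for Afghanistan, so left_join keeps the table_1 rows and fills gdp with NA. Joining on "Country" alone, by contrast, pairs each of the 6 rows with all 8 years, which is where the 48 rows came from. If it helps, one way to list the key combinations that have no match is anti_join. The following is only a minimal sketch, using trimmed-down copies of the two tables (only the columns needed for the check), and the name `unmatched` is just an illustrative choice:

```r
library(dplyr)
library(tibble)

# Trimmed-down copies of the tables above: only the join keys plus one value column.
table_1 <- tribble(~Country, ~Year, ~Gender,
                   "Afghanistan", 2011, "female",
                   "Afghanistan", 2011, "male",
                   "Afghanistan", 2011, "total",
                   "Afghanistan", 2018, "female",
                   "Afghanistan", 2018, "male",
                   "Afghanistan", 2018, "total")

table_2 <- tribble(~Country, ~gdp, ~Year,
                   "Afghanistan", 551, 2010,
                   "Afghanistan", 599, 2011,
                   "Afghanistan", 649, 2012,
                   "Afghanistan", 648, 2013,
                   "Afghanistan", 625, 2014,
                   "Afghanistan", 590, 2015,
                   "Afghanistan", 550, 2016,
                   "Afghanistan", 550, 2017)

# Rows of table_1 whose (Country, Year) pair has no partner in table_2,
# collapsed to one row per missing key combination.
unmatched <- table_1 %>%
  anti_join(table_2, by = c("Country", "Year")) %>%
  distinct(Country, Year)

print(unmatched)
```

With the data shown here, this should report only Afghanistan / 2018, meaning a gdp value for that year would have to be added to table_2 before the join can fill it in.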