How to remove unmatched data from two data frames, to create a new data frame in R

Nick

I am creating a graph that plots life expectancy against the state pension age for each country. I used web scraping packages to scrape two datasets from two Wikipedia pages.

One dataset has a column named "Country" and the other has a column named "Country and regions". This is a problem because the two datasets need to be merged, but their rows do not line up: the "Country and regions" column contains regions as well as countries.

To solve this, I need to remove the regions from "Country and regions" before merging, so that both datasets cover the same countries. In other words, I need to find the values in "Country and regions" that have no match in "Country", remove those rows, and combine the two datasets into one data frame.

library(xml2)
library(rvest)
library(stringr)

urlLifeExpectancy <- "https://en.wikipedia.org/wiki/List_of_countries_by_life_expectancy"

extractedLifeData <- urlLifeExpectancy %>%
  read_html() %>%
  html_node(xpath = '//*[@id="mw-content-text"]/div/table[1]') %>%
  html_table(fill = TRUE)

urlPensionAge <- "https://en.wikipedia.org/wiki/Retirement_age#Retirement_age_by_country"

extractedPensionData <- urlPensionAge %>%
  read_html() %>%
  html_node(xpath = '//*[@id="mw-content-text"]/div/table[3]') %>%
  html_table(fill = TRUE)

Ronak Shah

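We can use merge, selecting only the columns we need from each dataset. By default, merge keeps only the rows whose key appears in both data frames, so the region rows that have no match in "Country" are dropped without a separate filtering step: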

merge(extractedLifeData[c(1, 5, 7)], extractedPensionData[1:3], 
       by.y = "Country", by.x = "Country and regions")

Or use inner_join from dplyr, which likewise keeps only the matching rows:

library(dplyr)

extractedLifeData %>% select(1, 5, 7) %>%
     inner_join(extractedPensionData %>% select(1:3), 
                by = c("Country and regions" = "Country"))
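
To see which rows the join discards without hitting Wikipedia, here is a minimal base R sketch on two toy data frames. The data frames life and pension, their column names LifeExpectancy and PensionAge, and all the numbers are placeholders invented for illustration, not the scraped values. It lists the unmatched names with setdiff and then does the same inner join with merge.

```r
# Toy stand-ins for the two scraped tables.
# All values are made-up placeholders, not real statistics.
life <- data.frame(
  `Country and regions` = c("Japan", "World", "France", "Europe"),
  LifeExpectancy = c(84, 73, 82, 79),
  check.names = FALSE  # keep the spaces in the column name
)
pension <- data.frame(
  Country = c("France", "Japan", "Brazil"),
  PensionAge = c(64, 65, 62)
)

# Names that appear in only one table; these rows are dropped by the join.
regions_only <- setdiff(life[["Country and regions"]], pension$Country)
pension_only <- setdiff(pension$Country, life[["Country and regions"]])

# Inner join: merge() keeps only keys found in both tables (all = FALSE).
combined <- merge(life, pension,
                  by.x = "Country and regions", by.y = "Country")

print(regions_only)
print(pension_only)
print(combined)
```

Checking regions_only and pension_only before merging is a cheap way to catch countries that are spelled differently on the two pages, since those would otherwise vanish from the result silently.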
