Match by argument pattern and left join

Chris T.

I am trying to match city/county names with their corresponding state names (fortunately, the state name is included in each entry) and then append the first three phone digits for each state as another column using left_join(). My initial thought was to copy the city/county name column, replace its values with the state names using sapply() along with grep(), and then use left_join() to merge it with the phone digit column, but my code doesn't seem to work.

library(dplyr)

location <- data.frame(location = c('Asortia, New York', 'Buffalo, New York', 'New York, New York',  'Alexandra, Virginia', 'Fairfax, Virginia', 'Baltimore, Maryland', 'Springfield, Maryland'), number = c(100, 200, 300, 400, 500, 600, 700))

state <- data.frame(state = c('New York', 'Virginia', 'Maryland'))

sapply(as.character(state$state), function(i) grep(i, location$location))

### doesn't work! ###
### my desired output would be ###

  location number
1 New York    100
2 New York    200
3 New York    300
4 Virginia    400
5 Virginia    500
6 Maryland    600
7 Maryland    700

With that output, I could then use left_join() to merge it with the three-digit phone prefixes. For example:

df <- location
names(df)[1] <- 'state'
digit <- data.frame(state = c('New York', 'Virginia', 'Maryland'), digit = c(212, 703, 410))
   
new_df <- left_join(df, digit, by = 'state')

### the desired output ###

  location number digit
1 New York    100   212
2 New York    200   212
3 New York    300   212
4 Virginia    400   703
5 Virginia    500   703
6 Maryland    600   410
7 Maryland    700   410

I have looked at this and this thread, but couldn't work out a solution from them. I hope someone can help me with this.

## Update

I found that using grepl in a for loop also works, but it may be slow if you have a large amount of data (the data I'm working on has two million observations).

for (i in state$state) { 
location$location[grepl(i, location$location)] <- i
}

akrun

Maybe we can use str_remove: paste (str_c) the values of the 'state' column in the 'state' dataset into a single regex lookahead, so that everything preceding a state name is matched and removed:

library(stringr)
library(dplyr)
location %>%
    mutate(location = str_remove(location, str_c(".*(?=(",
            str_c(state$state, collapse  = "|"), "))")))
#  location number
#1 New York    100
#2 New York    200
#3 New York    300
#4 Virginia    400
#5 Virginia    500
#6 Maryland    600
#7 Maryland    700
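
To see what this first option matches, it can help to print the assembled pattern. The following is a minimal sketch that reuses the state names from the question; the variable names `states` and `pattern` are only for illustration.

```r
library(stringr)

# State names as given in the question
states <- c("New York", "Virginia", "Maryland")

# Build the same regex as above: everything up to (but not including) a state name
pattern <- str_c(".*(?=(", str_c(states, collapse = "|"), "))")
pattern
# [1] ".*(?=(New York|Virginia|Maryland))"

# ".*" is greedy, so for "New York, New York" it consumes up to the last "New York"
str_remove(c("New York, New York", "Fairfax, Virginia"), pattern)
# [1] "New York" "Virginia"
```

Because the lookahead does not consume the state name, only the city/county prefix is removed.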

Another option is to separate the column into two columns and drop the first:

library(tidyr)
location %>%
   separate(location, into = c('unwanted', 'location'), sep=",\\s*") %>% 
   select(-unwanted)

Or, if every entry follows the same 'city, state' pattern, remove the prefix with str_remove by matching one or more characters that are not a comma from the start of the string (^[^,]+), followed by a comma and zero or more spaces (,\\s*):

location %>% 
    mutate(location = str_remove(location, '^[^,]+,\\s*'))
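
Any of these options can be followed by the left_join() from the question. As a rough sketch of the whole pipeline, assuming the data frames from the question and writing the state into a new column named `state` (that column name and the final `select()` are choices made here for illustration, not part of the original answer):

```r
library(dplyr)
library(stringr)

# Data from the question
location <- data.frame(
  location = c('Asortia, New York', 'Buffalo, New York', 'New York, New York',
               'Alexandra, Virginia', 'Fairfax, Virginia',
               'Baltimore, Maryland', 'Springfield, Maryland'),
  number = c(100, 200, 300, 400, 500, 600, 700),
  stringsAsFactors = FALSE
)
digit <- data.frame(
  state = c('New York', 'Virginia', 'Maryland'),
  digit = c(212, 703, 410),
  stringsAsFactors = FALSE
)

new_df <- location %>%
  # Drop everything up to and including the first comma and any following spaces
  mutate(state = str_remove(location, '^[^,]+,\\s*')) %>%
  # Attach the phone prefix for each state
  left_join(digit, by = 'state') %>%
  # Keep only the state, number and digit columns
  select(-location)
```

Since str_remove() is vectorised, this avoids the explicit loop over states, which may matter with a couple of million rows.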
