R: How do I split an email into parts when the emails have multiple domain endings?

bergdoktor

I'm trying to analyze a list of emails stored in a data frame (data$Email.Address). I want to start by splitting each email into parts, so that example1@gmail.com, example2@outlook.org, and example3@comcast.net end up like this:

   email                 firstpart secondpart thirdpart
1  example1@gmail.com    example1  gmail      com
2  example2@outlook.org  example2  outlook    org
3  example3@comcast.net  example3  comcast    net

With my current code, however, I can't match all of the strings, since some include domains like some-url.com or us.army.mil. This means that example4@us.army.mil shows up as:

    email                  firstpart secondpart thirdpart
4   example4@us.army.mil   example4  us         army

My goal is to read "some-url" or "us.army" as the second part, and "com" or "mil" as the third part, so that it shows up like this:

    email                  firstpart secondpart thirdpart
4   example4@us.army.mil   example4  us.army    mil

Here's the code I have:

library(tidyverse)
library(dplyr)
library(stringr)
library(rebus)

email_pattern <- capture(one_or_more(WRD)) %R%
  "@" %R% capture(one_or_more(x = WRD)) %R% 
  DOT %R% capture(one_or_more(WRD)) 

#Split the emails into parts based on the pattern
email_parts <- str_match(data$Email.Address, pattern = email_pattern)

How can I change the code so that all the domains can be read? Thank you!

sindri_baldur

Using stringi and data.table's tstrsplit():

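library(stringi)
library(data.table)

# Swap the last "." for "@" so every address has exactly two "@" separators,
# then split on "@" and transpose the pieces into three columns.
df[paste0("part", 1:3)] <-
  tstrsplit(stri_replace_last(df$email, fixed = ".", "@"), split = "@")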

                 email    part1   part2 part3
1   example1@gmail.com example1   gmail   com
2 example2@outlook.org example2 outlook   org
3 example3@comcast.net example3 comcast   net
4 example4@us.army.mil example4 us.army   mil

Reproducible data (please provide yourself next time):

df <- data.frame(
  email = c(
    "example1@gmail.com", "example2@outlook.org",
    "example3@comcast.net", "example4@us.army.mil"
  )
)
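
If you would rather not add stringi and data.table, the same split can probably be done with a single regular expression in base R. The sketch below is an alternative to the approach above, not part of it. It assumes each address contains exactly one "@" and at least one dot in the domain; the `emails` vector and the `pattern` and `parts` names are made up for illustration, with the sample addresses taken from the expected output in the question.

```r
# Illustrative sample data (assumed from the tables in the question)
emails <- c("example1@gmail.com", "example2@outlook.org",
            "example3@comcast.net", "example4@us.army.mil")

# ^([^@]+)     everything before the "@"
# @(.+)        greedy match: as much of the domain as possible ...
# \\.([^.]+)$  ... leaving only the text after the final dot
pattern <- "^([^@]+)@(.+)\\.([^.]+)$"

# regexec() returns the full match plus the three capture groups per address
m <- regmatches(emails, regexec(pattern, emails))

# Stack the per-address results into a matrix, then into a data frame.
# Note: addresses that do not match yield character(0) and would be
# silently dropped by rbind(), so check the row count on messy data.
parts <- do.call(rbind, m)
colnames(parts) <- c("email", "firstpart", "secondpart", "thirdpart")
parts <- as.data.frame(parts, stringsAsFactors = FALSE)

print(parts)
```

Because `(.+)` is greedy, intermediate dots and hyphens stay in the second part, which is what the WRD-based pattern in the question could not do. The same pattern string should also work with `stringr::str_match(emails, pattern)` if you prefer to stay in the tidyverse.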
