I have two data frames, data1 and data2, which contain the following information:
dput(data1)
structure(list(ProfName = c("Hua (Christine) Xin", "Dereck Barr-Pulliam",
"Lisa M. Blum", "Russell Williamson", "William D. Stout", "Michael F. Wade",
"Sheila A. Johnston", "Julie Huang", "Alan Attaway", "Alan Levitan",
"Benjamin P. Foster", "Carolyn M. Callahan"), Title = c(" PhD",
" PhD", " LLM", " PhD", " PhD", " CPA", " MS", " PhD", " PhD",
" PhD", " PhD", " PhD"), Profession = c("Assistant Professor",
"Assistant Professor", "Instructor", "Assistant Professor", "Associate Professor and Director",
"Instructor", "Instructor", "Associate Professor", "Professor",
"Professor", "Professor", "Brown-Forman Professor of Accountancy"
)), row.names = c(8L, 18L, 25L, 36L, 49L, 50L, 56L, 69L, 71L,
82L, 88L, 89L), class = "data.frame")
And the second data frame:
dput(data2)
structure(list(ProfName = c("Blandford, K ", "Okafor, A ",
"Johnston, S ", "Rolen, R ", "Attaway, A ", "Xin, H ",
"Huang, Y ", "Stout, W ", "Williamson, R ", "Callahan, C ",
"Foster, B ", "Blum, L ", "Levitan, A ", "Barr-Pulliam, D ",
"Wade, M ")), row.names = c(NA, -15L), class = "data.frame")
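I want to merge the two data frames, but the names are formatted differently. Only part of each string in the ProfName column matches between the two data frames. The data should be merged, and where a name has no matching information, those fields should be empty. If a name has no information in the Title and Profession columns, the ProfName and New columns should both hold the same name.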
I tried using merge, but it does not give the desired output:
merge(data1, data2, by="ProfName", all.x=TRUE, all.y = TRUE)
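One way to see why this fails: merge joins on exact string equality, and no ProfName in data1 is spelled the same way as in data2. The sketch below checks this on two small stand-in data frames. The names a and b are hypothetical, and their rows are copied from the dput output above.

```r
# Small stand-ins built from a few rows of the dput output above
a <- data.frame(ProfName = c("Alan Attaway", "Lisa M. Blum"),
                Title = c(" PhD", " LLM"),
                stringsAsFactors = FALSE)
b <- data.frame(ProfName = c("Attaway, A ", "Blum, L ", "Okafor, A "),
                stringsAsFactors = FALSE)

# No key in a is spelled exactly like a key in b
shared <- intersect(a$ProfName, b$ProfName)
length(shared)          # 0

# So a full outer merge only stacks the rows and pads with NA
out <- merge(a, b, by = "ProfName", all = TRUE)
nrow(out)               # 5
sum(is.na(out$Title))   # 3
```

The join needs a key that is spelled identically on both sides, for example the last name.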
Here is a simple solution: extract the last name from ProfName in both data frames and merge on it.
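library(stringr)
library(dplyr)
library(tidyr)
library(magrittr)  # provides the %<>% assignment pipe
# data1 stores "First Last": take the trailing run of letters/hyphens as the last name
data1 %<>% mutate(lname = str_extract(ProfName, "[A-Za-z\\-]+$"))
# data2 stores "Last, F": take the leading run of letters/hyphens as the last name
data2 %<>% mutate(lname = str_extract(ProfName, "^[A-Za-z\\-]+"))
# all.y = TRUE keeps every row of data2, matched or not
df <- merge(data1, data2, all.y = TRUE, by = "lname")
head(df)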
#   lname        ProfName.x          Title Profession                            ProfName.y
# 1 Attaway      Alan Attaway        PhD   Professor                             Attaway, A
# 2 Barr-Pulliam Dereck Barr-Pulliam PhD   Assistant Professor                   Barr-Pulliam, D
# 3 Blandford    <NA>                <NA>  <NA>                                  Blandford, K
# 4 Blum         Lisa M. Blum        LLM   Instructor                            Blum, L
# 5 Callahan     Carolyn M. Callahan PhD   Brown-Forman Professor of Accountancy Callahan, C
# 6 Foster       Benjamin P. Foster  PhD   Professor                             Foster, B
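To get from this merged result to the layout described in the question, something like the following might work. It is only a sketch under two assumptions that the post does not confirm: that New should hold the full name from data1 when a match exists and the data2 name otherwise, and that "empty" means an empty string rather than NA. The data frame here is a three-row stand-in for the merged df, and result is a hypothetical name.

```r
# Stand-in for a few rows of the merged data frame shown above
df <- data.frame(
  lname      = c("Attaway", "Blandford", "Blum"),
  ProfName.x = c("Alan Attaway", NA, "Lisa M. Blum"),
  Title      = c(" PhD", NA, " LLM"),
  Profession = c("Professor", NA, "Instructor"),
  ProfName.y = c("Attaway, A ", "Blandford, K ", "Blum, L "),
  stringsAsFactors = FALSE
)

# Trim stray leading/trailing whitespace in the source values
df$Title      <- trimws(df$Title)
df$ProfName.y <- trimws(df$ProfName.y)

# Assumption: New is the full name when matched, otherwise the data2 name
df$New <- ifelse(is.na(df$ProfName.x), df$ProfName.y, df$ProfName.x)

# Assumption: missing Title/Profession become empty strings
df$Title[is.na(df$Title)]           <- ""
df$Profession[is.na(df$Profession)] <- ""

result <- data.frame(ProfName = df$ProfName.y, Title = df$Title,
                     Profession = df$Profession, New = df$New,
                     stringsAsFactors = FALSE)
result
```

Note that matching on last name alone would produce extra rows if two people shared a surname; the sample data has no such case.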