I have the regex_func helper function below, which has been working well for extracting a match from a df column using map and a lambda.
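import re

def regex_func(regex_compile, x, item=0, return_list=False):
    """Function to handle the list returned by re.findall().

    Takes the value at index `item` (the first value by default).
    If the list is empty or has no such index, returns an empty string.
    If return_list is True, returns the whole list instead.
    """
    match_list = regex_compile.findall(x)
    if return_list:
        match = match_list
    elif match_list:
        try:
            match = match_list[item]
        except IndexError:
            # item is out of range for this list of matches
            match = ""
    else:
        match = ""
    return match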
# Working example (raw string so that \( reaches the regex engine unchanged)
regex_1 = re.compile(r'(?i)(?<=\()[^ ()]+')
df['colB'] = df['colA'].map(lambda x: regex_func(regex_1, x))
I am having trouble with a similar task: I want the regex for each row to be built from the value in another column and then applied to that row. One method I tried that did not work:
# Regex should be based on value in col1
# Extracting that value and prepping to input into my regex_func()
value_list = df['col1'].tolist()
value_list = ['(?i)(?<=' + d + ' )[^ ]+' for d in value_list]
value_list = [re.compile(d) for d in value_list]
# Adding prepped list back into df as col2
df.insert(1,'col2',value_list)
#Trying to create col4, based on applying my re.compile in col 2 to a value in col3.
df.insert(2,'col4', df['col3'].map(lambda x: df['col2'],x)
I understand why the above doesn't work, but have not been able to find a solution.
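You can zip the columns and then build the regex on the fly. Because regex_func calls .findall() on its first argument, compile the pattern before passing it in:
df['colB'] = [regex_func(re.compile('(?i)(?<=' + y + ' )[^ ]+'), x)
              for x, y in zip(df['colA'], df['col1'])]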
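As a rough, self-contained sketch of the same zip idea: the helper names build_pattern and extract_after_keyword are hypothetical (they are not part of the question or of pandas), and plain lists stand in for the two columns so the snippet runs without a DataFrame. It also assumes the values in col1 are literal text rather than regex fragments, so they are passed through re.escape, and it caches compiled patterns in case the same value repeats across rows.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def build_pattern(keyword):
    """Compile a pattern for the token that follows `keyword` and a space."""
    # Assumption: the keyword is literal text, so escape regex metacharacters.
    # The escaped literal is still fixed-width, which the lookbehind requires.
    return re.compile('(?i)(?<=' + re.escape(keyword) + ' )[^ ]+')


def extract_after_keyword(texts, keywords):
    """For each (text, keyword) pair, return the first token after the keyword, or ''."""
    results = []
    for text, keyword in zip(texts, keywords):
        found = build_pattern(keyword).search(text)
        results.append(found.group(0) if found else "")
    return results


# Hypothetical sample data standing in for the text column and the keyword column.
texts = ["order ID 12345 shipped", "Ref. A-77 pending", "nothing here"]
keywords = ["id", "Ref.", "code"]
extracted = extract_after_keyword(texts, keywords)
print(extracted)  # ['12345', 'A-77', '']
```

With a DataFrame, the equivalent call would presumably be df['colB'] = extract_after_keyword(df['colA'], df['col1']), since zip iterates over the values of each Series. If the values in col1 are meant to be regex fragments, drop the re.escape call, keeping in mind that Python lookbehinds must be fixed-width.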