I'm trying to extract in Tableau the first occurrence of the part-of-speech name (e.g. subst, adj, fin) located between { and : in every line of the column below:
{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}
{subst:sg:gen:m3=5, subst:sg:inst:m3=1, subst:sg:gen:f=1, subst:sg:nom:m3=1}
{subst:sg:nom:f=3, subst:sg:loc:f=2, subst:sg:inst:f=1, subst:sg:nom:m3=1}
{adj:sg:nom:m3:pos=2, adj:sg:acc:m3:pos=1, adj:sg:acc:n1.n2:pos=3, adj:pl:acc:m1.p1:pos=3, adj:sg:nom:f:pos=1}
{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}
{fin:sg:ter:imperf=5}
To do this I use the following regular expression: {(\w+):(?:.*?)}$. Unfortunately, my calculated field returns only Nulls.
I checked my regular expression in a regex tester and it is valid.
I don't know what I'm doing wrong, so if anybody has any suggestions I would be grateful.
Tableau's regex engine is ICU, and there are some differences between it and PCRE. One of them is that braces that should be matched as literal characters must be escaped.
Your regex also contains a redundant non-capturing group ((?:.*?) is the same as .*?) and a lazy quantifier that slows down matching: since you want to check for a } at the end of the string, it should be changed to a greedy .*.
You can use
REGEXP_EXTRACT([col], '^\{(\w+):.*\}$')
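If you want to sanity-check the corrected pattern outside Tableau, here is a minimal sketch in Python using three of the sample rows from the question. Note the assumption: Python's re module is not ICU, so this only confirms what the escaped pattern captures; it does not reproduce Tableau's behaviour with unescaped braces. The helper name first_pos is made up for illustration.

```python
import re

# Corrected pattern from the answer: literal braces escaped,
# greedy .* before the closing brace at the end of the string.
PATTERN = re.compile(r'^\{(\w+):.*\}$')

# Sample rows copied from the question.
rows = [
    "{subst:pl:nom:m3=18, subst:pl:voc:m3=1, subst:pl:acc:m3=5}",
    "{adj:sg:gen:f:pos=2, adj:sg:nom:n:pos=1}",
    "{fin:sg:ter:imperf=5}",
]

def first_pos(row):
    """Return the first part-of-speech tag, or None if the row does not match.

    Hypothetical helper; None plays the role of Tableau's Null here.
    """
    m = PATTERN.match(row)
    return m.group(1) if m else None

tags = [first_pos(r) for r in rows]
print(tags)  # ['subst', 'adj', 'fin']
```

Only the first tag is captured because the group sits directly after the opening brace; the greedy .* then consumes the rest of the row up to the final }.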