Why can't I find this string in RegEx?

Max FH
import re
import pdfplumber

lines = []
total_check = 0

# file holds the path to the PDF being read
with pdfplumber.open(file) as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        for line in text.split('\n'):
            print(line)

output data:

Totaalbedrag excl. btw € 25,00

When I try to retrieve VAT from data:

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(data).group(0)

output: AttributeError: 'NoneType' object has no attribute 'group'

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(r'excl. btw € 25,00').group(0)

output: 'excl. btw € 25,00'

How is it possible that the search finds € 25,00 when I paste the literal output into it, but not when I pass the data variable?

Please help me!

Wiktor Stribiżew

In most cases, when a pattern contains a literal space and fails to match, the cause is invisible characters or non-breaking spaces in the input.
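One way to check for this is to print an escaped form of the string, since non-breaking spaces look identical to regular ones on screen. The snippet below is only a sketch: the value of line is a made-up stand-in for what the PDF extraction might return, not the asker's actual data.

```python
# Hypothetical extracted line with a non-breaking space (\xa0) after "excl."
line = "Totaalbedrag excl.\xa0btw € 25,00"

print(line)            # looks like it contains only ordinary spaces

# ascii() escapes every non-ASCII character, so hidden ones become visible
escaped = ascii(line)
print(escaped)

# List the position and code point of each non-ASCII character
suspects = [(i, hex(ord(c))) for i, c in enumerate(line) if ord(c) > 127]
print(suspects)
```

Any \xa0 or \u200b showing up in the escaped form between the words would explain why a literal space in the pattern does not match.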

When you have non-breaking spaces, \xA0, you can simply replace the literal spaces with \s to match any whitespace, or [ \xA0] to match either of the spaces.
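A small comparison of the two alternatives, assuming the input contains a non-breaking space after "excl." (the data value here is hypothetical). For str patterns in Python 3, \s is Unicode-aware by default and so covers \xA0; that would not hold with the re.ASCII flag.

```python
import re

# Hypothetical input: non-breaking space between "excl." and "btw"
data = "Totaalbedrag excl.\xa0btw € 25,00"

literal = re.compile(r'excl\. btw .+')             # literal spaces only
any_ws = re.compile(r'excl\.\s+btw\s.+')           # any whitespace
either = re.compile(r'excl\.[ \xA0]btw[ \xA0].+')  # space or non-breaking space

print(literal.search(data))          # no match, the literal space fails on \xa0
print(any_ws.search(data).group(0))
print(either.search(data).group(0))
```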

In this case there appears to be a combination of spaces and some invisible characters, so you can use \W to match any non-word characters instead of a literal space:

r'excl\.\W+btw\W.+'
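As a rough illustration of why \W can succeed where \s does not, the sketch below assumes the extracted text holds a non-breaking space followed by a zero-width space (\u200b). That input is invented for the example; the real PDF may contain different characters.

```python
import re

# Hypothetical input: non-breaking space plus zero-width space before "btw"
data = "Totaalbedrag excl.\xa0\u200bbtw € 25,00"

# \s does not cover the zero-width space, so this attempt finds nothing
ws_match = re.search(r'excl\.\s+btw\s.+', data)

# \W matches any non-word character, visible or not
KVK_re = re.compile(r'excl\.\W+btw\W.+')
match = KVK_re.search(data)

print(ws_match)
if match:
    print(ascii(match.group(0)))
```

Checking the result of search() before calling group() also avoids the AttributeError when nothing matches.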
