Why can't I find this string in RegEx?

Max FH

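import re
import pdfplumber

lines = []
total_check = 0

# `file` is the path to the PDF being parsed
with pdfplumber.open(file) as pdf:
    for page in pdf.pages:
        # extract_text() returns the text of the page as one string
        text = page.extract_text()
        for line in text.split('\n'):
            print(line)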

output data:

Totaalbedrag excl. btw € 25,00

When I try to retrieve the amount excluding VAT from data:

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(data).group(0)

output: AttributeError: 'NoneType' object has no attribute 'group'

KVK_re = re.compile(r'(excl. btw .+)')
KVK_re.search(r'excl. btw € 25,00').group(0)

output: 'excl. btw € 25,00'

How is it possible that the search finds € 25,00 when I paste the literal output into it, but not when I pass in the data variable?

Please help me!

Wiktor Stribiżew

In most cases, when a pattern contains a literal space and fails to match text that looks identical, the cause is invisible characters or non-breaking spaces in the input.
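One way to check for this is to look at the raw code points instead of the printed text. The following is a minimal sketch, assuming a sample line with a non-breaking space in it; the sample string and the helper name `non_ascii_chars` are made up for illustration and are not taken from the actual PDF.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (index, escaped char, Unicode name) for every non-ASCII character."""
    return [
        (i, ascii(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Hypothetical sample: prints like the line in the question,
# but the gap after "btw" is U+00A0 rather than a regular space.
line = "Totaalbedrag excl. btw\xa0€ 25,00"

# ascii() escapes every non-ASCII character, so hidden ones become visible.
print(ascii(line))
for item in non_ascii_chars(line):
    print(item)
```

Running the same helper on each line returned by extract_text() should show whether the real data contains anything other than plain spaces.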

When the text contains non-breaking spaces (\xA0), you can replace the literal spaces in the pattern with \s to match any whitespace, or with [ \xA0] to match either a regular or a non-breaking space.
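A short sketch of the difference, assuming the gap between "btw" and "€" is a non-breaking space (the sample string is an assumption, not the real PDF output):

```python
import re

# Hypothetical sample: the character between "btw" and "€" is U+00A0.
data = "Totaalbedrag excl. btw\xa0€ 25,00"

# Pattern with literal spaces only.
literal_space = re.compile(r"excl\. btw .+")
# \s matches Unicode whitespace for str patterns in Python 3, including U+00A0.
any_whitespace = re.compile(r"excl\.\s+btw\s+.+")
# Explicit character class: regular space or non-breaking space.
space_or_nbsp = re.compile(r"excl\.[ \xA0]+btw[ \xA0]+.+")

for name, pattern in [
    ("literal space", literal_space),
    ("\\s", any_whitespace),
    ("[ \\xA0]", space_or_nbsp),
]:
    match = pattern.search(data)
    # search() returns None when nothing matches, so test before calling group().
    print(name, "->", ascii(match.group(0)) if match else None)
```

Testing the result of search() before calling group() also avoids the AttributeError shown in the question.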

In this case there appears to be a mix of regular spaces and some invisible characters, so you can use \W to match any non-word character instead of a literal space:

r'excl\.\W+btw\W.+'
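To illustrate why \W can succeed where \s does not, here is a sketch that assumes the text contains a zero-width space (U+200B) in addition to a non-breaking space. Which characters the real PDF contains is not known, so the sample string is purely hypothetical.

```python
import re

# Hypothetical sample: a zero-width space (U+200B) follows "excl." and a
# non-breaking space (U+00A0) follows "btw". Neither is visible when printed.
data = "Totaalbedrag excl.\u200b btw\xa0€ 25,00"

whitespace_only = re.compile(r"excl\.\s+btw\s.+")
non_word = re.compile(r"excl\.\W+btw\W.+")

# U+200B is not classified as whitespace, so the \s version finds nothing.
print(whitespace_only.search(data))

# \W matches anything that is not a letter, digit, or underscore.
match = non_word.search(data)
if match:
    print(ascii(match.group(0)))
```

Note that \W is deliberately loose: it also matches punctuation, so it accepts more inputs than a whitespace-only pattern would.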

Edited at 2020-12-19
