I'm trying to extract the street number from addresses such as this:
1520 32nd Street
3215 Sheldon Rd
But stripping every non-digit character also keeps the 32 from the first address:
python> re.sub(r'\D', '', street)
152032
3215
I'm pretty sure I need a negative lookaround, but I can't get it right.
Your task would be easier if you first normalized the addresses, that is, converted them into a standard format with well-defined fields. There are various tools for this; the usaddress module seems to work well for US addresses.
>>> import usaddress
>>> addr = usaddress.tag('1520 32nd St')
>>> addr[0]['AddressNumber']
'1520'
And for your second address:
>>> addr = usaddress.tag('3215 Sheldon Rd')
>>> addr[0]['AddressNumber']
'3215'
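If adding a dependency is not an option, a plain regex may be enough for addresses shaped like the two in the question. This is only a minimal sketch, assuming the street number is a leading run of digits followed by a non-word character or the end of the string; the helper name street_number is hypothetical. No lookaround is needed, because re.match only matches at the start of the string.

```python
import re

def street_number(street):
    """Return the leading house number in `street`, or None if there is none."""
    # re.match anchors at the start, so the "32" in "32nd" is never reached.
    # The trailing \b rejects a leading ordinal such as "32nd Street".
    match = re.match(r'\s*(\d+)\b', street)
    return match.group(1) if match else None

print(street_number('1520 32nd Street'))  # 1520
print(street_number('3215 Sheldon Rd'))   # 3215
```

Under these assumptions an address with no house number, such as '32nd Street', returns None, and so does a number with a letter suffix such as '1520A'. Inputs like those are where a parser such as usaddress is likely the safer choice.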