The text file (file.txt) looks like this:
First line.
2. Second line
03 Third line
04. Fourth line
5. Line.
6 Line
The desired output has 1) the numbers at the beginning of each line removed and 2) the punctuation that follows those numbers removed:
First line.
Second line
Third line
Fourth line
Line.
Line
I tried:
import re
file=open("file.txt").read().split()
print([i for i in file if re.sub("[0-9]\.*", "", i)])
But I get results only on word level instead of line level:
['First', 'line.', 'Second', 'line', 'Third', 'line', 'Fourth', 'line', 'Line.', 'Line']
You may fix your current code using
import re

with open("file.txt") as f:
    for line in f:
        print(re.sub(r"^[0-9]+\.?\s*", "", line.rstrip("\n")))
You need to open the file and read it line by line. The ^[0-9]+\.?\s* pattern then matches, at the start of each line, one or more digits ([0-9]+) followed by an optional dot (\.?) and zero or more whitespace characters (\s*), and re.sub removes the match if one is found. Lines that do not start with a digit are left unchanged.
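To check the pattern without touching a file, here is a minimal sketch that applies the same substitution to an in-memory list standing in for the lines of file.txt. The names LEADING_NUMBER, strip_leading_number and sample are hypothetical and only serve the illustration; the regular expression is the one from the answer above.

```python
import re

# Same pattern as in the answer: digits, an optional dot, then whitespace,
# anchored at the start of the string. The constant name is illustrative.
LEADING_NUMBER = re.compile(r"^[0-9]+\.?\s*")


def strip_leading_number(line):
    """Drop the trailing newline, then remove a leading number prefix if present."""
    return LEADING_NUMBER.sub("", line.rstrip("\n"))


# Stand-in for the lines of file.txt as shown in the question.
sample = [
    "First line.\n",
    "2. Second line\n",
    "03 Third line\n",
    "04. Fourth line\n",
    "5. Line.\n",
    "6 Line\n",
]

cleaned = [strip_leading_number(line) for line in sample]
for line in cleaned:
    print(line)
```

Because the pattern is anchored with ^ and each line is processed separately, digits and dots elsewhere in a line (such as the final period in "Line.") are not affected.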