I am trying to count the occurrences of each word in a file, such that the output is something like
the: 102
me: 100
etc
Here is the code I have so far.
from pathlib import Path
from collections import Counter
import string

filepath = Path('input.txt')
with open(filepath) as f:
    content = f.readlines()

# Strip punctuation from each line, then split it into words
word_list = sum((
    s.strip('\n').translate(str.maketrans('', '', string.punctuation)).split(' ')
    for s in content
), [])

for key, value in Counter(word_list).items():
    print(f'{key}: {value}')
However, this takes an extremely long time when the input file is large. How do I make this workable for large files?
The main culprit is sum(..., []): concatenating lists with sum builds a brand-new list for every addition, so it is quadratic in the total number of words. Changing f.readlines() to iterating directly over f, and replacing sum with list.extend in a loop, fixes this.
from pathlib import Path
from collections import Counter
import string

filepath = Path('input.txt')
# Build the punctuation-stripping table once, not once per line
table = str.maketrans('', '', string.punctuation)

with open(filepath) as f:
    word_list = []
    for s in f:
        word_list.extend(s.strip('\n').translate(table).split(' '))

for key, value in Counter(word_list).items():
    print(f'{key}: {value}')
This version works almost instantly on my test file.
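If memory is also a concern, you can skip building word_list entirely and update a Counter as you stream the file. A minimal sketch; count_words is a helper name introduced here for illustration, and note that split() with no arguments discards the trailing newline and collapses runs of whitespace, which split(' ') does not:

```python
from collections import Counter
import string

def count_words(path):
    # Illustrative helper: stream the file and update the Counter
    # line by line, so the full word list never lives in memory.
    table = str.maketrans('', '', string.punctuation)
    counts = Counter()
    with open(path) as f:
        for line in f:
            # split() with no arguments handles the newline and
            # repeated spaces in one step
            counts.update(line.translate(table).split())
    return counts

# Tiny demo with a throwaway file (path is just for illustration)
import tempfile, os
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write('the cat, the hat!\nthe end\n')
for word, n in count_words(path).most_common():
    print(f'{word}: {n}')
os.remove(path)
```

most_common() also sorts the output by frequency, which is usually what you want for word counts.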