Using the code below in a Jupyter notebook, I can only produce a count of each character found, but I am looking to count the number of times each word occurs. Thank you!
from bs4 import BeautifulSoup as Soup, Tag
import re
import requests
from collections import Counter
url = "http://en.wikipedia.org/wiki/October_27"
DayBorn = []  # create a list to save the soup contents
response = requests.get(url)
soup = Soup(response.content, "html.parser")  # name the parser explicitly to avoid bs4's warning
births_span = soup.find("span", {"id": "Births"})  # find the first span whose id is "Births"
births_ul = births_span.parent.find_next_sibling()  # the parent's next sibling is the ul (unordered list)
for item in births_ul.find_all('li'):  # find all the li elements within births_ul
    if isinstance(item, Tag):
        # print(item.text)  # print the text of each li
        DayBorn.append(item.text)
This next section gives me a list of each word in the order it occurs.
text_iterated = str(DayBorn)
[x for x,y in re.findall(r'((\w+[^,.()]))', text_iterated)]
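The list built above is displayed but never saved, so the counting below ends up working on the string rather than on a list of words. A minimal sketch of counting the word list instead, assuming a hypothetical two-entry DayBorn in place of the scraped data. It also swaps the original pattern for the simpler \w+, which matches runs of word characters without the extra trailing character that \w+[^,.()] can pick up:

```python
import re
from collections import Counter

# Hypothetical sample standing in for the scraped "Births" entries
DayBorn = ["1900 – Jane Doe, English writer", "1910 – John Roe, English actor"]

text_iterated = str(DayBorn)
# \w+ is a swapped-in pattern: each match is one run of letters, digits or underscores
words = re.findall(r"\w+", text_iterated)
word_counts = Counter(words)  # counts list elements (words), not characters
print(word_counts.most_common(1))  # [('English', 2)]
```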
I have tried both of these methods so far:
Counter(str(text_iterated))
and
occurrences = Counter()
for word in str(DayBorn):
    occurrences[word] += 1
occurrences
Both give the same result: a count of each individual character (digit, letter, punctuation mark or space), e.g.
Counter({'[': 4,
"'": 449,
'8': 104,
'9': 277,
'2': 109,
' ': 2237,
'–': 225,
'E': 50,
You explicitly told your program to iterate through the characters of the string form of the list you created:
for word in str(DayBorn):
You converted the list to its string representation and then iterated through the characters of that string. Instead, iterate over the list itself:
for word in DayBorn:
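To see the difference, here is a small sketch with a hypothetical two-item list: looping over str(DayBorn) visits single characters (including the brackets and quotes of the list's printed form), while looping over DayBorn visits the list's elements:

```python
from collections import Counter

# Hypothetical stand-in for the scraped list
DayBorn = ["1900 – Jane Doe, English writer", "1910 – John Roe, English actor"]

by_char = Counter()
for ch in str(DayBorn):  # str() turns the whole list into one long string
    by_char[ch] += 1

by_item = Counter()
for entry in DayBorn:  # iterates over the list's elements
    by_item[entry] += 1

print(by_char["'"])  # 4: the quote marks from the list's printed form
print(len(by_item))  # 2: one key per distinct element
```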
Better yet, simply use the counting facility Python provides, collections.Counter:
from collections import Counter
...
occurrences = Counter(DayBorn)
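Counter(iterable) performs the same tally as the explicit loop, in one call. A short sketch, assuming a hypothetical DayBorn that already holds individual words:

```python
from collections import Counter

# Hypothetical list of words; Counter tallies each element it is given
DayBorn = ["English", "writer", "English", "actor"]

manual = Counter()
for word in DayBorn:
    manual[word] += 1

occurrences = Counter(DayBorn)  # same tally as the loop above
assert occurrences == manual
print(occurrences.most_common(1))  # [('English', 2)]
```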
EDIT per USER COMMENT
For Counter to count words, DayBorn needs to be a list of words; right now each element is a whole line. Again, we need your MVE (minimal, verifiable example). Perhaps this will help as you ingest lines: instead of adding the entire line to your list
DayBorn.append(item.text)
... add the words individually
DayBorn.extend(item.text.split())
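A sketch of the whole idea, with a hypothetical list of lines standing in for item.text from each li (no Wikipedia request). Note that str.split() splits only on whitespace, so punctuation stays attached ("Doe," and "Doe" would be counted separately) and the en dash becomes a "word" of its own. The stripping and lowercasing step is an optional extra, not part of the answer above:

```python
import string
from collections import Counter

# Hypothetical stand-ins for item.text from each <li> in the Births list
lines = ["1900 – Jane Doe, English writer", "1910 – John Roe, English actor"]

DayBorn = []
for text in lines:
    DayBorn.extend(text.split())  # add each whitespace-separated word individually

raw_counts = Counter(DayBorn)

# Optional normalization (an assumption, not in the original answer):
# string.punctuation is ASCII-only, so the en dash is added by hand
strip_chars = string.punctuation + "–"
cleaned = [w.strip(strip_chars).lower() for w in DayBorn]
clean_counts = Counter(w for w in cleaned if w)  # drop tokens that were pure punctuation

print(raw_counts["English"])
print(clean_counts.most_common(3))
```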