For example, if I give the string 'aaaaaabbbbcccc' and n = 2, it should return [('a',6),('b',4)]. I have tried it in this way:
def top_chars(word, n):
    list1=list(word)
    list3=[]
    list2=[]
    list4=[]
    set1=set(list1)
    list2=list(set1)
    def count(item):
        count=1
        for x in word:
            if x in word:
                count+=item.count(x)
                list3.append(count)
                return count
    list2.sort(key=count)
    list3.sort()
    list4=list(zip(list2,list3))
    list4.reverse()
    list4.sort(key=lambda list4: ((-list4[1]),(list4[0])), reverse=True)
    return list4[0:n]
But for top_chars("app", 1) it was returning the output [('p', 1)], in which the count is wrong.
As you want to do this without using collections.Counter, let’s go through your code a bit first:
list1=list(word)
set1=set(list1)
list2=list(set1)
Apart from the very non-descriptive names (which you should avoid), you can essentially compact that to characters = list(set(word)) to get a list of all characters in the word without duplicates. As word is a string, calling set() on it will automatically iterate the string character by character.
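As a quick illustration of that shortcut, here is a minimal sketch. The variable names are only for illustration, and since a set has no defined order, the result is sorted just to make it predictable.

```python
word = "app"

# set() iterates the string character by character and drops duplicates
characters = list(set(word))

# a set has no guaranteed order, so sort before comparing or printing
unique_sorted = sorted(characters)
print(unique_sorted)  # ['a', 'p']
```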
Next, in your count function you check if x in word although you are explicitly iterating over the x in word. So whatever x is, it will always be in word, and you can leave that check out. Then, when you actually increase the count, you increment it by the number of occurrences of x (which runs over all characters of the original word) inside the passed item. As you use the function as a sort key for the list of unique characters of the word, item is always a single-character string, so you are counting how often each character of the word occurs in that one-character string. With both item and x being a single character, you actually increment count by one if x == item, albeit in a very roundabout way. And next, still within the loop over the original characters, you append that count to list3 and return the count.
So what has happened so far? You looked at the very first character of word and checked whether item equals that character. If that’s the case, you increment count from 1 to 2; otherwise you leave it at 1. And then you just keep that count and return from the function. As such, you never actually look at the other characters of the word, which explains why your count is always 1 or 2.
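To see the early return in isolation, the sketch below copies the inner function out of top_chars as a standalone helper. The name buggy_count and the explicit word parameter are only for illustration, and the list3.append call is left out.

```python
def buggy_count(item, word):
    count = 1
    for x in word:
        if x in word:  # always true, because x comes from word
            count += item.count(x)
            return count  # leaves the loop after the first character


# only the first character of "app" ('a') is ever compared
print(buggy_count("a", "app"))  # 2
print(buggy_count("p", "app"))  # 1
```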
If you kept the loop running and only appended and returned the count afterwards, it would work correctly (i.e. remove two indentation levels from those two lines). However, as you start with count = 1, you’re always one too high, so you should start at 0.
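Applied to the inner function alone, that fix could look like the following sketch. It is written as a standalone function with word passed in explicitly, which is an assumption made only so the snippet runs on its own; the redundant check and the list3 bookkeeping are dropped as well.

```python
def count(item, word):
    total = 0  # start at 0 instead of 1
    for x in word:
        # item is a single character, so this adds 1 exactly when x == item
        total += item.count(x)
    # return only after the whole word has been scanned
    return total


print(count("p", "app"))  # 2
print(count("a", "app"))  # 1
```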
Moving on, you now sort both list2 and list3 by the same values so that they line up. By construction of the lists this luckily works, but it is still somewhat weird. You then zip the sorted lists up and reverse the order. And then you do something which I don’t really understand: you take the list, which is already reverse-sorted by the character count, and sort it reversed by the negative count. Reverse-sorting by the negative count is essentially normal-sorting by the positive count, so you sort it ascending. This gives you the inverse of what you want, and it isn’t actually necessary either, as the list is already sorted after being zipped and reversed.
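The effect of that last sort is easy to check on a small list. In this sketch the pairs list is made-up sample data standing in for the zipped and reversed list4:

```python
# sample data: already sorted by count, highest first
pairs = [("a", 6), ("b", 4), ("c", 2)]

# negative count combined with reverse=True: the two reversals cancel out
undone = sorted(pairs, key=lambda p: (-p[1], p[0]), reverse=True)
print(undone)  # [('c', 2), ('b', 4), ('a', 6)]

# either one of them alone keeps the highest count first
by_negative = sorted(pairs, key=lambda p: (-p[1], p[0]))
by_reverse = sorted(pairs, key=lambda p: p[1], reverse=True)
print(by_negative)  # [('a', 6), ('b', 4), ('c', 2)]
print(by_reverse)   # [('a', 6), ('b', 4), ('c', 2)]
```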
Anyway, there is quite a lot to improve in your code. First of all, even if you are not using a Counter, you should still use a dictionary to store the counts. I assume that you may not use a defaultdict either, so we’ll just build the functionality on our own. Instead of creating a list of all the characters in the word and then counting how often each character occurs in the word, we will just loop once through the word and keep note of every character we see:
counts = {}
for character in word:
    # if we haven't seen the character yet, we need to
    # initialize it in our dictionary first (this is
    # essentially what defaultdict does)
    if character not in counts:
        counts[character] = 0
    # we have seen `character` once more, so increment count
    counts[character] += 1
This is already all we need to do to actually count the characters. For the example word 'aaaaaabbbbcccc', we would now get this as our dictionary (the key order may vary by Python version): {'b': 4, 'c': 4, 'a': 6}.
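The same loop can be written a little more compactly with dict.get, which returns a default value when the key is missing. This is only an alternative sketch, wrapped in a hypothetical helper count_characters so that it can be run on its own:

```python
def count_characters(word):
    counts = {}
    for character in word:
        # get() falls back to 0 for characters we have not seen yet
        counts[character] = counts.get(character, 0) + 1
    return counts


print(count_characters("aaaaaabbbbcccc") == {"a": 6, "b": 4, "c": 4})  # True
```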
So now we just need to find the n biggest elements from there. Your zip idea was actually quite good for this, so let’s create a list of tuples from the dict which we can then sort:
countsList = list(counts.items())
# sort descending by count; ties are broken by character, also descending
countsList.sort(key=lambda x: (x[1], x[0]), reverse=True)
And now we already have our final list ready, from which we can take the first n elements to get our result, [('a', 6), ('c', 4)].
In total, this is what our function looks like:
def top_words(word, n):
    counts = {}
    for character in word:
        if character not in counts:
            counts[character] = 0
        counts[character] += 1
    countsList = list(counts.items())
    countsList.sort(key=lambda x: (x[1], x[0]), reverse=True)
    return countsList[:n]
Used like this, and compared to collections.Counter:
>>> top_words('aaaaaabbbbcccc', 2)
[('a', 6), ('c', 4)]
>>> top_words('app', 1)
[('p', 2)]
>>> from collections import Counter
>>> Counter('aaaaaabbbbcccc').most_common(2)
[('a', 6), ('b', 4)]
>>> Counter('app').most_common(1)
[('p', 2)]
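Note that top_words returns ('c', 4) where the question expected ('b', 4), because reverse=True also reverses the alphabetical order among equal counts. If ties should go to the alphabetically smaller character, one option is to negate the count and sort ascending. The following is a sketch of that variant; the name top_chars_alpha is made up for this example:

```python
def top_chars_alpha(word, n):
    counts = {}
    for character in word:
        if character not in counts:
            counts[character] = 0
        counts[character] += 1
    # highest count first; equal counts are ordered alphabetically
    return sorted(counts.items(), key=lambda x: (-x[1], x[0]))[:n]


print(top_chars_alpha("aaaaaabbbbcccc", 2))  # [('a', 6), ('b', 4)]
print(top_chars_alpha("app", 1))             # [('p', 2)]
```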