Get most occurring characters and their counts without using counters

RAMYA Published at Dev

Ramya

For example, if I give a string 'aaaaaabbbbcccc' and n = 2, it should return [('a',6),('b',4)]. I have tried it in this way:

def top_chars(word, n):
    list1=list(word)
    list3=[]
    list2=[]
    list4=[]
    set1=set(list1)
    list2=list(set1)
    def count(item):
        count=1
        for x in word:
            if x in word:
               count+=item.count(x)
               list3.append(count)
               return count
    list2.sort(key=count)
    list3.sort()
    list4=list(zip(list2,list3))
    list4.reverse()
    list4.sort(key=lambda list4: ((-list4[1]),(list4[0])), reverse=True)
    return list4[0:n]

But for top_chars("app", 1) it returned [('p', 1)], in which the count is wrong.

poke

As you want to do this without using collections.Counter, let’s go through your code for a bit first:

list1=list(word)
set1=set(list1)
list2=list(set1)

Apart from the very non-descriptive names (which you should avoid), you can essentially compact that to characters = list(set(word)) to get a list of all characters in the word without duplicates. As word is a string, calling set() on it will automatically iterate the string character by character.
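A quick check of that compaction on the question's example string. The name characters is only illustrative. Because sets have no defined order, the result is sorted before printing so the output is predictable.

```python
word = 'aaaaaabbbbcccc'

# set() iterates the string character by character and keeps each
# distinct character once; list() turns the set back into a list
characters = list(set(word))

# sets are unordered, so sort before displaying
print(sorted(characters))  # ['a', 'b', 'c']
```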

Next, in your count function you check if x in word, although you are explicitly iterating over each x in word. So whatever x is, it will always be in word, and you can leave that check out. Then, when you actually increase the count, you increment it by the number of occurrences of x (one of the characters of the original word) inside the passed item. As you use the function as a sort key for the list of unique characters of the word, you are essentially counting how often each character of the word occurs in the single-character string that represents the current unique character. With both item and x being single characters, item.count(x) is 1 if x == item and 0 otherwise, so you increment the count by one exactly when x == item, albeit in a very roundabout way. Next, still within the loop over the original characters, you append that count to list3 and return it.
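A small demonstration of that equivalence, using the question's second example: for two single-character strings, item.count(x) always gives the same value as int(x == item).

```python
word = 'app'

for item in sorted(set(word)):
    for x in word:
        # str.count on two single-character strings returns 1 when they
        # are equal and 0 otherwise, i.e. the same value as int(x == item)
        print(item, x, item.count(x), int(x == item))
```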

So what happened so far? You looked at the very first character of word and checked if item equals the character. If that’s the case, you increment count from 1 to 2, otherwise you leave it at 1. And then you just keep that count and return from the function. As such, you never actually look at the other characters of the word, which explains why your count is always 1 or 2.
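To see the early return in isolation, here is a hypothetical standalone copy of the question's inner count function. It is named count_first_only here, takes word as a parameter, and leaves out the list3 bookkeeping; otherwise the logic is unchanged.

```python
def count_first_only(item, word):
    # hypothetical standalone copy of the question's count()
    count = 1
    for x in word:
        if x in word:  # always true, since x comes from word
            count += item.count(x)
            return count  # returns during the very first iteration

print(count_first_only('a', 'app'))  # 2: the first character 'a' matches
print(count_first_only('p', 'app'))  # 1: the first character 'a' does not match
```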

If you keep the loop running and only append the count and return it after the loop has finished (i.e. remove two indentation levels from those two lines), it works correctly. However, as you start with count = 1, your counts are always one too high, so you should start at 0.
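A sketch of the question's approach with just those two fixes applied (count starts at 0, and appending and returning happen after the loop). The variable names are renamed for readability, and the question's final re-sort is left out because, as explained next, it would invert the order. Characters with equal counts come out in an order that depends on set iteration order.

```python
def top_chars(word, n):
    characters = list(set(word))
    counts = []  # plays the role of list3 in the question

    def count(item):
        total = 0  # start at 0, not 1
        for x in word:
            total += item.count(x)  # adds 1 exactly when x == item
        # append and return only after every character has been seen
        counts.append(total)
        return total

    # list.sort calls the key once per element, so counts gets one entry each
    characters.sort(key=count)
    counts.sort()
    pairs = list(zip(characters, counts))
    pairs.reverse()  # largest count first
    return pairs[:n]

print(top_chars('app', 1))  # [('p', 2)]
```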

Moving on, you now sort both list2 and list3 by the same values so that they line up. By construction of the lists this happens to work, but it is still somewhat odd. You then zip the sorted lists together and reverse the order. And then you do something which I don’t really understand: you take the list, which is already reverse-sorted by the character count, and sort it in reverse by the negative count. Reverse-sorting by the negative count is essentially normal-sorting by the positive count, so you end up sorting it in ascending order. This gives you the inverse of the result you want, and it isn’t necessary either, as the list is already sorted in descending order after being zipped and reversed.
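The effect of that last sort, run on an already correctly ordered list of pairs. The sample list is assumed for illustration; it is what the zip-and-reverse step could produce for the first example.

```python
# already sorted descending by count
pairs = [('a', 6), ('c', 4), ('b', 4)]

# the question's final sort: key (-count, character) with reverse=True
pairs.sort(key=lambda pair: (-pair[1], pair[0]), reverse=True)

print(pairs)      # [('c', 4), ('b', 4), ('a', 6)]: ascending by count
print(pairs[:2])  # [('c', 4), ('b', 4)]: the smallest counts, not the largest
```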

Anyway, there is quite a lot to improve with your code. First of all, even if you are not using a counter, you should still use a dictionary to store the counts. I assume that you may not use a defaultdict either, so we’ll just build the functionality on our own. Instead of creating a list of all the characters in the word and then counting how often that character occurs in the word, we will just loop once through the word and keep note of every character we see:

counts = {}
for character in word:
    # if we haven’t seen the character yet, we need to
    # initialize it in our dictionary first (this is
    # essentially what defaultdict does)
    if character not in counts:
        counts[character] = 0

    # we have seen `character` once more, so increment count
    counts[character] += 1

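This is already all we need to do to count the characters. For the example word, our dictionary now looks like this: {'a': 6, 'b': 4, 'c': 4} (on Python versions before 3.7 the keys may be printed in a different order, since dictionaries only preserve insertion order from 3.7 on).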
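For comparison only, in case defaultdict were allowed after all: defaultdict(int) calls int() to produce 0 for any missing key, which is exactly the initialization step the loop above does by hand.

```python
from collections import defaultdict

word = 'aaaaaabbbbcccc'

# a missing key is created with int() == 0 on first access,
# replacing the manual "if character not in counts" check
counts = defaultdict(int)
for character in word:
    counts[character] += 1

print(dict(counts))  # {'a': 6, 'b': 4, 'c': 4}
```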

So now, we just need to find the n biggest elements from there. Your zip idea was actually quite good for this; so let’s create a list of tuples from the dict which we can then sort:

countsList = list(counts.items())

# sort by count (second element), then by character, both descending
countsList.sort(key=lambda x: (x[1], x[0]), reverse=True)

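And now our final list is ready; taking its first n elements gives the result [('a', 6), ('c', 4)]. Note that ('c', 4) comes before ('b', 4): because of reverse=True, ties in the count are broken by the character in reverse alphabetical order.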
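If you would rather break ties alphabetically, as in the question's expected output [('a', 6), ('b', 4)], one option is to negate only the count and drop reverse=True. The counts dictionary is hard-coded here to keep the snippet self-contained.

```python
counts = {'a': 6, 'b': 4, 'c': 4}

countsList = list(counts.items())
# negating the count puts larger counts first, while equal counts
# are ordered by the character in normal (alphabetical) order
countsList.sort(key=lambda x: (-x[1], x[0]))

print(countsList[:2])  # [('a', 6), ('b', 4)]
```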

In total, this is what our function looks like:

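def top_words(word, n):
    # count every character in a single pass over the word
    counts = {}
    for character in word:
        if character not in counts:
            counts[character] = 0
        counts[character] += 1

    # sort (character, count) pairs by count, then character, both descending
    countsList = list(counts.items())
    countsList.sort(key=lambda x: (x[1], x[0]), reverse=True)
    return countsList[:n]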
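A variation, swapping in the standard library's heapq.nlargest for the full sort (a different technique from the answer's, shown only as an alternative). heapq.nlargest(n, iterable, key=...) is documented as equivalent to sorted(iterable, key=key, reverse=True)[:n], and since that sort is stable, characters with equal counts keep their first-seen order, which is also how Counter.most_common orders ties on Python 3.7+. The name top_chars is taken from the question.

```python
import heapq

def top_chars(word, n):
    # count each character with a plain dict, as in the answer
    counts = {}
    for character in word:
        if character not in counts:
            counts[character] = 0
        counts[character] += 1

    # pick the n pairs with the largest count; equal counts stay
    # in dictionary (first-seen) order
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])

print(top_chars('aaaaaabbbbcccc', 2))  # [('a', 6), ('b', 4)]
print(top_chars('app', 1))             # [('p', 2)]
```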

Used like this, and compared to collections.Counter:

>>> top_words('aaaaaabbbbcccc', 2)
[('a', 6), ('c', 4)]
>>> top_words('app', 1)
[('p', 2)]

>>> from collections import Counter
>>> Counter('aaaaaabbbbcccc').most_common(2)
[('a', 6), ('b', 4)]
>>> Counter('app').most_common(1)
[('p', 2)]

Collected from the Internet

edited at 2021-03-12
