Get most occurring characters and their counts without using counters

Ramya

For example, if I give a string 'aaaaaabbbbcccc' and n = 2, it should return [('a',6),('b',4)]. I have tried it in this way:

def top_chars(word, n):
    list1=list(word)
    list3=[]
    list2=[]
    list4=[]
    set1=set(list1)
    list2=list(set1)
    def count(item):
        count=1
        for x in word:
            if x in word:
               count+=item.count(x)
               list3.append(count)
               return count
    list2.sort(key=count)
    list3.sort()
    list4=list(zip(list2,list3))
    list4.reverse()
    list4.sort(key=lambda list4: ((-list4[1]),(list4[0])), reverse=True)
    return list4[0:n]

But top_chars("app", 1) returns [('p', 1)], where the count is wrong.

poke

As you want to do this without using collections.Counter, let’s go through your code for a bit first:

list1=list(word)
set1=set(list1)
list2=list(set1)

Apart from the very non-descriptive names (which you should avoid), you can essentially compact that to characters = list(set(word)) to get a list of all characters in the word without duplicates. As word is a string, calling set() on it will automatically iterate the string character by character.
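As a small illustration of that shortcut (a sketch only; the example word is taken from the question, and the sorting is added here just to get a stable printout, since sets have no guaranteed order):

```python
word = "aaaaaabbbbcccc"

# set() iterates the string character by character and drops duplicates
characters = list(set(word))

# set ordering is arbitrary, so sort before printing for a stable result
print(sorted(characters))  # ['a', 'b', 'c']
```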

Next, in your count function you check if x in word although you are explicitly iterating over the x in word. So whatever x is, it will always be in word, and you can leave that check out. Then, when you actually increase the count, you increment it by the number of occurrences of x (each character of the original word) inside the passed item. As you use the function as a sort key for the list of unique characters of the word, item is always a single-character string. With both item and x being a single character, item.count(x) is 1 if x == item and 0 otherwise, so you effectively increment the count by one when they match, albeit in a very roundabout way. Next, still within the loop over the original characters, you append that count to list3 and return it.

So what happened so far? You looked at the very first character of word and checked if item equals the character. If that’s the case, you increment count from 1 to 2, otherwise you leave it at 1. And then you just keep that count and return from the function. As such, you never actually look at the other characters of the word, which explains why your count is always 1 or 2.

If you kept the loop running and only appended and returned the count afterwards, it would work correctly (i.e. remove two indentation levels from those two lines). However, as you start with count = 1, the result is always one too high, so you should start at 0.
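A minimal sketch of the repaired helper might look like this. Note the assumptions: in the question, count is a nested function that reads word from the enclosing scope; here word is passed as a second parameter purely so the snippet runs on its own, and the list3 bookkeeping is left out.

```python
def count(item, word):
    # `word` as a parameter is an assumption for this standalone sketch;
    # in the question it comes from the enclosing function
    total = 0  # start at 0 instead of 1
    for x in word:
        # for single characters this is 1 when x == item, else 0
        total += item.count(x)
    # return only after the whole word has been scanned
    return total


print(count("p", "app"))  # 2
print(count("a", "app"))  # 1
```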

Moving on, you now sort both list2 and list3 by the same values so they line up. By construction of the lists, this luckily works, but it is still somewhat weird. You then zip the sorted lists up and reverse the order. And then you do something which I don’t really understand: you take the list which is already reverse-sorted by the character count, and sort it reversed by the negative count. Reverse-sorting by the negative count is essentially a normal sort by the positive count, so you sort it ascending. This gives you the inverse of what you want, and isn’t necessary either, as the list is already sorted after being zipped.
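To see why that last sort undoes the ordering, here is a small sketch using the same key as in the question on a hand-written list of pairs (the list itself is made up for illustration):

```python
pairs = [("a", 6), ("b", 4), ("c", 4)]

# same key as in the question: negative count, then character, reversed
result = sorted(pairs, key=lambda pair: (-pair[1], pair[0]), reverse=True)

# the highest count ends up last, i.e. the list is ascending by count
print(result)  # [('c', 4), ('b', 4), ('a', 6)]
```

Dropping reverse=True from that call would leave the pairs in descending order of count.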

Anyway, there is quite a lot to improve with your code. First of all, even if you are not using a counter, you should still use a dictionary to store the counts. I assume that you may not use a defaultdict either, so we’ll just build the functionality on our own. Instead of creating a list of all the characters in the word and then counting how often that character occurs in the word, we will just loop once through the word and keep note of every character we see:

counts = {}
for character in word:
    # if we haven’t seen the character yet, we need to
    # initialize it in our dictionary first (this is
    # essentially what defaultdict does)
    if character not in counts:
        counts[character] = 0

    # we have seen `character` once more, so increment count
    counts[character] += 1

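This is already all we need to do to actually count the characters. For the example word 'aaaaaabbbbcccc', we would now get this as our dictionary: {'b': 4, 'c': 4, 'a': 6}. The order in which the keys are shown may differ; on Python 3.7 and later, dictionaries keep insertion order, so it would print as {'a': 6, 'b': 4, 'c': 4}.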

So now, we just need to find the n biggest elements from there. Your zip idea was actually quite good for this; so let’s create a list of tuples from the dict which we can then sort:

countsList = list(counts.items())

# sort descending by count; ties are ordered by character, also descending
countsList.sort(key=lambda x: (x[1], x[0]), reverse=True)

Now the final list is ready, and we can take the first n elements from it to get our result. For n = 2, that is [('a', 6), ('c', 4)].

In total, this is what our function looks like:

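def top_words(word, n):
    # count every character in a single pass over the word
    counts = {}
    for character in word:
        if character not in counts:
            counts[character] = 0
        counts[character] += 1

    # sort descending by count; ties are ordered by character, also descending
    countsList = list(counts.items())
    countsList.sort(key=lambda x: (x[1], x[0]), reverse=True)
    return countsList[:n]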

Used like this, and compared to collections.Counter:

>>> top_words('aaaaaabbbbcccc', 2)
[('a', 6), ('c', 4)]
>>> top_words('app', 1)
[('p', 2)]

>>> from collections import Counter
>>> Counter('aaaaaabbbbcccc').most_common(2)
[('a', 6), ('b', 4)]
>>> Counter('app').most_common(1)
[('p', 2)]
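Note that for the tie between 'b' and 'c', top_words returns ('c', 4), while the question expected ('b', 4). If ties should be broken alphabetically (an assumption based on the expected output in the question), one possible variant is to sort by the negative count and the character without reverse=True. The function name below is hypothetical:

```python
def top_chars_alphabetical_ties(word, n):
    # hypothetical variant: same counting as before, different sort key
    counts = {}
    for character in word:
        if character not in counts:
            counts[character] = 0
        counts[character] += 1

    pairs = list(counts.items())
    # negative count puts the highest counts first;
    # the character as second key orders ties alphabetically
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs[:n]


print(top_chars_alphabetical_ties("aaaaaabbbbcccc", 2))  # [('a', 6), ('b', 4)]
print(top_chars_alphabetical_ties("app", 1))  # [('p', 2)]
```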
