How to count consecutive word pairs in a string in Python

张

I'm trying to write a Python script that takes a string and gives the count of each pair of consecutive words. For example:

string = " i have no idea how to write this script. i have an idea."

output = 
['i', 'have'] 2
['have', 'no'] 1
['no', 'idea'] 1
['idea', 'how'] 1
['how', 'to'] 1
['to', 'write'] 1
...

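I'm trying to do this in plain Python without importing collections or using Counter from collections. What I have so far is below. I'm trying to use re.findall(#whatpatterndoiuse, string) to walk through the string and compare words, but I'm having trouble figuring out how to do it.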

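import re

# Assumed definition: the original snippet used `punctuation` without showing it
punctuation = re.compile(r'[^\w\s]')
word_list = re.split(r'\s+', string.lower().strip())
freq_dic = {}  # empty dictionary
for word in word_list:
    word = punctuation.sub("", word)
    freq_dic[word] = freq_dic.get(word, 0) + 1

# Note: this counts single words, not pairs of words
freq_list = sorted(freq_dic.items())
for word, freq in freq_list:
    print(word, freq)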

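The following uses Counter from collections, which I don't want. It also produces output in a different format from the one I described above.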

import re
from collections import Counter
with open('a.txt') as f:
    words = re.findall(r'\w+', f.read())
print(Counter(zip(words, words[1:])))

尼克曼

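Solving this without zip is straightforward. Build a tuple for each pair of adjacent words and keep track of its count in a dictionary. There are only a couple of special cases to watch for: when the input string has just one word, and when you reach the end of the string.

Try this: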

def freq(input_string):
    counts = {}
    words = input_string.split()
    # Fewer than two words means there are no pairs to count
    if len(words) < 2:
        return counts

    for idx, word in enumerate(words):
        # Skip the last word, which has no successor
        if idx + 1 < len(words):
            word_pair = (word, words[idx + 1])
            if word_pair in counts:
                counts[word_pair] += 1
            else:
                counts[word_pair] = 1

    return counts
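
Note that the function above splits on whitespace only, so "script." keeps its period and "I" and "i" would count as different words. If you want output closer to the format in the question, here is a minimal sketch of one way to do it. It assumes that lowercasing the text and tokenizing with re.findall(r"\w+", ...) is acceptable, and the name pair_counts is just an illustrative choice, not something from the question:

```python
import re

def pair_counts(text):
    """Count adjacent word pairs, ignoring case and punctuation."""
    # \w+ matches runs of letters, digits, and underscores,
    # so "script." is tokenized as "script"
    words = re.findall(r"\w+", text.lower())
    counts = {}
    # Stop one short of the end so words[i + 1] is always valid;
    # an empty or one-word input simply produces no pairs
    for i in range(len(words) - 1):
        pair = (words[i], words[i + 1])
        counts[pair] = counts.get(pair, 0) + 1
    return counts

string = " i have no idea how to write this script. i have an idea."
for pair, count in pair_counts(string).items():
    # list(pair) prints as ['i', 'have'] rather than ('i', 'have')
    print(list(pair), count)
```

Because punctuation is dropped, this sketch also counts pairs that span a sentence boundary, such as ('script', 'i'). If that is not what you want, you could split the text into sentences first and count pairs within each one.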
