2个文件中的单词频率计数

用户名

我已经编写了Java代码来对发生的总数进行计数。它使用2个.txt文件作为输入，并给出单词和频率作为输出。

我也想打印，哪个文件包含一个给定的单词多少次。你有什么想法吗？

public class JavaApplication2
{

    public static void main(String[] args) throws IOException
    {     
        Path filePath1 = Paths.get("test.txt");
        Path filePath2 = Paths.get("test2.txt");

        Scanner readerL = new Scanner(filePath1);
        Scanner readerR = new Scanner(filePath2);

        String line1 = readerL.nextLine();
        String line2 = readerR.nextLine();

        String text = new String();
        text=text.concat(line1).concat(line2);

        String[] keys = text.split("[!.?:;\\s]");
        String[] uniqueKeys;
        int count = 0;
        System.out.println(text);
        uniqueKeys = getUniqueKeys(keys);

        for(String key: uniqueKeys)
        {
            if(null == key)
            {
                break;
            }           
            for(String s : keys)
            {
                if(key.equals(s))
                {
                    count++;
                }               
            }
            System.out.println("["+key+"] frequency : "+count);
            count=0;
        }
    }

    private static String[] getUniqueKeys(String[] keys)
    {
        String[] uniqueKeys = new String[keys.length];

        uniqueKeys[0] = keys[0];
        int uniqueKeyIndex = 1;
        boolean keyAlreadyExists = false;

        for(int i=1; i<keys.length ; i++)
        {
            for(int j=0; j<=uniqueKeyIndex; j++)
            {
                if(keys[i].equals(uniqueKeys[j]))
                {
                    keyAlreadyExists = true;
                }
            }           

            if(!keyAlreadyExists)
            {
                uniqueKeys[uniqueKeyIndex] = keys[i];
                uniqueKeyIndex++;               
            }
            keyAlreadyExists = false;
        }       
        return uniqueKeys;
    }

乔治

首先，不要将数组用于唯一键，而应使用HashMap<String, Integer>。效率更高。

最好的选择是分别对每个行/文件运行处理，并分别存储这些计数。然后合并两个计数以获得总频率。

更多详情：

String[] keys = text.split("[!.?:;\\s]");
HashMap<String,Integer> uniqueKeys = new HashMap<>();

for(String key : keys){
    if(uniqueKeys.containsKey(key)){
        // if your keys is already in map, increment count of it
        uniqueKeys.put(key, uniqueKeys.get(map) + 1);
    }else{
        // if it isn't in it, add it
        uniqueKeys.put(key, 1);
    }
}

// You now have the count of all unique keys in a given text
// To print them to console

for(Entry<String, Integer> keyCount : uniqueKeys.getEntrySet()){
    System.out.println(keyCount.getKey() + ": " + keyCount.getValue());
}

// To merge, if you're using Java 8

for(Entry<String, Integer> keyEntry : uniqueKeys1.getEntrySet()){
    uniqueKeys2.merge(keyEntry.getKey(), keyEntry.getValue(), Integer::add);
}

// To merge, otherwise

for(Entry<String, Integer> keyEntry : uniqueKeys1.getEntrySet()){
    if(uniqueKeys2.containsKey()){
        uniqueKeys2.put(keyEntry.getKey(),
            uniqueKeys2.get(keyEntry.getKey()) + keyEntry.getValue());
    }else{
        uniqueKeys2.put(keyEntry.getKey(), keyEntry.getValue());
    }
}

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2021-03-30

我来说两句

0 条评论

登录后参与评论

Mapreduce Job在python中查找单词频率计数

如何获得按第二个变量分组的单词频率计数（Python）

2个文件中的单词频率计数

2个文件中的单词频率计数

Android Studio Kotlin：提取为常量

IE 11中的FormData未定义

计算数据帧R中的字符串频率

如何在R中转置数据

如何使用Redux-Toolkit重置Redux Store

Excel 2016图表将增长与4个参数进行比较

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

未捕获的SyntaxError：带有Ajax帖子的意外令牌u

OpenCv：改变 putText() 的位置

ActiveModelSerializer仅显示关联的ID

算术中的c ++常量类型转换

如何开始为Ubuntu开发

将加号/减号添加到jQuery菜单

去噪自动编码器和常规自动编码器有什么区别？

获取并汇总所有关联的数据

OpenGL纹理格式的颜色错误

在 React Native Expo 中使用 react-redux 更改另一个键的值

http：// localhost：3000 /＃！/为什么我在localhost链接中得到“＃！/”。

TreeMap中的自定义排序

Redux动作正常，但减速器无效

如何对treeView的子节点进行排序