Combining GroupBy and Count with a condition in LINQ

Posted by MRRock in Dev

MRRock

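I'm struggling to work out a LINQ statement to summarize data. I'm learning C# by building a tool that helps me clean up duplicate files. I already have a dictionary variable, fileResult, populated with file item information and defined as Dictionary<string, List<string>>. The items include Path, FileHash, and FolderDupFileCount (among other fields).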

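I have successfully used this LINQ expression to summarize all the distinct FileHash values, assign each one a group ID, and count the files that share the same hash.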

var fileMD5Groups = fileResult.GroupBy(x => x.Value.FileHash).Select((x, xid) => 
                    new { x.Key, count = x.Distinct().Count(), id = xid + 1 }).ToDictionary(y => y.Key, z => z);

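Now, I have the query below, which works and counts the number of files in each path. I'm trying to figure out how to modify this statement so that it counts the files in a path that have duplicates elsewhere (for each path, give the count of files in that path that are duplicated).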

           // Group by Path and Count the files in this path that have duplicates
           // fileResult contains a field called FileHash
            var folderDuplicateCount =
                from file in fileResult
                group file by file.Value.Path into g
                where g.Count() > 1
                select new { Path = g.Key, FolderDupFileCount = g.Count() };
            
            // Convert to dictionary
            Dictionary<string, int> dupResults = folderDuplicateCount
                                                 .ToDictionary(x => x.Path, x => x.FolderDupFileCount);

I imagine this is simple for the kind of techie I'm working to become, so any help would be much appreciated.

Edit 1: Below is the full method I'm using.

public static bool UpdateFileHashResults(Dictionary<string, FolderItem> folderResult, Dictionary<string, FileItem> fileResult)
        {
            var fileMD5Groups = fileResult.GroupBy(x => x.Value.FileHash).Select((x, xid) => new { x.Key, count = x.Distinct().Count(), id = xid + 1 }).ToDictionary(y => y.Key, z => z);

           // Group by Path and Count the files in this path which have the
           // same FileHash that are in other Path's
           // fileResult contains a field called FileHash
            var folderDuplicateCount =
                from file in fileResult
                group file by file.Value.Path into g
                where g.Count() > 1
                select new { Path = g.Key, FolderDupFileCount = g.Count() };
            Dictionary<string, int> dupResults = folderDuplicateCount.ToDictionary(x => x.Path, x => x.FolderDupFileCount);
            timeItLinq.Stop();
            timeItAssignValue.Restart();
            foreach (var file in fileResult.ToList())
            {
                var ik = file.Key;
                var ivMD5Hash = file.Value.FileHash;
                var fResult = fileResult[ik];
                var ivFileFolder = file.Value.Path;

                fResult.FileHashGroupID = fileMD5Groups[ivMD5Hash].id;
                fResult.FileHashCount = fileMD5Groups[ivMD5Hash].count;
                
                if (RS.FoldersFound)
                {
                    var folResult = folderResult[ivFileFolder];
                    fResult.FolderID = folResult.FolderID;
                    var dupCount = 0;
                    if (dupResults.ContainsKey(ivFileFolder))
                    {
                        dupCount = dupResults[ivFileFolder];
                    }

                    fResult.FolderDupFileCount = dupCount;
                    folResult.FolderDupFileCount = dupCount;
                }
            }
            return true;
        }

Now, var fileResult = fileListing.FindFiles(fileList) is the first assignment, using this interface:

public interface IFileListing
    {
        Dictionary<string, FileItem> FindFiles(IEnumerable<string> files);
    }

For the folder results, var folderResult = FolderListing.FindFolders(folderPaths); uses the interface below.

    public interface IFolderListing
    {
        Dictionary<string, FolderItem> FindFolders(IEnumerable<string> folders);
    }

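Desired result: I'm trying to get results grouped by path, with a count of the files in that folder that have the same FileHash as files in other paths. So if a path has 10 files, and 2 of them have the same hash as files in another path, then .FolderDupFileCount for this path should be 2.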

I hope this makes the desired result clearer.

MRRock

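After learning more about LINQ and a lot more trial and error, I found a solution that works. Thanks to NetMage, whose questions and comments helped me think through the problem. I also changed my lambda names as suggested, though I'm not sure they are fully consistent.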

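I'm posting the solution that works; however, I feel the code doesn't look very elegant, and there may be better ways to accomplish some of the tasks in this method. Any suggestions for streamlining or improving this method would help me build good programming standards and habits.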

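For the solution, I removed the folderDuplicateCount query and modified the dupResults query. Because my fileResult dictionary has a Path field, I used this dictionary rather than the folderResult variable.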

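The modified dupResults now gives the correct results. I also added two extra computed fields, DupFilesHash = string.Concat(frg.Select(fvg => fvg.FileHash)) and id = frgId + 1. These fields are helpers for updating fields in my dictionaries and are assigned under specific conditions. DupFilesHash is the concatenation of the hashes of the files in this path that have duplicates in other paths. That hash string is then rehashed to give a unique fingerprint representing these duplicates, which can be used to locate or match the duplicates elsewhere.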
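The fingerprint step can be sketched on its own. This is only an illustration under stated assumptions: Md5OfString is a hypothetical stand-in for HashTool.MD5StringHash (whose implementation is not shown here), and FolderFingerprint is a hypothetical helper name. The sketch also sorts the hashes before concatenating, which the method in this post does not do; without sorting, two folders holding the same duplicates in a different enumeration order would probably get different fingerprints.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class FingerprintSketch
{
    // Hypothetical stand-in for HashTool.MD5StringHash (assumption: it returns
    // the MD5 of the string as lowercase hex; the real helper is not shown).
    public static string Md5OfString(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    // Concatenate the duplicate-file hashes of one folder and hash the result.
    // Sorting first (an addition, not in the original method) makes the
    // fingerprint independent of the order in which files were enumerated.
    public static string FolderFingerprint(IEnumerable<string> duplicateFileHashes)
    {
        string concatenated = string.Concat(
            duplicateFileHashes.OrderBy(h => h, StringComparer.Ordinal));
        return Md5OfString(concatenated);
    }
}
```

With this sketch, FolderFingerprint(new[] { "aaa", "bbb" }) and FolderFingerprint(new[] { "bbb", "aaa" }) return the same 32-character string, so folders containing the same set of duplicated files can be matched by comparing fingerprints.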

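The biggest thing I couldn't figure out was that after the first .GroupBy(frg => frg.Path), it seemed I could no longer access the other value fields. I came across an example showing frg.Select(fvg => fvg.FileHash), then the light came on and I learned something new.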

public static bool UpdateFileHashResults(Dictionary<string, FolderItem> folderResult, Dictionary<string, FileItem> fileResult)
        {
            // List of file hashes with a count of files with identical hashes
            var fileMD5Groups = fileResult.FileItemDictionaryToList()
                .GroupBy(kvg => kvg.FileHash)
                .Select((kvg, kvgId) => new { kvg.Key, 
                    count = kvg.Distinct().Count(), 
                    id = kvgId + 1 })
                .ToDictionary(krg => krg.Key, kvg => kvg);

            // List of all folders and a count of the number of files in this folder 
            // that have the same file hash in another folder(s)
            var dupResults = fileResult.FileItemDictionaryToList()
                .Where(frg => fileMD5Groups[frg.FileHash].count > 1)
                .GroupBy(frg => frg.Path)
                .Select((frg, frgId) => new { Path = frg.Key, 
                    NumberOfFilesWithDuplicates = frg.Count(), 
                    DupFilesHash = string.Concat(frg.Select(fvg => fvg.FileHash)), 
                    id = frgId + 1})
                .ToDictionary(frg => frg.Path, fvg => fvg);
          
            // Loop over all files and back load values into folder and file results dictionaries
            timeItAssignValue.Restart();
            foreach (KeyValuePair<string, FileItem> file in fileResult.ToList())
            {
                string ik = file.Key;
                string ivMD5Hash = file.Value.FileHash;
                FileItem fResult = fileResult[ik];
                string ivFileFolder = file.Value.Path;
                int fileHashCount = fileMD5Groups[ivMD5Hash].count;

                fResult.FileHashGroupID = fileMD5Groups[ivMD5Hash].id;
                fResult.FileHashCount = fileHashCount;

                if (RS.FoldersFound)
                {
                    FolderItem folResult = folderResult[ivFileFolder];
                    fResult.FolderID = folResult.FolderID;
                    int dupCount = 0;
                    int dupID = 0;
                    string dupFilesHash = "";
                    
                    if (dupResults.ContainsKey(ivFileFolder) && fileHashCount> 1)
                    {
                        dupCount = dupResults[ivFileFolder].NumberOfFilesWithDuplicates;
                        dupID = dupResults[ivFileFolder].id;
                        dupFilesHash= dupResults[ivFileFolder].DupFilesHash;
                        dupFilesHash = HashTool.MD5StringHash(dupFilesHash);
                            
                    }

                    fResult.FolderDupFileCount = dupCount;
                    folResult.FolderDupFileCount = dupCount;
                    fResult.FolderDupFileCountID = dupID;
                    folResult.FolderDupFileCountID = dupID;
                    fResult.FolderDupFilesHash = dupFilesHash;
                }
            }
            return true;
        }
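
The filter-then-group idea behind dupResults can also be tried in isolation. The following is a minimal sketch, not the method above: FileItemSketch is a hypothetical, trimmed-down stand-in for FileItem with only the two fields used here, and CountDuplicatesPerFolder is a hypothetical name. It relies on the fact that each group produced by GroupBy is itself a sequence of the original items, which is why g.Count() (or g.Select(...)) works after grouping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FileItem; only the fields this sketch needs.
public class FileItemSketch
{
    public string Path { get; set; }
    public string FileHash { get; set; }
}

public static class DuplicateCountSketch
{
    // For each folder, count the files whose hash occurs more than once
    // across the whole input.
    public static Dictionary<string, int> CountDuplicatesPerFolder(
        IEnumerable<FileItemSketch> files)
    {
        List<FileItemSketch> all = files.ToList();

        // Hash -> number of files sharing that hash.
        Dictionary<string, int> hashCounts = all
            .GroupBy(f => f.FileHash)
            .ToDictionary(g => g.Key, g => g.Count());

        // Keep only files with a shared hash, then group by folder.
        // Each group g still contains the full items, so g.Count() counts them.
        return all
            .Where(f => hashCounts[f.FileHash] > 1)
            .GroupBy(f => f.Path)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For example, given folder A with hashes h1, h2, h3, folder B with h1, and folder C with h2, h4, the result is A = 2, B = 1, C = 1. Note that, like the method above, this sketch treats two identical files inside the same folder as duplicates too; counting only hashes that appear in more than one distinct Path would need an extra check.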

Edited on 2021-08-23