Reading multiple files sequentially in groups

楚门乙

I have data files in a directory, named as follows:

 IU.WRT.00.MTR.1999.081.081015.txt
 IU.WRT.00.MTS.2007.229.022240.txt
 IU.WRT.00.MTR.2007.229.022240.txt
 IU.WRT.00.MTT.1999.081.081015.txt
 IU.WRT.00.MTS.1999.081.081015.txt
 IU.WRT.00.MTT.2007.229.022240.txt

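First, I want to group the data into sets of three files whose names follow the same pattern and differ only in the channel letter (R, S, T), like this: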

IU.WRT.00.MTR.1999.081.081015.txt
IU.WRT.00.MTS.1999.081.081015.txt
IU.WRT.00.MTT.1999.081.081015.txt

and then perform some operations on that group.

Next, I want to read the files

IU.WRT.00.MTT.2007.229.022240.txt
IU.WRT.00.MTS.2007.229.022240.txt
IU.WRT.00.MTR.2007.229.022240.txt

and apply the same operations to them.

I want to continue this process in the same way for millions of datasets.

I tried this example script:

import os
import glob
import matplotlib.pyplot as plt
from collections import defaultdict

def groupfiles(pattern):
    files = glob.glob(pattern)
    filedict = defaultdict(list)
    for file in files:
        parts = file.split(".")
        filedict[".".join([parts[5], parts[6], parts[7]])].append(file)
    for filegroup in filedict.values():
        yield filegroup
 
for relatedfiles in groupfiles('*.txt'):
    print(relatedfiles)

    for filename in relatedfiles:
        print(filename)

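But it reads the files one at a time, whereas I need to read three files together each time (that is, following the sequence, it should first read the first three files, then the next three, and so on). I hope the experts here can help. Thanks in advance.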

not_speshal

Sort the list of files by multiple keys.

import os
files = [f for f in os.listdir("C:/username/folder") if f.endswith(".txt")]
# Sort by (year, day, time) first, then by the channel code (MTR/MTS/MTT)
grouped = sorted(files, key=lambda x: (x.split(".")[4:7], x.split(".")[3]))

>>> grouped
['IU.WRT.00.MTR.1999.081.081015.txt',
 'IU.WRT.00.MTS.1999.081.081015.txt',
 'IU.WRT.00.MTT.1999.081.081015.txt',
 'IU.WRT.00.MTR.2007.229.022240.txt',
 'IU.WRT.00.MTS.2007.229.022240.txt',
 'IU.WRT.00.MTT.2007.229.022240.txt']

Then iterate over the sorted list in groups of three, using the grouper recipe from the itertools documentation.

from itertools import zip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

for f in grouper(grouped, 3):  # f is a tuple of three file names
    # your file operations here; a short final group is padded with None
    print(f)
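
Note that fixed-size chunking assumes every event has exactly three files. As a hedged alternative, here is a minimal sketch that groups on the (year, day, time) part of the name with itertools.groupby, so a missing channel file does not shift the later groups, and then opens all files of a group together. The helper names event_key, iter_groups and read_group are made up for this illustration, and the temporary directory with placeholder files only stands in for the real data folder.

```python
import os
import tempfile
from contextlib import ExitStack
from itertools import groupby


def event_key(filename):
    """Return (year, day, time) from e.g. IU.WRT.00.MTR.1999.081.081015.txt."""
    return tuple(os.path.basename(filename).split(".")[4:7])


def iter_groups(filenames):
    """Yield (key, [files]) per event, files ordered by name (MTR, MTS, MTT)."""
    ordered = sorted(filenames, key=lambda f: (event_key(f), os.path.basename(f)))
    for key, members in groupby(ordered, key=event_key):
        yield key, list(members)


def read_group(paths):
    """Open every file of one group at the same time and return their contents."""
    with ExitStack() as stack:
        handles = [stack.enter_context(open(p)) for p in paths]
        return [h.read() for h in handles]


names = [
    "IU.WRT.00.MTR.1999.081.081015.txt",
    "IU.WRT.00.MTS.2007.229.022240.txt",
    "IU.WRT.00.MTR.2007.229.022240.txt",
    "IU.WRT.00.MTT.1999.081.081015.txt",
    "IU.WRT.00.MTS.1999.081.081015.txt",
    "IU.WRT.00.MTT.2007.229.022240.txt",
]

# Demo only: write placeholder files (each contains its channel code)
# into a temporary folder that stands in for the real data directory.
with tempfile.TemporaryDirectory() as folder:
    for name in names:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write(name.split(".")[3])
    paths = [os.path.join(folder, n) for n in os.listdir(folder) if n.endswith(".txt")]
    results = [(key, read_group(members)) for key, members in iter_groups(paths)]

for key, contents in results:
    print(key, contents)
```

With millions of files, sorting the full name list still happens in memory, but only one group's files are open at any moment.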


Edited on 2021-09-16
