How do I split data across multiple groups in an HDF5 file?

Rohit

I have some data that looks like this:

Generated by trjconv : P/L=1/400 t=   0.00000
11214
    1P1     aP1    1  80.48  35.36   4.25
    2P1     aP1    2  37.45   3.92   3.96
11210LI     aLI11210  61.61  19.15   3.25
11211LI     aLI11211  69.99  64.64   3.17
11212LI     aLI11212  70.73  11.64   3.38
11213LI     aLI11213  62.67  16.16   3.44
11214LI     aLI11214   3.22   9.76   3.39
  61.42836  61.42836   8.47704

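I have managed to write all of the data into the desired groups except for the last line. I want to write that line to a group /particles/box. As you can see in the data file, this line is repeated in every frame. The way the code is written so far, it effectively ignores this line. I tried a few things, but I get the following error: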

ValueError: Shape tuple is incompatible with data

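The last line is time dependent, i.e. it fluctuates with each time frame, and I would like this data to be linked to the step and time datasets already defined in /particles/lipids/positions/step. Here is the code: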

import struct
import numpy as np
import h5py
import re

# First part: convert the .gro file to .h5.
csv_file = 'com'

fmtstring = '7s 8s 5s 7s 7s 7s'
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

# Format for footer
fmtstring1 = '1s 1s 5s 7s 7s 7s'
fieldstruct1 = struct.Struct(fmtstring1)
parse1 = fieldstruct1.unpack_from

with open(csv_file, 'r') as f, \
    h5py.File('xaa_trial.h5', 'w') as hdf:
    # open group for position data
    ## Particles group with the attributes
    particles_grp = hdf.require_group('particles/lipids/positions')
    box_grp = particles_grp.create_group('box')
    dim_grp = box_grp.create_group('dimension')
    dim_grp.attrs['dimension'] = 3
    bound_grp = box_grp.create_group('boundary')
    bound_grp.attrs['boundary'] = ['periodic', 'periodic', 'periodic']
    edge_grp = box_grp.create_group('edges')
    edge_ds_time = edge_grp.create_dataset('time', dtype='f', shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    edge_ds_step = edge_grp.create_dataset('step', dtype=np.uint64, shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    edge_ds_value = None
    ## H5MD group with the attributes
    #hdf.attrs['version'] = 1.0 # global attribute
    h5md_grp = hdf.require_group('h5md/version/author/creator')
    h5md_grp.attrs['version'] = 1.0
    h5md_grp.attrs['author'] = 'rohit'
    h5md_grp.attrs['creator'] = 'known'
    
    # datasets with known sizes
    ds_time = particles_grp.create_dataset('time', dtype="f", shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    ds_step = particles_grp.create_dataset('step', dtype=np.uint64, shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    ds_value = None

    step = 0
    while True:
        header = f.readline()
        m = re.search("t= *(.*)$", header)
        if m:
            time = float(m.group(1))
        else:
            print("End Of File")
            break

        # get number of data rows, i.e., number of particles
        nparticles = int(f.readline())
        # read data lines and store in array
        arr = np.empty(shape=(nparticles, 3), dtype=np.float32)
        for row in range(nparticles):
            fields = parse( f.readline().encode('utf-8') )
            arr[row] = np.array((float(fields[3]), float(fields[4]), float(fields[5])))

        if nparticles > 0:
            # create a resizable dataset upon the first iteration
            if not ds_value:
                ds_value = particles_grp.create_dataset('value', dtype=np.float32,
                                                        shape=(0, nparticles, 3), maxshape=(None, nparticles, 3),
                                                        chunks=(1, nparticles, 3), compression='gzip', shuffle=True)
                #edge_data = bound_grp.create_dataset('box_size', dtype=np.float32, shape=(0, nparticles, 3), maxshape=(None, nparticles, 3), compression='gzip', shuffle=True)
            # append this sample to the datasets
            ds_time.resize(step + 1, axis=0)
            ds_step.resize(step + 1, axis=0)
            ds_value.resize(step + 1, axis=0)
            ds_time[step] = time
            ds_step[step] = step
            ds_value[step] = arr
  
        footer = parse1( f.readline().encode('utf-8') )
        dat = np.array(footer)
        print(dat)
        arr1 = np.empty(shape=(1, 3), dtype=np.float32)
        edge_data = bound_grp.create_dataset('box_size', data=dat, dtype=np.float32, compression='gzip', shuffle=True)
        
        step += 1
        #=============================================================================

kcw78

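Your code has a few small errors in how it reads and converts the "footer" line. I modified the code and got it working... but I am not sure it does exactly what you want. I used the same group and dataset definitions, so the footer data is written to this dataset: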

/particles/lipids/positions/box/boundary/box_size

This follows from these group and dataset definitions:

particles_grp = hdf.require_group('particles/lipids/positions')
box_grp = particles_grp.create_group('box')
bound_grp = box_grp.create_group('boundary')
edge_data = bound_grp.create_dataset('box_size'....

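There are a few things to correct:
First, you need to change the format string used by parse1 so that it matches the 3 fields in the footer line.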

# Format for footer
# FROM:
fmtstring1 = '1s 1s 5s 7s 7s 7s'
# TO:
fmtstring1 = '10s 10s 10s'
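
To see why three 10-byte fields fit, here is a small sketch that parses the footer line from the sample data in the question. The names `footer_line`, `footer_struct`, `raw_fields` and `box` are only illustrative and do not appear in the original code.

```python
import struct

# Footer line copied from the sample frame in the question (30 characters + newline).
footer_line = "  61.42836  61.42836   8.47704\n"

# Three fixed-width fields of 10 bytes each, matching fmtstring1 = '10s 10s 10s'.
footer_struct = struct.Struct('10s 10s 10s')

# unpack_from reads only the first 30 bytes, so the trailing newline is ignored.
raw_fields = footer_struct.unpack_from(footer_line.encode('utf-8'))

# Each field is a space-padded bytes object; decode it and convert to float.
box = [float(field.decode('utf-8')) for field in raw_fields]
print(box)
```

If the column widths in your file are not guaranteed, `footer_line.split()` would be a simpler alternative for this whitespace-separated line, though that is a different approach from the fixed-width parsing used here.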

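Next, you need to change where and how the box_size dataset is created. Create it the same way as the others: as an extendable dataset (using the maxshape=() parameter), ABOVE the while True: loop. This is what I did: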

edge_ds_step = edge_grp.create_dataset('step', dtype=np.uint64, shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
# Create empty 'box_size' dataset here
edge_data = bound_grp.create_dataset('box_size', dtype=np.float32, shape=(0,3), maxshape=(None,3), compression='gzip', shuffle=True)
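
As a quick check of how an extendable dataset behaves, the sketch below creates the same `box_size` dataset and grows it by one row. It uses an in-memory file (`driver='core'`, `backing_store=False`) and the file name `demo_box.h5` purely as assumptions for the demo, so nothing is written to disk.

```python
import numpy as np
import h5py

# In-memory HDF5 file: an assumption for this demo only.
with h5py.File('demo_box.h5', 'w', driver='core', backing_store=False) as hdf:
    bound_grp = hdf.require_group('particles/lipids/positions/box/boundary')

    # Zero rows to start with; the first axis is unlimited, the second is fixed at 3.
    edge_data = bound_grp.create_dataset('box_size', dtype=np.float32,
                                         shape=(0, 3), maxshape=(None, 3),
                                         compression='gzip', shuffle=True)
    shape_before = edge_data.shape
    max_shape = edge_data.maxshape

    # Grow by one row along axis 0, then fill the new row.
    new_size = edge_data.shape[0] + 1
    edge_data.resize(new_size, axis=0)
    edge_data[new_size - 1] = [61.42836, 61.42836, 8.47704]
    shape_after = edge_data.shape

print(shape_before, shape_after, max_shape)
```

Creating the dataset once, before the loop, matters because `create_dataset` raises an error if it is called again with a name that already exists in the group.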

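Finally, here is the modified code for the footer. It replaces the footer-handling lines at the end of the while loop and does the following:

1. parses the footer string into a tuple,
2. maps the tuple to an np.array of floats with shape=(1,3),
3. resizes the dataset, and finally
4. loads the array into the dataset.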

footer = parse1( f.readline().encode('utf-8') )
dat = np.array(footer).astype(float).reshape(1,3)
new_size = edge_data.shape[0]+1
edge_data.resize(new_size, axis=0)
edge_data[new_size-1:new_size,:] = dat
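
The question also asks for the box data to be tied to step and time, which the code above does not cover. One possible sketch, assuming the `edges` group with `step` and `time` datasets from the question plus a `value` dataset for the box lengths, is to append to all three together. The helper name `append_frame`, the in-memory file and the second frame's numbers are made up for illustration.

```python
import numpy as np
import h5py

def append_frame(step_ds, time_ds, value_ds, step, time, box):
    """Hypothetical helper: grow all three datasets by one row and store one frame.

    Assumes the three datasets currently have the same length along axis 0.
    """
    n = value_ds.shape[0]
    for ds in (step_ds, time_ds, value_ds):
        ds.resize(n + 1, axis=0)
    step_ds[n] = step
    time_ds[n] = time
    value_ds[n] = box

# In-memory file so the demo leaves nothing on disk (an assumption).
with h5py.File('demo_edges.h5', 'w', driver='core', backing_store=False) as hdf:
    edge_grp = hdf.require_group('particles/lipids/positions/box/edges')
    edge_ds_step = edge_grp.create_dataset('step', dtype=np.uint64,
                                           shape=(0,), maxshape=(None,))
    edge_ds_time = edge_grp.create_dataset('time', dtype='f',
                                           shape=(0,), maxshape=(None,))
    edge_ds_value = edge_grp.create_dataset('value', dtype=np.float32,
                                            shape=(0, 3), maxshape=(None, 3))

    frames = [
        (0, 0.0, (61.42836, 61.42836, 8.47704)),  # frame from the question
        (1, 10.0, (61.5, 61.5, 8.4)),             # made-up second frame
    ]
    for step, time, box in frames:
        append_frame(edge_ds_step, edge_ds_time, edge_ds_value, step, time, box)

    # Read everything back before the file is closed.
    n_rows = edge_ds_value.shape[0]
    steps = edge_ds_step[:].tolist()
    values = edge_ds_value[:]

print(n_rows, steps)
```

In the real loop you would call the helper once per frame with the `step`, `time` and parsed footer values already available there; whether this layout is what you want depends on how you plan to read the file later.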

Edited on 2021-09-24
