Appending simulation data using HDF5

Geronimo

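I currently run a simulation several times and want to save the results of these runs so that they can be used for visualizations.

The simulation is run 100 times, and each run generates about 1 million data points (i.e. 1 million values for 1 million episodes), which I now want to store efficiently. The goal for each of these episodes is to compute the average of each value across all 100 simulations.

My main file looks like this: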

import h5py

# Defining the test simulation environment
def test_simulation():
    env = environment(
        periods=1000000,
        parameter_x=...,
        parameter_y=...,
    )

    # Running the simulation
    env.simulation()

    # Save simulation data
    # (create_dataset fails on the second run because 'data' already exists)
    hf = h5py.File('runs/simulation_runs.h5', 'a')
    hf.create_dataset('data', data=env.value_history, compression='gzip', chunks=True)
    hf.close()

# Run the simulation 100 times
for i in range(100):
    print(f'--- Iteration {i} ---')
    test_simulation()

The value_history is generated within simulation(), i.e. the values are appended one after another to an initially empty list:

# Method of the environment class
def simulation(self):
    for episode in range(self.periods):
        value = doSomething()
        self.value_history.append(value)

Now, when the next simulation starts, I get the following error message:

ValueError: Unable to create dataset (name already exists)

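I understand that the current code keeps trying to create a new file and raises an error because it already exists. What I want instead is to reopen the file created in the first simulation, append the data from the next simulation, and save it again.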
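For reference, the error can be reproduced in isolation. The sketch below is a minimal illustration, not the asker's code; the file name `demo.h5` and the temporary directory are assumptions made for the example. It suggests that reopening the file in append mode is not the problem, and that the second `create_dataset('data', ...)` call is what fails.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')

errors = []
for i in range(2):
    # Mode 'a' opens the file read/write and creates it if it is missing
    with h5py.File(path, 'a') as hf:
        try:
            hf.create_dataset('data', data=np.arange(5))
        except ValueError as exc:
            # Second pass: a dataset named 'data' is already in the file
            errors.append(str(exc))

print(errors)
```

Only the second pass through the loop records an error, so the file itself can be reopened safely; what needs to change is how the dataset is created or extended.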

kcw78

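The example below shows how to pull all these ideas together. It creates 2 files:

Create 1 resizable dataset with the maxshape parameter on the first loop, then use dataset.resize() on subsequent loops -- output is simulation_runs1.h5
Create a unique dataset for each simulation -- output is simulation_runs2.h5

I created a simple 100x100 NumPy array of random numbers as the "simulation data" and ran the simulation 10 times. Both sizes are variables, so you can increase them to larger values to determine which method is better (faster) for your data. You may also run into memory limits when saving 1M data points for 1M time periods.
Note 1: If you can't hold all the data in system memory, you can save the simulation results to the H5 file incrementally. It's just a little more complicated.
Note 2: I added a mode variable to control whether a new file is created for the first simulation (i==0) or the existing file is opened in append mode for subsequent simulations.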
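The incremental saving mentioned in Note 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper `append_values`, the batch size, the file name `incremental.h5`, and the stand-in value `float(episode)` are all hypothetical and not part of the original answer. The idea is to buffer a batch of values in memory, then grow a 1-D resizable dataset and write the batch at its end.

```python
import os
import tempfile

import h5py
import numpy as np

def append_values(dset, values):
    """Grow a 1-D resizable dataset and write `values` at its end."""
    n_old = dset.shape[0]
    dset.resize((n_old + len(values),))
    dset[n_old:] = values

# Hypothetical sizes and file location, chosen only for illustration
periods, batch_size = 1000, 250
path = os.path.join(tempfile.mkdtemp(), 'incremental.h5')

with h5py.File(path, 'w') as hf:
    # Start empty; maxshape=(None,) lets the dataset grow along its only axis
    dset = hf.create_dataset('data', shape=(0,), maxshape=(None,),
                             dtype='f8', chunks=True, compression='gzip')
    buffer = []
    for episode in range(periods):
        buffer.append(float(episode))   # stand-in for the simulated value
        if len(buffer) == batch_size:
            append_values(dset, np.asarray(buffer))
            buffer.clear()
    if buffer:                          # flush any values left in the buffer
        append_values(dset, np.asarray(buffer))

with h5py.File(path, 'r') as hf:
    stored = hf['data'][:]
```

Writing in batches rather than one value at a time keeps the number of resize calls small; a suitable batch size depends on the available memory.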

import os

import h5py
import numpy as np

# Make sure the output folder exists before the first file is created
os.makedirs('runs', exist_ok=True)

# Create some pseudo-test data
def test_simulation(i):
    periods = 100
    times = 100

    # Define the simulation with some random data
    val_hist = np.random.random((periods, times))
    a0, a1 = val_hist.shape

    # Create a new file for the first simulation, append for the rest
    mode = 'w' if i == 0 else 'a'

    # Save simulation data (resize dataset)
    with h5py.File('runs/simulation_runs1.h5', mode) as hf:
        if 'data' not in hf:
            print('create new dataset')
            # val_hist is stored as the first (1, a0, a1) slab; axis 0 can grow
            hf.create_dataset('data', shape=(1, a0, a1), maxshape=(None, a0, a1), data=val_hist,
                              compression='gzip', chunks=True)
        else:
            print('resize existing dataset')
            d0 = hf['data'].shape[0]
            hf['data'].resize((d0 + 1, a0, a1))
            hf['data'][d0:d0+1, :, :] = val_hist

    # Save simulation data (unique datasets)
    with h5py.File('runs/simulation_runs2.h5', mode) as hf:
        hf.create_dataset(f'data_{i:03}', data=val_hist,
                          compression='gzip', chunks=True)

# Run the simulation 10 times
for i in range(10):
    print(f'--- Iteration {i} ---')
    test_simulation(i)
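Once the runs are stored, the per-value average across simulations that the question asks for can be computed without loading everything at once. The sketch below is one possible way to do it, not part of the original answer. It builds its own small demo file with the same layout as simulation_runs1.h5 (one dataset 'data' of shape runs x periods x times); the file name `runs_demo.h5` and the array sizes are assumptions.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file with the same layout as simulation_runs1.h5:
# one dataset 'data' of shape (n_runs, periods, times)
path = os.path.join(tempfile.mkdtemp(), 'runs_demo.h5')
rng = np.random.default_rng(0)
runs = rng.random((10, 100, 100))
with h5py.File(path, 'w') as hf:
    hf.create_dataset('data', data=runs, maxshape=(None, 100, 100),
                      compression='gzip', chunks=True)

# Accumulate a running sum so only one run is held in memory at a time
with h5py.File(path, 'r') as hf:
    dset = hf['data']
    total = np.zeros(dset.shape[1:], dtype='f8')
    for k in range(dset.shape[0]):
        total += dset[k]
    mean_over_runs = total / dset.shape[0]
```

With the layout of simulation_runs2.h5, the same running sum would presumably be built by looping over the dataset names in the file (for example `for name in hf: total += hf[name][:]`) and dividing by the number of datasets.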

Edited on 2021-09-6
