How to retrieve examples from multiple tfrecords in TensorFlow while using an initializable iterator

I have multiple tfrecord files named `Train_DE_01.tfrecords` through `Train_DE_34.tfrecords`, and `Devel_DE_01.tfrecords` through `Devel_DE_14.tfrecords`. In other words, I have a training and a validation dataset. My aim is to iterate over the examples in the tfrecords so that I retrieve 2 examples from `Train_DE_01.tfrecords`, 2 examples from `Train_DE_02.tfrecords`, ... and 2 examples from `Train_DE_34.tfrecords`. That is, when the batch size is 68, I need 2 examples from each tfrecord file. In my code, I used an initializable iterator as follows:

# file_name: a placeholder that will contain the names of the tfrecord files.
def load_sewa_data(file_name, batch_size):

    with tf.name_scope('sewa_tf_records'):
        dataset = tf.data.TFRecordDataset(file_name).map(_parse_sewa_example).batch(batch_size)
        iterator = dataset.make_initializable_iterator(shared_name='sewa_iterator')

        next_batch = iterator.get_next()

        names, detected, arousal, valence, liking, istalkings, images = next_batch

        print(names, detected, arousal, valence, liking, istalkings, images)

        return names, detected, arousal, valence, liking, istalkings, images, iterator

After running the names through a session with `sess.run()`, I found that the first 68 examples were fetched from `Train_DE_01.tfrecords`; subsequent examples were then fetched from that same tfrecord until all of the examples in `Train_DE_01.tfrecords` were consumed.
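This is expected: a single `TFRecordDataset` built from a list of filenames reads the files sequentially, one after another, whereas the goal is a round-robin order with 2 examples per file per batch. The plain-Python sketch below (an illustration of the two orderings, not the TF API itself; `records` is a hypothetical stand-in for parsing one tfrecord file) shows the difference:

```python
def records(file_id, n=4):
    # Hypothetical stand-in for the parsed examples of one tfrecord file.
    return [f"{file_id}_ex{i}" for i in range(n)]

files = ["Train_DE_01", "Train_DE_02", "Train_DE_03"]

# Sequential order: what TFRecordDataset(file_list) produces -- all of
# file 1's examples come before any example of file 2.
sequential = [ex for f in files for ex in records(f)]

def round_robin(file_list, per_file=2):
    # Desired order: one iterator per file, pulling `per_file` examples
    # from each file on every step (what zipping per-file datasets that
    # are each batched by `per_file` achieves).
    iters = [iter(records(f)) for f in file_list]
    batches = []
    while True:
        try:
            batch = [next(it) for it in iters for _ in range(per_file)]
        except StopIteration:
            break
        batches.append(batch)
    return batches
```

With 3 files of 4 examples each, `round_robin(files)` yields two batches of 6 examples, each batch containing 2 consecutive examples from every file.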

I tried using the Dataset API's `zip()` function together with a reinitializable iterator, as follows:

def load_devel_sewa_tfrecords(filenames_dev, test_batch_size):

    datasets_dev_iterators = []

    with tf.name_scope('TFRecordsDevel'):
        for file_name in filenames_dev:
            dataset_dev = tf.data.TFRecordDataset(file_name).map(_parse_devel_function).batch(test_batch_size)
            datasets_dev_iterators.append(dataset_dev)

        dataset_dev_all = tf.data.Dataset.zip(tuple(datasets_dev_iterators))
        return dataset_dev_all


def load_train_sewa_tfrecords(filenames_train, train_batch_size):
    datasets_train_iterators = []

    with tf.name_scope('TFRecordsTrain'):
        for file_name in filenames_train:
            dataset_train = tf.data.TFRecordDataset(file_name).map(_parse_train_function).batch(train_batch_size)
            datasets_train_iterators.append(dataset_train)

        dataset_train_all = tf.data.Dataset.zip(tuple(datasets_train_iterators))

        return dataset_train_all


def load_sewa_dataset(filenames_train, train_batch_size, filenames_dev, test_batch_size):
    dataset_train_all = load_train_sewa_tfrecords(filenames_train, train_batch_size)
    dataset_dev_all = load_devel_sewa_tfrecords(filenames_dev, test_batch_size)

    iterator = tf.data.Iterator.from_structure(dataset_train_all.output_types,
                                               dataset_train_all.output_shapes)

    training_init_op = iterator.make_initializer(dataset_train_all)
    validation_init_op = iterator.make_initializer(dataset_dev_all)

    with tf.name_scope('inputs'):
        next_batch = iterator.get_next(name='next_batch')
        names = []
        detected = []
        arousal = []
        valence = []
        liking = []
        istalkings = []
        images = []

        # len(next_batch) is 34.
        # len(n) is 7. Since we are extracting: name, detected, arousal, valence, liking, istalking and images...
        # len(n[0 or 1 or 2 or ... or 6]) = is batch size.
        for n in next_batch:

            names.append(n[0])
            detected.append(n[1])
            arousal.append(n[2])
            valence.append(n[3])
            liking.append(n[4])
            istalkings.append(n[5])
            images.append(n[6])

        names = tf.concat(names, axis=0, name='names')
        detected = tf.concat(detected, axis=0, name='detected')
        arousal = tf.concat(arousal, axis=0, name='arousal')
        valence = tf.concat(valence, axis=0, name='valence')
        liking = tf.concat(liking, axis=0, name='liking')
        istalkings = tf.concat(istalkings, axis=0, name='istalkings')
        images = tf.concat(images, axis=0, name='images')

        return names, detected, arousal, valence, liking, istalkings, images, training_init_op, validation_init_op

Now, if I try the following:

sess = tf.Session()
sess.run(training_init_op)
print(sess.run(names))

I get the following error:

ValueError: The two structures don't have the same number of elements.

This makes sense, since the number of training files is 34 while the validation dataset has only 14.
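One way around the mismatch (a sketch of the idea, not tested TF code) would be to move the `tf.concat` merge into a `Dataset.map` on each zipped pipeline, so that both the training and the devel dataset emit plain 7-tuples with identical structure before `Iterator.from_structure` is built. The plain-Python analogy below shows why merging makes the two structures agree:

```python
FIELDS = 7  # name, detected, arousal, valence, liking, istalking, images

def zipped_element(n_files):
    # One element of Dataset.zip over n_files datasets: an n_files-tuple,
    # where each entry is a 7-field record (dummy field indices here).
    return tuple(tuple(range(FIELDS)) for _ in range(n_files))

def merge(element):
    # Field-by-field merge, mirroring the tf.concat loop in the posted code.
    return tuple(sum((rec[f:f + 1] for rec in element), ())
                 for f in range(FIELDS))

train_elem = zipped_element(34)   # structure: a 34-tuple
devel_elem = zipped_element(14)   # structure: a 14-tuple -> mismatch
merged_train = merge(train_elem)  # structure: a 7-tuple
merged_devel = merge(devel_elem)  # structure: a 7-tuple -> structures agree
```

After the merge, both elements are 7-tuples (the per-file dimension moves into each field's batch dimension), which is the shared structure a reinitializable iterator needs.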

I was wondering how I can achieve my goal?

Any help is much appreciated!!

Here is the workaround I found, using `tf.cond`.

In order to retrieve 2 examples from each tfrecord, I used the `zip()` method of the `tf.data.Dataset` API, as follows:

def load_train_sewa_tfrecords(filenames_train, train_batch_size):
    datasets_train_iterators = []

    with tf.name_scope('TFRecordsTrain'):
        for file_name in filenames_train:
            dataset_train = tf.data.TFRecordDataset(file_name).map(_parse_train_function).batch(train_batch_size)
            datasets_train_iterators.append(dataset_train)

        dataset_train_all = tf.data.Dataset.zip(tuple(datasets_train_iterators))
        iterator_train_all = dataset_train_all.make_initializable_iterator()

    with tf.name_scope('inputs_train'):
        next_batch = iterator_train_all.get_next(name='next_batch')

        names = []
        detected = []
        arousal = []
        valence = []
        liking = []
        istalkings = []
        images = []

        # len(next_batch) is 34.
        # len(n) is 7. Since we are extracting: name, detected, arousal, valence, liking, istalking and images...
        # len(n[0 or 1 or 2 or ... or 6]) = is batch size.
        for n in next_batch:

            names.append(n[0])
            detected.append(n[1])
            arousal.append(n[2])
            valence.append(n[3])
            liking.append(n[4])
            istalkings.append(n[5])
            images.append(n[6])

        names = tf.concat(names, axis=0, name='names')
        detected = tf.concat(detected, axis=0, name='detected')
        arousal = tf.concat(arousal, axis=0, name='arousal')
        valence = tf.concat(valence, axis=0, name='valence')
        liking = tf.concat(liking, axis=0, name='liking')
        istalkings = tf.concat(istalkings, axis=0, name='istalkings')
        images = tf.concat(images, axis=0, name='images')

        return names, detected, arousal, valence, liking, istalkings, images, iterator_train_all

I will have a similar method for the devel set; or I can change the arguments passed to the method so that the same method can be used twice... (not a problem).

Then:

names_dev, detected_dev, arousal_dev, valence_dev, liking_dev, istalkings_dev, images_dev, iterator_dev_all = \
    load_devel_sewa_tfrecords(filenames_dev, sewa_batch_size)

names_train, detected_train, arousal_train, valence_train, liking_train, istalkings_train, images_train, iterator_train_all = \
    load_train_sewa_tfrecords(filenames_train, sewa_batch_size)

images_train = pre_process_sewa_images(images_train)
images_dev = pre_process_sewa_images(images_dev)


def return_train_sewa():
    return names_train, detected_train, arousal_train, valence_train, liking_train, istalkings_train, images_train


def return_dev_sewa():
    return names_dev, detected_dev, arousal_dev, valence_dev, liking_dev, istalkings_dev, images_dev


names, detected, arousal, valence, liking, istalkings, images_sewa = tf.cond(phase_train, return_train_sewa, return_dev_sewa)

sewa_inputs = []

sess = tf.Session()

import numpy as np
for e in range(epochs):
    sess.run(iterator_train_all.initializer)
    sess.run(iterator_dev_all.initializer)

    i = 0
    total = 0

    try:
        while True:
            i += 1
            names_np, detected_np, arousal_np, valence_np, liking_np, istalkings_np = \
                sess.run([names, detected, arousal, valence, liking, istalkings], feed_dict={phase_train: True})
            total += np.shape(names_np)[0]
            print("total =", total, " | i =", i)
    except tf.errors.OutOfRangeError:
        print("end of train...")

    i_d = 0
    total_d = 0

    sess.run(iterator_train_all.initializer)
    sess.run(iterator_dev_all.initializer)
    try:
        while True:
            i_d += 1
            names_np, detected_np, arousal_np, valence_np, liking_np, istalkings_np = \
                sess.run([names, detected, arousal, valence, liking, istalkings], feed_dict={phase_train: False})
            total_d += np.shape(names_np)[0]
            print("total_d =", total_d, " | i_d =", i_d)
            print(names_np)
    except tf.errors.OutOfRangeError:
        print("End of devel")

Note that it is mandatory to run both initializers, `sess.run(iterator_train_all.initializer)` and `sess.run(iterator_dev_all.initializer)`, before `sess.run([names, ...])`. I guess this is because, with `tf.cond`, both the training and validation examples are retrieved; `tf.cond` merely returns one of them based on the `phase_train` placeholder, which determines whether we are in training or testing mode.

Proof: when I inserted `names = tf.Print(input_=[names], data=[names], message='dev names')` inside `load_devel_sewa_tfrecords`, just before the return, I got:

dev names['Devel_01' 'Devel_01' 'Devel_02'...]

printed on the console. That is, while evaluating the training dataset, TensorFlow was simultaneously evaluating the devel dataset, while `tf.cond` output the tfrecords related to the training dataset.
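This matches a documented caveat of TF1 graph mode: tensors created outside the `tf.cond` branch functions (here, the two `get_next()` results) become inputs to the cond and are computed regardless of which branch is selected. A plain-Python analogy of this "both sides are pulled, one is returned" behavior:

```python
log = []

def pull_train_batch():
    # Stand-in for evaluating the training pipeline's get_next() tensor.
    log.append("train pulled")
    return "train_batch"

def pull_dev_batch():
    # Stand-in for evaluating the devel pipeline's get_next() tensor.
    log.append("dev pulled")
    return "dev_batch"

# As in the posted code, both values exist before the condition is applied...
train_val = pull_train_batch()
dev_val = pull_dev_batch()

def select(phase_train):
    # ...so the condition only chooses which value is *returned*; it does
    # not prevent the other pipeline from being advanced.
    return train_val if phase_train else dev_val

result = select(True)
```

To avoid advancing both pipelines, the branch bodies would have to create the `get_next()` ops themselves (or a feedable `from_string_handle` iterator could be used instead), but for the workaround above, initializing both iterators is the price of the `tf.cond` approach.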

Hope this answer helps!
