Do we lose data when we use .next() or .take() on a tf.keras.preprocessing.image_dataset_from_directory object?

阿敏巴

I created a data generator like this:

import tensorflow as tf

# Create test_dataset (test_dir is the path to the test image directory)
test_dataset = \
  tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                      labels='inferred', 
                                                      label_mode='int', 
                                                      class_names=None,
                                                      seed=42, 
                                                      )
# Explore the first batch
for images, labels in test_dataset.take(1):
  print(labels)

It returns:

tf.Tensor([5 3 8 3 8 5 7 6 3 8 4 2 4 5 5 4 0 1 0 5 5 2 6 0 7 9 9 0 4 9 6 4], shape=(32,), dtype=int32)

If I rerun the last part:

for images, labels in test_dataset.take(1):
  print(labels)

It returns something different from the first run:

tf.Tensor([0 6 2 5 5 7 5 2 7 4 0 5 0 4 6 5 8 7 7 3 5 1 1 9 5 2 6 6 6 6 2 0], shape=(32,), dtype=int32)

If I recreate test_dataset and explore it again:

# Create test_dataset
test_dataset = \
  tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                      labels='inferred', 
                                                      label_mode='int', 
                                                      class_names=None,
                                                      seed=42, 
                                                      )
# Explore the first batch
for images, labels in test_dataset.take(1):
  print(labels)

It returns the same result as the first run:

tf.Tensor([5 3 8 3 8 5 7 6 3 8 4 2 4 5 5 4 0 1 0 5 5 2 6 0 7 9 9 0 4 9 6 4], shape=(32,), dtype=int32)

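So I concluded that when I use the take method, the batch is popped and lost, and can no longer be used for modeling, validation, and so on.

My questions are:

Am I right? Is the first batch lost if I run test_dataset.take(1)?
If the answer to the question above is yes, is there a way to explore the batches in a tf.keras.preprocessing.image_dataset_from_directory object without losing them?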

弗雷特拉

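This is not about losing batches. The function tf.keras.preprocessing.image_dataset_from_directory has a shuffle parameter that defaults to True. That means the dataset is reshuffled on every iteration.

If we dig into the source code: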

  if shuffle:
    # Shuffle locally at each iteration
    dataset = dataset.shuffle(buffer_size=batch_size * 8, seed=seed)
  dataset = dataset.batch(batch_size)

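As you can see, it builds a tf.data object and calls its shuffle method. The shuffle method has a parameter reshuffle_each_iteration, which defaults to True. With the second take call you iterate over the dataset again, which causes it to be shuffled again. Recreating the dataset with the same seed restarts the shuffle sequence, which is why the first batch then matches the first run.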
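The behavior described above can be imitated with a small pure-Python model. This is only an illustrative sketch, not TensorFlow's implementation: the class ShuffledDataset and its take method are hypothetical, and deriving each pass's seed as seed + epoch is an assumption made to keep the model simple. It shows that every new iteration restarts from the full source, so nothing is consumed by a previous take.

```python
import random


class ShuffledDataset:
    """Toy model of a shuffled dataset (hypothetical, not TensorFlow code)."""

    def __init__(self, items, buffer_size, seed, reshuffle_each_iteration=True):
        self.items = list(items)
        self.buffer_size = buffer_size
        self.seed = seed
        self.reshuffle_each_iteration = reshuffle_each_iteration
        self.epoch = 0  # number of iterations started so far

    def __iter__(self):
        # Assumption: each pass gets its own seed when reshuffling is on.
        if self.reshuffle_each_iteration:
            epoch_seed = self.seed + self.epoch
        else:
            epoch_seed = self.seed
        self.epoch += 1
        rng = random.Random(epoch_seed)
        buffer = []
        # Every pass reads the full source again; earlier passes consumed nothing.
        for item in self.items:
            buffer.append(item)
            if len(buffer) == self.buffer_size:
                yield buffer.pop(rng.randrange(len(buffer)))
        # Drain what is left in the buffer in random order.
        while buffer:
            yield buffer.pop(rng.randrange(len(buffer)))

    def take(self, n):
        # Starts a fresh iteration and returns its first n elements.
        return [x for _, x in zip(range(n), self)]


ds = ShuffledDataset(range(20), buffer_size=8, seed=42)
first = ds.take(4)       # first pass
second = ds.take(4)      # second pass, shuffled with a different seed
everything = sorted(ds)  # a full pass still yields all 20 elements

recreated = ShuffledDataset(range(20), buffer_size=8, seed=42)
first_again = recreated.take(4)  # same seed, first pass: matches `first`

print(first, second, first_again)
```

Two take calls on the same object are separate passes with separate shuffle orders, while a recreated object starts over from the original seed, which mirrors the three outputs shown in the question.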

If you set shuffle=False for the dataset, the data is sorted in alphanumeric order and its order does not change between iterations.
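To check this on a small scale, the sketch below uses tf.data.Dataset.range as a stand-in for the image files, an assumption made so the snippet runs without a directory of images. The helpers make_dataset, first_batch and all_samples are hypothetical names. It compares repeated peeks with reshuffling turned off, and confirms that a dataset still yields every sample after a take(1) peek.

```python
import tensorflow as tf


def make_dataset(reshuffle):
    # Ten integers stand in for ten images (assumption for illustration).
    ds = tf.data.Dataset.range(10)
    ds = ds.shuffle(buffer_size=10, seed=42, reshuffle_each_iteration=reshuffle)
    return ds.batch(5)


def first_batch(ds):
    # take(1) builds a new one-batch dataset; it removes nothing from ds.
    return next(iter(ds.take(1))).numpy().tolist()


def all_samples(ds):
    # One full pass over ds, flattened and sorted for comparison.
    return sorted(int(x) for batch in ds for x in batch.numpy())


# Reshuffling off: every pass produces the same order.
fixed = make_dataset(reshuffle=False)
peek_a = first_batch(fixed)
peek_b = first_batch(fixed)
print(peek_a == peek_b)

# Reshuffling on: the order changes between passes, but no sample is lost.
reshuffled = make_dataset(reshuffle=True)
first_batch(reshuffled)  # peek at one batch
remaining = all_samples(reshuffled)
print(remaining == list(range(10)))
```

For image_dataset_from_directory itself, the equivalent switch is the shuffle=False argument mentioned above, which is usually a reasonable choice for a test set that only needs to be evaluated.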

Edited on 2021-09-4
