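I'm curious about creating a validation dataset from tensorflow_datasets, because it isn't clear to me how to split the training data that comes from tfds. I know it's easy to create validation data with train_test_split from sklearn, but I'm not sure how to do the same for data loaded from tfds. Does anyone know a possible way to do this? Any ideas?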
My attempt
I can create validation data like this:
from tensorflow.keras.datasets import mnist
from sklearn.model_selection import train_test_split
(X_tr, y_tr), (X_test, y_test) = mnist.load_data()
X_train, X_val, y_train, y_val = train_test_split(X_tr, y_tr, test_size=0.1, stratify=y_tr)
But how should we get validation data from this:
import tensorflow_datasets as tfds
mnst= tfds.load('mnist')
train_data = mnst['train']
test_data = mnst['test']
How can we make validation data from this? Any ideas? Thanks!
You can specify the splits when loading the data, like this:
(train_data, validation_data) = tfds.load(
    'mnist',
    split=['train[:80%]', 'train[80%:]'],
    as_supervised=True,
)
The splits can be specified as 'train' and 'test'. From the documentation:
All DatasetBuilders expose various data subsets defined as splits (e.g. train, test).
There is also a simple way to check them:
(training_set, validation_set, test_set) = tfds.load(
    'mnist',
    split=['train[:80%]', 'train[80%:]', 'test'],
    as_supervised=True,
)
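To change the validation fraction without retyping the slice strings, the split list can be built from a single percentage. This is only a minimal sketch: `make_split_specs` and `load_mnist_splits` are hypothetical helper names, not part of tensorflow_datasets, and the percentage is assumed to be an integer. Only the `tfds.load` call with `split` and `as_supervised` is taken from the snippet above.

```python
def make_split_specs(val_percent=20):
    """Build tfds slicing strings that carve val_percent% off the end of 'train'.

    Hypothetical helper for illustration; not part of tensorflow_datasets.
    val_percent is assumed to be an integer.
    """
    if not 0 < val_percent < 100:
        raise ValueError("val_percent must be strictly between 0 and 100")
    cut = 100 - val_percent
    return [f"train[:{cut}%]", f"train[{cut}%:]", "test"]


def load_mnist_splits(val_percent=20):
    """Load MNIST as (train, validation, test) datasets of (image, label) pairs."""
    # Imported lazily so the string helper above works without the library.
    import tensorflow_datasets as tfds

    return tfds.load(
        "mnist",
        split=make_split_specs(val_percent),
        as_supervised=True,
    )


print(make_split_specs(20))  # ['train[:80%]', 'train[80%:]', 'test']
```

Calling `load_mnist_splits(10)` would then request a 90/10 train/validation split plus the untouched test split.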
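Convert the splits to NumPy arrays and check their shapes. Only one is shown here for demonstration; the others follow the same logic. We iterate over the dataset using tfds.as_numpy: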
import numpy as np

test_set = tfds.as_numpy(test_set)
x_test = []  # collected as lists first, converted to NumPy arrays below
y_test = []
for features_labels in test_set:  # features_labels is a (features, label) tuple
    x_test.append(features_labels[0])
    y_test.append(features_labels[1])
x_test = np.array(x_test)
y_test = np.array(y_test)
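The same loop applies to the training and validation splits, so it can be wrapped in a function. A minimal sketch, assuming a hypothetical helper named `to_arrays` that accepts any iterable of (features, label) pairs, such as the result of `tfds.as_numpy` on a dataset loaded with `as_supervised=True`. The toy data below is made up so the snippet runs without downloading MNIST.

```python
import numpy as np


def to_arrays(pairs):
    """Collect an iterable of (features, label) pairs into two NumPy arrays.

    Hypothetical helper for illustration; not part of tensorflow_datasets.
    """
    features, labels = [], []
    for x, y in pairs:
        features.append(x)
        labels.append(y)
    return np.array(features), np.array(labels)


# Stand-in data shaped like MNIST examples: three 28x28x1 images with labels.
toy = [(np.zeros((28, 28, 1), dtype=np.uint8), np.int64(i)) for i in range(3)]
x_toy, y_toy = to_arrays(toy)
print(x_toy.shape, y_toy.shape)  # (3, 28, 28, 1) (3,)
```

With the real data this would be used as `x_val, y_val = to_arrays(tfds.as_numpy(validation_set))`, and likewise for the training split.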
Now you can check the shapes:
x_test.shape
>>> (10000, 28, 28, 1)
y_test.shape
>>> (10000,)
x_val.shape
>>> (12000, 28, 28, 1)
x_train.shape
>>> (48000, 28, 28, 1)
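These sizes follow from the percentages: the MNIST 'train' split has 60000 examples, so an 80/20 slice yields 48000 and 12000. A small sketch of that arithmetic, using a hypothetical function `percent_split_sizes`. It uses plain integer division, which is an assumption that matches the shapes above because 60000 × 80 is divisible by 100; how the library rounds boundaries that do not divide evenly is not covered here.

```python
def percent_split_sizes(total, train_percent):
    """Return (train_size, val_size) for 'train[:P%]' and 'train[P%:]'.

    Hypothetical helper; plain integer arithmetic, exact only when
    total * train_percent is divisible by 100.
    """
    cut = total * train_percent // 100
    return cut, total - cut


MNIST_TRAIN_EXAMPLES = 60000  # size of the MNIST 'train' split
print(percent_split_sizes(MNIST_TRAIN_EXAMPLES, 80))  # (48000, 12000)
```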