How to make a Keras Dense layer handle a 3D tensor as input to this softmax fully-connected layer?

Rania

I am working on a custom problem where I have to change the fully-connected layer (a Dense layer with softmax), and my model code looks like this (using the Keras framework):

.......
batch_size = 8
inputs = tf.random.uniform(shape=[batch_size, 1024, 256], dtype=tf.dtypes.float32)
preds = Dense(num_classes, activation='softmax')(x)  # final layer with softmax activation (x is the output of the elided layers above)
....
model = Model(inputs=base_model.input, outputs=preds)

So I have to change the Dense layer code so that it outputs a tensor of probabilities with shape [batch_size, 1024, num_classes], without using a for loop; it needs to be optimized, not a time-consuming implementation.

The version of the Dense code I want to change:

# Imports needed by the class below (added for completeness; in the Keras 2.x
# source they live at the top of keras/layers/core.py, and older versions
# expose Layer and InputSpec via keras.engine.topology instead):
from keras import activations, initializers, regularizers, constraints
from keras import backend as K
from keras.engine.base_layer import Layer, InputSpec


class Dense(Layer):
    """Just your regular densely-connected NN layer.

    `Dense` implements the operation:
    `output = activation(dot(input, kernel) + bias)`
    where `activation` is the element-wise activation function
    passed as the `activation` argument, `kernel` is a weights matrix
    created by the layer, and `bias` is a bias vector created by the layer
    (only applicable if `use_bias` is `True`).

    Note: if the input to the layer has a rank greater than 2, then
    it is flattened prior to the initial dot product with `kernel`.

    # Example

    ```python
        # as first layer in a sequential model:
        model = Sequential()
        model.add(Dense(32, input_shape=(16,)))
        # now the model will take as input arrays of shape (*, 16)
        # and output arrays of shape (*, 32)

        # after the first layer, you don't need to specify
        # the size of the input anymore:
        model.add(Dense(32))
    ```

    # Arguments
        units: Positive integer, dimensionality of the output space.
        activation: Activation function to use
            (see [activations](../activations.md)).
            If you don't specify anything, no activation is applied
            (ie. "linear" activation: `a(x) = x`).
        use_bias: Boolean, whether the layer uses a bias vector.
        kernel_initializer: Initializer for the `kernel` weights matrix
            (see [initializers](../initializers.md)).
        bias_initializer: Initializer for the bias vector
            (see [initializers](../initializers.md)).
        kernel_regularizer: Regularizer function applied to
            the `kernel` weights matrix
            (see [regularizer](../regularizers.md)).
        bias_regularizer: Regularizer function applied to the bias vector
            (see [regularizer](../regularizers.md)).
        activity_regularizer: Regularizer function applied to
            the output of the layer (its "activation").
            (see [regularizer](../regularizers.md)).
        kernel_constraint: Constraint function applied to
            the `kernel` weights matrix
            (see [constraints](../constraints.md)).
        bias_constraint: Constraint function applied to the bias vector
            (see [constraints](../constraints.md)).

    # Input shape
        nD tensor with shape: `(batch_size, ..., input_dim)`.
        The most common situation would be
        a 2D input with shape `(batch_size, input_dim)`.

    # Output shape
        nD tensor with shape: `(batch_size, ..., units)`.
        For instance, for a 2D input with shape `(batch_size, input_dim)`,
        the output would have shape `(batch_size, units)`.
    """

    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 **kwargs):
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super(Dense, self).__init__(**kwargs)
        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.bias_constraint = constraints.get(bias_constraint)
        self.input_spec = InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        self.kernel = self.add_weight(shape=(input_dim, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
        self.built = True

    def call(self, inputs):
        output = K.dot(inputs, self.kernel)
        if self.use_bias:
            output = K.bias_add(output, self.bias)
        if self.activation is not None:
            output = self.activation(output)
        return output

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        assert input_shape[-1]
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def get_config(self):
        config = {
            'units': self.units,
            'activation': activations.serialize(self.activation),
            'use_bias': self.use_bias,
            'kernel_initializer': initializers.serialize(self.kernel_initializer),
            'bias_initializer': initializers.serialize(self.bias_initializer),
            'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
            'bias_regularizer': regularizers.serialize(self.bias_regularizer),
            'activity_regularizer': regularizers.serialize(self.activity_regularizer),
            'kernel_constraint': constraints.serialize(self.kernel_constraint),
            'bias_constraint': constraints.serialize(self.bias_constraint)
        }
        base_config = super(Dense, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
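
Note that, despite the flattening remark in the docstring, `call` computes `K.dot(inputs, self.kernel)`, which for inputs of rank greater than 2 contracts only the last axis against the kernel and leaves the leading axes intact; this is what the first approach in the answer below relies on.
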
Jdehesa

This can be done in three different ways (that I can think of). If you want to have a single dense layer that maps a vector of 256 elements to a vector of num_classes elements, and have it applied to the whole batch of data (that is, use the same 256 x num_classes weight matrix for every sample), then you don't need to do anything special, just use a regular Dense layer:

import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = Dense(num_classes, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2570
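
As a quick sanity check (my own sketch, not part of the original answer), the 2570 parameters are just 256 x 10 weights plus 10 biases, shared across all 1024 positions, and the softmax is applied independently at each position:

import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

inp = Input(shape=(1024, 256))
out = Dense(10, activation='softmax')(inp)
model = Model(inp, out)

probs = model(tf.random.uniform((8, 1024, 256)))
# Every position carries its own distribution over the 10 classes
sums = tf.reduce_sum(probs, axis=-1)
print(sums.shape)                                # (8, 1024)
print(float(tf.reduce_max(tf.abs(sums - 1.0))))  # ~0.0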

Another way is to have a single huge Dense layer that takes all 1024 * 256 values in at once and produces all 1024 * num_classes values at the output, that is, a layer with a weight matrix of shape (1024 * 256) x (1024 * num_classes) (gigabytes of memory!). This is also easy to do, although it seems unlikely to be what you need:

import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Flatten, Dense, Reshape, Softmax

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
res = Flatten()(inp)
# This takes _a lot_ of memory!
layer = Dense(1024 * num_classes, activation=None)
out_res = layer(res)
# Apply softmax after reshaping
out_preact = Reshape((-1, num_classes))(out_res)
out = Softmax()(out_preact)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2684364800
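
A rough check of the memory claim (my arithmetic, not from the original answer): 2,684,364,800 float32 parameters occupy about 2.68e9 x 4 bytes, roughly 10.7 GB, for the weights alone, before counting gradients or optimizer state, which is why this option is usually impractical.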

Finally, you may want a set of 1024 weight matrices, each applied to the corresponding position in the input, which implies a weight array of shape (1024, 256, num_classes). I don't think this can be done with one of the standard Keras layers (or I don't know how)¹, but it is easy enough to write a custom layer based on Dense for it:

import tensorflow as tf
from tensorflow.keras.layers import Dense, InputSpec

class Dense2D(Dense):
    def __init__(self, *args, **kwargs):
        super(Dense2D, self).__init__(*args, **kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 3
        input_dim1 = input_shape[-2]
        input_dim2 = input_shape[-1]

        self.kernel = self.add_weight(shape=(input_dim1, input_dim2, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(input_dim1, self.units),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.input_spec = InputSpec(min_ndim=3, axes={-2: input_dim1, -1: input_dim2})
        self.built = True

    def call(self, inputs):
        # Multiply each set of weights with each input element
        output = tf.einsum('...ij,ijk->...ik', inputs, self.kernel)
        if self.use_bias:
            output += self.bias
        if self.activation is not None:
            output = self.activation(output)
        return output

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 3
        assert input_shape[-1]
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

You would then use it like this:

import tensorflow as tf
from tensorflow.keras import Input

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = Dense2D(num_classes, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2631680
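
As a quick numerical check (my own sketch, not part of the original answer), the einsum in Dense2D.call is equivalent to a separate matrix multiplication at every position, and the parameter count works out to 1024 x 256 x 10 weights plus 1024 x 10 biases = 2,631,680:

import tensorflow as tf

x = tf.random.uniform((2, 4, 3))  # (batch, positions, features)
w = tf.random.uniform((4, 3, 5))  # (positions, features, units)

# The contraction used in Dense2D.call ...
einsum_out = tf.einsum('...ij,ijk->...ik', x, w)
# ... matches multiplying each position by its own weight matrix
loop_out = tf.stack([x[:, i] @ w[i] for i in range(4)], axis=1)
print(float(tf.reduce_max(tf.abs(einsum_out - loop_out))))  # ~0.0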

¹: As pointed out today in the comments, you can actually use a LocallyConnected1D layer to do what I tried to do with my Dense2D layer. It is as simple as this:

import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import LocallyConnected1D

batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = LocallyConnected1D(num_classes, 1, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2631680
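
This works because LocallyConnected1D behaves like Conv1D but without weight sharing across positions: with a kernel size of 1, each of the 1024 positions learns its own 256 x num_classes mapping, for 1024 x (256 x 10 + 10) = 2,631,680 parameters, exactly matching Dense2D above.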
