speechbrain.lobes.models.convolution module

This module assembles a (depthwise) convolutional encoder, with or without residual connections.

Authors
  • Jianyuan Zhong 2020

  • Titouan Parcollet 2023

Summary

Classes:

ConvBlock

An implementation of a convolution block with 1D or 2D (depthwise) convolutions.

ConvolutionFrontEnd

Assembles a (depthwise) convolutional encoder, with or without residual connections.

ConvolutionalSpatialGatingUnit

This module implements the Convolutional Spatial Gating Unit (CSGU) as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".

Reference

class speechbrain.lobes.models.convolution.ConvolutionalSpatialGatingUnit(input_size, kernel_size=31, dropout=0.0, use_linear_after_conv=False, activation=<class 'torch.nn.modules.linear.Identity'>)

Bases: Module

This module implements the Convolutional Spatial Gating Unit (CSGU) as defined in "Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding".

The code is heavily inspired by the original ESPnet implementation.

Parameters:
  • input_size (int) – Size of the feature (channel) dimension.

  • kernel_size (int, optional) – Size of the kernel

  • dropout (float, optional) – Dropout rate to be applied at the output

  • use_linear_after_conv (bool, optional) – If True, applies a linear transformation of size input_size // 2 after the convolution.

  • activation (torch class, optional) – Activation function to use on the gate (default Identity).

Example

>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionalSpatialGatingUnit(input_size=x.shape[-1])
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 5])

forward(x)

Parameters:

x (torch.Tensor) – Input tensor, shape (B, T, D)

Returns:

out – The processed outputs.

Return type:

torch.Tensor
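To make the gating mechanism more concrete, the following is a minimal sketch in plain PyTorch. It is not the SpeechBrain implementation. It assumes a Branchformer-style formulation: the feature dimension is split into two halves, one half is layer-normalized and passed through a depthwise convolution over time, and the result gates the other half by element-wise multiplication. The class name TinyCSGU and its internals are hypothetical, and the optional linear layer and gate activation are left out.

```python
import torch
import torch.nn as nn


class TinyCSGU(nn.Module):
    """Hypothetical, simplified gating unit (illustration only)."""

    def __init__(self, input_size, kernel_size=31, dropout=0.0):
        super().__init__()
        if input_size % 2 != 0:
            raise ValueError("input_size must be even to split it into two halves")
        half = input_size // 2
        self.norm = nn.LayerNorm(half)
        # groups=half makes the convolution depthwise: one filter per channel.
        # This padding keeps the time length unchanged for an odd kernel_size.
        self.conv = nn.Conv1d(
            half, half, kernel_size, padding=(kernel_size - 1) // 2, groups=half
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (B, T, D) -> two halves of shape (B, T, D // 2)
        x_res, x_gate = x.chunk(2, dim=-1)
        x_gate = self.norm(x_gate)
        # Conv1d expects (B, C, T), so swap the time and channel axes.
        x_gate = self.conv(x_gate.transpose(1, 2)).transpose(1, 2)
        # The convolved half gates the other half element-wise.
        return self.dropout(x_res * x_gate)


x = torch.rand((8, 30, 10))
out = TinyCSGU(input_size=x.shape[-1])(x)
print(out.shape)  # torch.Size([8, 30, 5])
```

Under these assumptions the feature dimension is halved (10 to 5 here), which agrees with the output shape in the documented example, because only one half of the features is passed on after gating.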

class speechbrain.lobes.models.convolution.ConvolutionFrontEnd(input_shape, num_blocks=3, num_layers_per_block=5, out_channels=[128, 256, 512], kernel_sizes=[3, 3, 3], strides=[1, 2, 2], dilations=[1, 1, 1], residuals=[True, True, True], conv_module=<class 'speechbrain.nnet.CNN.Conv2d'>, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, norm=<class 'speechbrain.nnet.normalization.LayerNorm'>, dropout=0.1, conv_bias=True, padding='same', conv_init=None)

Bases: Sequential

This module assembles a (depthwise) convolutional encoder, with or without residual connections.

Parameters:
  • input_shape (tuple) – Expected shape of the input tensor.

  • num_blocks (int) – Number of blocks (default 3).

  • num_layers_per_block (int) – Number of convolution layers for each block (default 5).

  • out_channels (Optional(list[int])) – Number of output channels for each block.

  • kernel_sizes (Optional(list[int])) – Kernel size of convolution blocks.

  • strides (Optional(list[int])) – Striding factor for each block; the stride is applied at the last convolution layer of each block.

  • dilations (Optional(list[int])) – Dilation factor for each block.

  • residuals (Optional(list[bool])) – Whether to apply a residual connection at each block (default [True, True, True]).

  • conv_module (class) – Class to use for constructing conv layers.

  • activation (Callable) – Activation function for each block (default LeakyReLU).

  • norm (torch class) – Normalization to regularize the model (default LayerNorm).

  • dropout (float) – Dropout (default 0.1).

  • conv_bias (bool) – Whether to add a bias term to convolutional layers.

  • padding (str) – Type of padding to apply.

  • conv_init (str) – Type of initialization to use for conv layers.
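
As a rough sketch of how the per-block strides shrink the time and feature axes, the helper below estimates the output shape for a (batch, time, features) input. It assumes that with padding='same' each strided layer maps a length n to ceil(n / stride), and that the last axis holds the channels of the final block. Both are assumptions chosen to be consistent with the shape shown in the example below; the function expected_output_shape is hypothetical and not part of SpeechBrain.

```python
import math


def expected_output_shape(input_shape, out_channels=(128, 256, 512), strides=(1, 2, 2)):
    """Hypothetical helper: estimate the front-end output shape.

    Assumes 'same' padding, so each block's stride maps a length n
    to ceil(n / stride) on both the time and the feature axis.
    """
    batch, time, feats = input_shape
    for stride in strides:
        time = math.ceil(time / stride)
        feats = math.ceil(feats / stride)
    # Channels of the last block are appended as the final axis.
    return (batch, time, feats, out_channels[-1])


print(expected_output_shape((8, 30, 10)))  # (8, 8, 3, 512)
```

With the default strides [1, 2, 2], the time axis goes 30, 30, 15, 8 and the feature axis goes 10, 10, 5, 3.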

Example

>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionFrontEnd(input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 8, 3, 512])

get_filter_properties() → FilterProperties

class speechbrain.lobes.models.convolution.ConvBlock(num_layers, out_channels, input_shape, kernel_size=3, stride=1, dilation=1, residual=False, conv_module=<class 'speechbrain.nnet.CNN.Conv2d'>, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, norm=None, dropout=0.1, conv_bias=True, padding='same', conv_init=None)

Bases: Module

An implementation of a convolution block with 1D or 2D (depthwise) convolutions.

Parameters:
  • num_layers (int) – Number of depthwise convolution layers for this block.

  • out_channels (int) – Number of output channels of this block.

  • input_shape (tuple) – Expected shape of the input tensor.

  • kernel_size (int) – Kernel size of convolution layers (default 3).

  • stride (int) – Striding factor for this block (default 1).

  • dilation (int) – Dilation factor.

  • residual (bool) – Add a residual connection if True.

  • conv_module (torch class) – Class to use when constructing conv layers.

  • activation (Callable) – Activation function for this block.

  • norm (torch class) – Normalization to regularize the model (default None).

  • dropout (float) – Dropout rate, i.e. the probability of zeroing an output (default 0.1).

  • conv_bias (bool) – Add a bias term to conv layers.

  • padding (str) – The type of padding to add.

  • conv_init (str) – Type of initialization to use for conv layers.

Example

>>> x = torch.rand((8, 30, 10))
>>> conv = ConvBlock(2, 16, input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 10, 16])

forward(x)

Processes the input tensor x and returns an output tensor.

get_filter_properties() → FilterProperties
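
As an illustration of what a residual connection around a strided convolution stack involves, here is a minimal sketch in plain PyTorch. It is not the ConvBlock implementation: the class TinyResidualConvBlock, the use of Conv1d on (B, C, T) tensors, and the 1x1 shortcut projection that matches channels and stride are all assumptions made for this example.

```python
import torch
import torch.nn as nn


class TinyResidualConvBlock(nn.Module):
    """Hypothetical residual block (illustration only)."""

    def __init__(self, in_channels, out_channels, num_layers=2, kernel_size=3, stride=1):
        super().__init__()
        layers = []
        channels = in_channels
        for i in range(num_layers):
            # Only the last layer of the block applies the stride.
            s = stride if i == num_layers - 1 else 1
            layers.append(
                nn.Conv1d(channels, out_channels, kernel_size, stride=s, padding=kernel_size // 2)
            )
            layers.append(nn.LeakyReLU())
            channels = out_channels
        self.convs = nn.Sequential(*layers)
        # 1x1 projection so the shortcut matches the main path in channels and length.
        self.shortcut = nn.Conv1d(in_channels, out_channels, kernel_size=1, stride=stride)

    def forward(self, x):
        # x: (B, C, T)
        return self.convs(x) + self.shortcut(x)


x = torch.rand((8, 10, 30))
block = TinyResidualConvBlock(in_channels=10, out_channels=16, stride=2)
print(block(x).shape)  # torch.Size([8, 16, 15])
```

The projection on the shortcut path is needed in this sketch because the main path changes both the channel count (10 to 16) and the time length (30 to 15), so the input cannot be added back unchanged.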