speechbrain.nnet.complex_networks.c_RNN module

Library implementing complex-valued recurrent neural networks.

Authors
  • Titouan Parcollet 2020

Summary

Classes:

CLSTM

This class implements a complex-valued LSTM.

CLSTM_Layer

This class implements a complex-valued LSTM layer.

CLiGRU

This class implements a complex-valued Light GRU (liGRU).

CLiGRU_Layer

This class implements a complex-valued Light-Gated Recurrent Unit (liGRU) layer.

CRNN

This class implements a vanilla complex-valued RNN.

CRNN_Layer

This class implements a complex-valued recurrent layer.

Reference

class speechbrain.nnet.complex_networks.c_RNN.CLSTM(hidden_size, input_shape, num_layers=1, bias=True, dropout=0.0, bidirectional=False, return_hidden=False, init_criterion='glorot', weight_init='complex')[source]

Bases: Module

This class implements a complex-valued LSTM.

Input format is (batch, time, fea) or (batch, time, fea, channel). In the latter case, the last two dimensions are merged into (batch, time, fea * channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output). The specified value is in terms of complex-valued neurons; the real-valued output dimension is therefore 2*hidden_size.

  • input_shape (tuple) – The expected shape of the input.

  • num_layers (int, optional) – Number of layers to employ in the RNN architecture (default 1).

  • bias (bool, optional) – If True, the additive bias b is adopted (default True).

  • dropout (float, optional) – It is the dropout factor (must be between 0 and 1) (default 0.0).

  • bidirectional (bool, optional) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used (default False).

  • return_hidden (bool, optional) – If True, the model also returns the last hidden states (default False).

  • init_criterion (str, optional) – (glorot, he). This parameter controls the initialization criterion of the weights. It is combined with weight_init to build the initialization method of the complex-valued weights (default “glorot”).

  • weight_init (str, optional) – (complex, unitary). This parameter defines the initialization procedure of the complex-valued weights (default “complex”). “complex” will generate random complex-valued weights following the init_criterion and the complex polar form. “unitary” will normalize the weights to lie on the unit circle. More details in: “Deep Complex Networks”, Trabelsi C. et al.

Example

>>> inp_tensor = torch.rand([10, 16, 40])
>>> rnn = CLSTM(hidden_size=16, input_shape=inp_tensor.shape)
>>> out_tensor = rnn(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 16, 32])
forward(x, hx=None)[source]

Returns the output of the CLSTM.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • hx (torch.Tensor, optional) – Initial hidden states (default None).

Returns:

  • output (torch.Tensor) – The output tensor.

  • hh (torch.Tensor) – If return_hidden is True, the second return value contains the hidden states.
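When return_hidden is True, the hidden states can be collected alongside the output. A minimal sketch (variable names are illustrative, and the exact shape of hh is not asserted here):

>>> import torch
>>> inp_tensor = torch.rand([10, 16, 40])
>>> rnn = CLSTM(hidden_size=16, input_shape=inp_tensor.shape, return_hidden=True)
>>> out_tensor, hh = rnn(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 16, 32])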

class speechbrain.nnet.complex_networks.c_RNN.CLSTM_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, bidirectional=False, init_criterion='glorot', weight_init='complex')[source]

Bases: Module

This class implements a complex-valued LSTM layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors (in terms of real values).

  • hidden_size (int) – Number of output values (in terms of real values).

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float, optional) – It is the dropout factor (must be between 0 and 1) (default 0.0).

  • bidirectional (bool, optional) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used (default False).

  • init_criterion (str, optional) – (glorot, he). This parameter controls the initialization criterion of the weights. It is combined with weight_init to build the initialization method of the complex-valued weights (default “glorot”).

  • weight_init (str, optional) – (complex, unitary). This parameter defines the initialization procedure of the complex-valued weights (default “complex”). “complex” will generate random complex-valued weights following the init_criterion and the complex polar form. “unitary” will normalize the weights to lie on the unit circle. More details in: “Deep Complex Networks”, Trabelsi C. et al.

forward(x: Tensor, hx: Tensor | None = None) → torch.Tensor[source]

Returns the output of the CLSTM_Layer.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • hx (torch.Tensor, optional) – Initial hidden states (default None).

Returns:

h – The hidden states for each step.

Return type:

torch.Tensor
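CLSTM_Layer is normally built internally by CLSTM, but it can also be driven directly. A minimal sketch, assuming input_size and hidden_size are given in terms of real values as documented above:

>>> import torch
>>> layer = CLSTM_Layer(input_size=40, hidden_size=32, num_layers=1, batch_size=10)
>>> h = layer(torch.rand([10, 16, 40]))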

class speechbrain.nnet.complex_networks.c_RNN.CRNN(hidden_size, input_shape, nonlinearity='tanh', num_layers=1, bias=True, dropout=0.0, bidirectional=False, return_hidden=False, init_criterion='glorot', weight_init='complex')[source]

Bases: Module

This class implements a vanilla complex-valued RNN.

Input format is (batch, time, fea) or (batch, time, fea, channel). In the latter case, the last two dimensions are merged into (batch, time, fea * channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output). The specified value is in terms of complex-valued neurons; the real-valued output dimension is therefore 2*hidden_size.

  • input_shape (tuple) – The expected shape of the input.

  • nonlinearity (str, optional) – Type of nonlinearity (tanh, relu) (default “tanh”).

  • num_layers (int, optional) – Number of layers to employ in the RNN architecture (default 1).

  • bias (bool, optional) – If True, the additive bias b is adopted (default True).

  • dropout (float, optional) – It is the dropout factor (must be between 0 and 1) (default 0.0).

  • bidirectional (bool, optional) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used (default False).

  • return_hidden (bool, optional) – If True, the model also returns the last hidden states (default False).

  • init_criterion (str, optional) – (glorot, he). This parameter controls the initialization criterion of the weights. It is combined with weight_init to build the initialization method of the complex-valued weights (default “glorot”).

  • weight_init (str, optional) – (complex, unitary). This parameter defines the initialization procedure of the complex-valued weights (default “complex”). “complex” will generate random complex-valued weights following the init_criterion and the complex polar form. “unitary” will normalize the weights to lie on the unit circle. More details in: “Deep Complex Networks”, Trabelsi C. et al.

Example

>>> inp_tensor = torch.rand([10, 16, 30])
>>> rnn = CRNN(hidden_size=16, input_shape=inp_tensor.shape)
>>> out_tensor = rnn(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 16, 32])
forward(x, hx=None)[source]

Returns the output of the vanilla CRNN.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • hx (torch.Tensor, optional) – Initial hidden states (default None).

Returns:

  • output (torch.Tensor) – The outputs of the CRNN.

  • hh (torch.Tensor) – If return_hidden is True, the hidden states for each step are also returned.
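With bidirectional=True, the two scan directions are concatenated along the feature axis, so the real-valued output dimension doubles. A minimal sketch under that assumption:

>>> import torch
>>> inp_tensor = torch.rand([10, 16, 30])
>>> rnn = CRNN(hidden_size=16, input_shape=inp_tensor.shape, bidirectional=True)
>>> out_tensor = rnn(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 16, 64])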

class speechbrain.nnet.complex_networks.c_RNN.CRNN_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, nonlinearity='tanh', bidirectional=False, init_criterion='glorot', weight_init='complex')[source]

Bases: Module

This class implements a complex-valued recurrent layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors (in terms of real values).

  • hidden_size (int) – Number of output values (in terms of real values).

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float, optional) – It is the dropout factor (must be between 0 and 1) (default 0.0).

  • nonlinearity (str, optional) – Type of nonlinearity (tanh, relu) (default “tanh”).

  • bidirectional (bool, optional) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used (default False).

  • init_criterion (str, optional) – (glorot, he). This parameter controls the initialization criterion of the weights. It is combined with weight_init to build the initialization method of the complex-valued weights (default “glorot”).

  • weight_init (str, optional) – (complex, unitary). This parameter defines the initialization procedure of the complex-valued weights (default “complex”). “complex” will generate random complex-valued weights following the init_criterion and the complex polar form. “unitary” will normalize the weights to lie on the unit circle. More details in: “Deep Complex Networks”, Trabelsi C. et al.

forward(x: Tensor, hx: Tensor | None = None) → torch.Tensor[source]

Returns the output of the CRNN_Layer.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • hx (torch.Tensor, optional) – Initial hidden states (default None).

Returns:

h – The hidden states for each step.

Return type:

torch.Tensor
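As with CLSTM_Layer, CRNN_Layer is usually created by CRNN itself, but a direct call is possible. A minimal sketch, again assuming real-valued sizes as documented above:

>>> import torch
>>> layer = CRNN_Layer(input_size=40, hidden_size=32, num_layers=1, batch_size=10, nonlinearity='relu')
>>> h = layer(torch.rand([10, 16, 40]))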

class speechbrain.nnet.complex_networks.c_RNN.CLiGRU(hidden_size, input_shape, nonlinearity='relu', normalization='batchnorm', num_layers=1, bias=True, dropout=0.0, bidirectional=False, return_hidden=False, init_criterion='glorot', weight_init='complex')[source]

Bases: Module

This class implements a complex-valued Light GRU (liGRU).

The liGRU is a single-gate GRU model based on batch normalization + ReLU activations + recurrent dropout. For more info see:

“M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Light Gated Recurrent Units for Speech Recognition, in IEEE Transactions on Emerging Topics in Computational Intelligence, 2018” (https://arxiv.org/abs/1803.10225)

To speed it up, the model is compiled with the torch just-in-time (JIT) compiler right before use.

It accepts input tensors formatted as (batch, time, fea). For 4d inputs like (batch, time, fea, channel), the tensor is flattened to (batch, time, fea*channel).

Parameters:
  • hidden_size (int) – Number of output neurons (i.e., the dimensionality of the output). The specified value is in terms of complex-valued neurons; the real-valued output dimension is therefore 2*hidden_size.

  • input_shape (tuple) – The expected size of the input.

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • normalization (str) – Type of normalization for the ligru model (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • bias (bool) – If True, the additive bias b is adopted.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

  • return_hidden (bool) – If True, the model also returns the last hidden states.

  • init_criterion (str, optional) – (glorot, he). This parameter controls the initialization criterion of the weights. It is combined with weight_init to build the initialization method of the complex-valued weights (default “glorot”).

  • weight_init (str, optional) – (complex, unitary). This parameter defines the initialization procedure of the complex-valued weights (default “complex”). “complex” will generate random complex-valued weights following the init_criterion and the complex polar form. “unitary” will normalize the weights to lie on the unit circle. More details in: “Deep Complex Networks”, Trabelsi C. et al.

Example

>>> inp_tensor = torch.rand([10, 16, 30])
>>> rnn = CLiGRU(input_shape=inp_tensor.shape, hidden_size=16)
>>> out_tensor = rnn(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 16, 32])
forward(x, hx=None)[source]

Returns the output of the CLiGRU.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • hx (torch.Tensor, optional) – Initial hidden states (default None).

Returns:

  • output (torch.Tensor) – The outputs of the CLiGRU.

  • hh (torch.Tensor) – If return_hidden is True, the hidden states for each step are also returned.
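The normalization scheme and depth are set through the constructor. A minimal sketch with layer normalization and two layers, assuming the real-valued output dimension stays 2*hidden_size as documented:

>>> import torch
>>> inp_tensor = torch.rand([10, 16, 30])
>>> rnn = CLiGRU(input_shape=inp_tensor.shape, hidden_size=16, normalization='layernorm', num_layers=2)
>>> out_tensor = rnn(inp_tensor)
>>> out_tensor.shape
torch.Size([10, 16, 32])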

class speechbrain.nnet.complex_networks.c_RNN.CLiGRU_Layer(input_size, hidden_size, num_layers, batch_size, dropout=0.0, nonlinearity='relu', normalization='batchnorm', bidirectional=False, init_criterion='glorot', weight_init='complex')[source]

Bases: Module

This class implements a complex-valued Light-Gated Recurrent Unit (liGRU) layer.

Parameters:
  • input_size (int) – Feature dimensionality of the input tensors.

  • hidden_size (int) – Number of output values.

  • num_layers (int) – Number of layers to employ in the RNN architecture.

  • batch_size (int) – Batch size of the input tensors.

  • dropout (float) – It is the dropout factor (must be between 0 and 1).

  • nonlinearity (str) – Type of nonlinearity (tanh, relu).

  • normalization (str) – Type of normalization (batchnorm, layernorm). Every string different from batchnorm and layernorm will result in no normalization.

  • bidirectional (bool) – If True, a bidirectional model that scans the sequence both right-to-left and left-to-right is used.

  • init_criterion (str, optional) – (glorot, he). This parameter controls the initialization criterion of the weights. It is combined with weight_init to build the initialization method of the complex-valued weights (default “glorot”).

  • weight_init (str, optional) – (complex, unitary). This parameter defines the initialization procedure of the complex-valued weights (default “complex”). “complex” will generate random complex-valued weights following the init_criterion and the complex polar form. “unitary” will normalize the weights to lie on the unit circle. More details in: “Deep Complex Networks”, Trabelsi C. et al.

forward(x: Tensor, hx: Tensor | None = None) → torch.Tensor[source]

Returns the output of the Complex liGRU layer.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • hx (torch.Tensor, optional) – Initial hidden states (default None).

Returns:

h – The hidden states for each step.

Return type:

torch.Tensor
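CLiGRU_Layer is likewise normally instantiated by CLiGRU. A minimal sketch of a direct call, assuming real-valued sizes and the default batchnorm normalization:

>>> import torch
>>> layer = CLiGRU_Layer(input_size=40, hidden_size=32, num_layers=1, batch_size=10)
>>> h = layer(torch.rand([10, 16, 40]))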