This repository aims to give beginners a basic understanding of time-series sequence-to-sequence (Seq2Seq) models.
The repo implements the following Seq2Seq models:
- LSTM encoder-decoder
- LSTM encoder-decoder with attention by Bahdanau et al. (2014). See `architecture.rnn`, `architecture.attention`, and `architecture.seq2seq.AttentionLSTMSeq2Seq`.
- Vanilla Transformer by Vaswani et al. (2017). See `architecture.Transformer` and `architecture.seq2seq.TransformerSeq2Seq`.
I only implemented these three types of Seq2Seq models. You may combine 1D CNN, LSTM, and Transformer layers to build a further customized version (see the sketch below). See `Tutorial.ipynb` for testing.
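For example, a customized encoder might put a 1D convolution in front of an LSTM. The sketch below uses plain `torch.nn` layers only (hypothetical module and names, not the repo's `architecture.cnn`/`architecture.rnn` interfaces):

```python
import torch
import torch.nn as nn

class Conv1dLSTMEncoder(nn.Module):
    """Hypothetical encoder combining a 1D CNN front-end with an LSTM."""
    def __init__(self, input_size: int, hidden_size: int, kernel_size: int = 3):
        super().__init__()
        # Conv1d expects (batch, channels, length), so the forward pass transposes around it.
        self.conv = nn.Conv1d(input_size, hidden_size, kernel_size, padding=kernel_size // 2)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, input_size)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, length, hidden_size)
        return self.lstm(h)                               # outputs, (h_n, c_n)

enc = Conv1dLSTMEncoder(input_size=27, hidden_size=128)
out, _ = enc(torch.randn(32, 40, 27))
print(out.shape)  # torch.Size([32, 40, 128])
```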
Suppose:

- `hidden_size`: Hidden state size of the LSTM encoder. Equivalent to `d_model` in `TransformerSeq2Seq`.
- `num_layers`: Number of LSTM and Transformer encoder/decoder layers.
- `bidirectional`: Whether to use a bidirectional LSTM encoder.
- `dropout`: Dropout rate. Applies to:
  - the residual drop path in the 1D CNN in `architecture.cnn`;
  - hidden state dropout in the LSTM encoder/decoder, at every time step (unlike `torch.nn.LSTM`, dropout is applied from the first LSTM layer); a sketch of this per-time-step dropout follows this list;
  - the same dropout as the vanilla Transformer of Vaswani et al.
- `layernorm`: Whether to apply layer normalization in the LSTM encoder and decoder.
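As a reference for the LSTM dropout behaviour described above, here is a minimal sketch of hidden-state dropout applied at every time step, built from `nn.LSTMCell` (a generic illustration, not the repo's `architecture.rnn` code; `torch.nn.LSTM` itself only applies dropout between stacked layers):

```python
import torch
import torch.nn as nn

class DropoutLSTMLayer(nn.Module):
    """Sketch of one LSTM layer with hidden-state dropout at every time step."""
    def __init__(self, input_size: int, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.drop = nn.Dropout(dropout)
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, input_size)
        batch, length, _ = x.size()
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        outputs = []
        for t in range(length):
            h, c = self.cell(x[:, t], (h, c))
            outputs.append(self.drop(h))  # dropout applied to the hidden state at each step
        return torch.stack(outputs, dim=1)  # (batch, length, hidden_size)

layer = DropoutLSTMLayer(input_size=27, hidden_size=128, dropout=0.1)
print(layer(torch.randn(32, 40, 27)).shape)  # torch.Size([32, 40, 128])
```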
All Seq2Seq models inherit the `architecture.seq2seq.Seq2Seq` class:
```python
class Seq2Seq(Skeleton):
    def __init__(self):
        super().__init__()
        self.initialize_skeleton(locals())

    def forward_auto(self, x: torch.Tensor, trg_len: int):
        # implements autoregressive decoding
        raise NotImplementedError

    def forward_labeled(self, x: torch.Tensor, y: torch.Tensor):
        # implements teacher-forced decoding
        raise NotImplementedError

    def forward(
        self,
        x: torch.Tensor,
        trg_len: int,
        y: Optional[torch.Tensor] = None,
        teacher_forcing: float = -1,
    ):
        batch_size, length_x, input_size = x.size()
        p = random.uniform(0, 1)
        if p > teacher_forcing and y is not None:
            return self.forward_labeled(x, y)
        else:
            return self.forward_auto(x, trg_len)
```
- `forward_auto` implements the autoregressive forward pass, which feeds the model's previous-time-step output back in as the current-time-step decoder input.
  - Generally used at inference time, when label data is not available.
- `forward_labeled` implements the teacher-forced forward pass, which feeds the previous-time-step label in as the current-time-step decoder input.
  - Generally used at training time, when label data is available.
- `forward` samples `p ~ Uniform(0, 1)` and runs `forward_labeled` whenever `y` is provided and `p > teacher_forcing`; otherwise it falls back to `forward_auto`. A minimal sketch of the two decoding modes follows this list.
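To make the two decoding modes concrete, here is a minimal sketch using a generic `nn.LSTMCell` decoder (hypothetical helper names, not the repo's implementation):

```python
import torch
import torch.nn as nn

hidden_size, output_size = 128, 6
cell = nn.LSTMCell(output_size, hidden_size)   # decoder cell: previous output -> new hidden state
proj = nn.Linear(hidden_size, output_size)     # hidden state -> prediction

def decode_auto(h, c, trg_len):
    """Autoregressive: feed the model's own previous prediction back in."""
    y_prev = torch.zeros(h.size(0), output_size)  # e.g. a zero start frame
    outputs = []
    for _ in range(trg_len):
        h, c = cell(y_prev, (h, c))
        y_prev = proj(h)
        outputs.append(y_prev)
    return torch.stack(outputs, dim=1)

def decode_labeled(h, c, y):
    """Teacher forcing: feed the ground-truth previous label in at each step."""
    y_shifted = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
    outputs = []
    for t in range(y.size(1)):
        h, c = cell(y_shifted[:, t], (h, c))
        outputs.append(proj(h))
    return torch.stack(outputs, dim=1)

h0 = torch.zeros(32, hidden_size)
c0 = torch.zeros(32, hidden_size)
print(decode_auto(h0, c0, 100).shape)                        # torch.Size([32, 100, 6])
print(decode_labeled(h0, c0, torch.randn(32, 60, 6)).shape)  # torch.Size([32, 60, 6])
```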
```python
import torch
import torch.nn as nn
from architecture.seq2seq import LSTMSeq2Seq, AttentionLSTMSeq2Seq, TransformerSeq2Seq

batch_size = 32
length_x = 40     # input sequence length
length_y = 60     # output label sequence length
input_size = 27   # input feature size
output_size = 6   # output feature size

hidden_size = 128     # hidden state size of LSTM encoder; equivalent to d_model in TransformerSeq2Seq
dropout = 0.1         # dropout rate
num_layers = 3        # number of LSTM and Transformer encoder/decoder layers
bidirectional = True  # whether to use a bidirectional LSTM encoder
layernorm = True      # layer normalization in LSTM encoder and decoder

x = torch.randn(batch_size, length_x, input_size)
y = torch.randn(batch_size, length_y, output_size)
```
```python
model = LSTMSeq2Seq(
    input_size=input_size,
    output_size=output_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    bidirectional=bidirectional,
    dropout=dropout,
)
out_1 = model.forward_auto(x, 100)
out_2 = model.forward_labeled(x, y)
print(out_1.shape, out_2.shape)
```
```
torch.Size([32, 100, 6]) torch.Size([32, 60, 6])
```
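The same model can also be driven through `forward`, which dispatches between the two modes based on `y` and `teacher_forcing` (a usage sketch based on the `Seq2Seq.forward` signature above):

```python
# Teacher-forced pass is taken when y is given and the sampled p exceeds teacher_forcing;
# otherwise (e.g. when y is omitted) the model decodes autoregressively for trg_len steps.
out_train = model.forward(x, trg_len=length_y, y=y, teacher_forcing=0.5)
out_infer = model.forward(x, trg_len=100)
print(out_train.shape, out_infer.shape)  # (batch, length_y, output_size) (batch, 100, output_size)
```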
```python
model = AttentionLSTMSeq2Seq(
    input_size=input_size,
    output_size=output_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    bidirectional=bidirectional,
    dropout=dropout,
)
out_1 = model.forward_auto(x, 100)
out_2 = model.forward_labeled(x, y)
print(out_1.shape, out_2.shape)
```
```
torch.Size([32, 100, 6]) torch.Size([32, 60, 6])
```
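The attention variant follows Bahdanau et al. (2014). As a reference, additive attention over the encoder outputs looks roughly like the sketch below (a generic illustration with hypothetical names, not the repo's `architecture.attention` code):

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (sketch)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.W_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_h: torch.Tensor, enc_out: torch.Tensor):
        # dec_h: (batch, hidden), enc_out: (batch, src_len, hidden)
        scores = self.v(torch.tanh(self.W_dec(dec_h).unsqueeze(1) + self.W_enc(enc_out)))  # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)     # attention distribution over source positions
        context = (weights * enc_out).sum(dim=1)   # (batch, hidden)
        return context, weights.squeeze(-1)

attn = AdditiveAttention(128)
context, weights = attn(torch.randn(32, 128), torch.randn(32, 40, 128))
print(context.shape, weights.shape)  # torch.Size([32, 128]) torch.Size([32, 40])
```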
```python
model = TransformerSeq2Seq(
    input_size=input_size,
    output_size=output_size,
    num_layers=num_layers,
    d_model=hidden_size,
    n_heads=4,
    dropout=dropout,
    d_ff=hidden_size * 4,
)
out_1 = model.forward_auto(x, 100)
out_2 = model.forward_labeled(x, y)
print(out_1.shape, out_2.shape)
```
```
torch.Size([32, 100, 6]) torch.Size([32, 60, 6])
```
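In a Transformer Seq2Seq, teacher-forced decoding processes the whole shifted target sequence in parallel under a causal mask, while autoregressive decoding re-runs the decoder as the target grows one step at a time. A generic illustration of the causal mask with `torch.nn` building blocks (not the repo's `TransformerSeq2Seq` internals):

```python
import torch
import torch.nn as nn

d_model, n_heads, d_ff, num_layers = 128, 4, 512, 3
layer = nn.TransformerDecoderLayer(d_model, n_heads, dim_feedforward=d_ff, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

memory = torch.randn(32, 40, d_model)  # encoder outputs
tgt = torch.randn(32, 60, d_model)     # embedded, shifted target (teacher forcing)

# Causal mask: position t may not attend to positions > t.
causal_mask = torch.triu(torch.full((60, 60), float("-inf")), diagonal=1)
out = decoder(tgt, memory, tgt_mask=causal_mask)
print(out.shape)  # torch.Size([32, 60, 128])
```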
- Parameters can be counted with `model.count_params()`.
- Model properties are accessed through the `model.model_info` attribute.
- Another model instance can be created with `ModelClass(**model.model_init_args)`.

These features are provided by the `architectures.skeleton.Skeleton` class.
```python
model.count_params()
model_info = model.model_info
model_init_args = model.model_init_args
print(model_info)
another_model_instance = TransformerSeq2Seq(**model_init_args)
```
```
Number of trainable parameters: 1,393,798
{'bidirectional': True, 'd_ff': 512, 'd_model': 128, 'dropout': 0.1, 'hidden_size': 256, 'input_size': 27, 'layernorm': False, 'n_heads': 4, 'num_hl': 0, 'num_layers': 3, 'output_size': 6}
```
All network parameters are random-initialized; constant initializations use `torch.zeros`, and normalization layer weights use `torch.ones`. See `architectures.init`.
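As a rough illustration of that kind of initialization scheme (a generic sketch assuming a Xavier-style random init; not the repo's `architectures.init` code):

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Generic sketch: random weights, zero biases, unit normalization weights."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)   # random weight init (assumed scheme)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)             # normalization weights start at one
        nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(27, 128), nn.LayerNorm(128), nn.Linear(128, 6))
model.apply(init_weights)  # applied recursively to every submodule
```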