Limitations of Transformers

chayan-kathuria · 11 March 2022 13:36

The transformer is a huge improvement over RNN-based seq2seq models, but it comes with its own share of limitations:

Attention can only deal with fixed-length input. The vanilla transformer works over a fixed maximum sequence length, so longer text has to be split into segments (chunks) before it is fed into the model.
This chunking causes context fragmentation, because the text is split without respecting sentence or other semantic boundaries. For example, if a sentence is split in the middle, a significant amount of its context is lost.
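
To make the fragmentation concrete, here is a minimal sketch of naive fixed-length chunking. The `chunk_fixed` helper, the whitespace tokenization, and the tiny `max_len` of 4 are all assumptions made for illustration; real models use subword tokenizers and limits of hundreds or thousands of tokens, but the effect is the same.

```python
def chunk_fixed(tokens, max_len):
    """Split a token list into consecutive chunks of at most max_len tokens.

    Hypothetical helper for illustration: it cuts purely by position and
    ignores sentence boundaries, which is what causes fragmentation.
    """
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


text = "The cat sat on the mat. It then fell asleep in the sun."
tokens = text.split()  # whitespace tokenization, a deliberate simplification
chunks = chunk_fixed(tokens, max_len=4)

for chunk in chunks:
    print(chunk)
```

The first sentence ends up spread over two chunks, and the second chunk mixes the end of one sentence with the start of the next. A model that sees each chunk on its own cannot tell from the third chunk who "fell asleep", because that information sits in an earlier chunk.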