Limitations of Transformers

The transformer is undoubtedly a huge improvement over RNN-based seq2seq models, but it comes with its own share of limitations:

  • Attention, as used in the vanilla transformer, can only deal with text up to a fixed maximum length. Longer text has to be split into a certain number of segments or chunks before it is fed into the model as input.
  • This chunking of text causes context fragmentation. The text is split without respecting sentence or any other semantic boundaries, so if a sentence is split in the middle, a significant amount of its context is lost.
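
To make the fragmentation problem concrete, here is a minimal sketch of fixed-length chunking. The helper name `chunk_tokens`, the whitespace tokenization, and the segment length of 4 are all assumptions chosen for illustration. Real models use subword tokenizers and much longer segments, but the effect is the same.

```python
def chunk_tokens(tokens, segment_len):
    """Split a token list into consecutive segments of at most segment_len tokens.

    Hypothetical helper for illustration: it cuts purely by position and
    ignores sentence boundaries, which is what causes context fragmentation.
    """
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


text = "The cat sat on the mat. It was warm and the cat fell asleep."
tokens = text.split()  # naive whitespace tokenization (an assumption)
segments = chunk_tokens(tokens, segment_len=4)

for seg in segments:
    print(seg)
```

With these settings the first sentence is cut after "on", so "the mat." lands in the second segment together with the start of the next sentence. If each segment is processed independently, the model never sees "The cat sat on" and "the mat." at the same time.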