Advanced Sentiment Analysis Models

The advent of deep learning has provided a new standard by which to measure sentiment analysis models and has introduced many common model architectures that can be quickly prototyped and adapted to particular datasets to quickly achieve high accuracy.

Most advanced sentiment models start by transforming the input text into an embedded representation. These embeddings are sometimes trained jointly with the model, but usually additional accuracy can be attained by using pre-trained embeddings such as Word2Vec, GloVe, BERT, or FastText

Next, a deep learning model is constructed using these embeddings as the first layer inputs:

Convolutional neural networks
Surprisingly, one model that performs particularly well on sentiment analysis tasks is the convolutional neural network, which is more commonly used in computer vision models. The idea is that instead of performing convolutions on image pixels, the model can instead perform those convolutions in the embedded feature space of the words in a sentence. Since convolutions occur on adjacent words, the model can pick up on negations or n-grams that carry novel sentiment information.

LSTMs and other recurrent neural networks
RNNs are probably the most commonly used deep learning models for NLP and with good reason. Because these networks are recurrent, they are ideal for working with sequential data such as text. In sentiment analysis, they can be used to repeatedly predict the sentiment as each token in a piece of text is ingested. Once the model is fully trained, the sentiment prediction is just the model’s output after seeing all n tokens in a sentence.

RNNs can also be greatly improved by the incorporation of an attention mechanism, which is a separately trained component of the model. Attention helps a model to determine on which tokens in a sequence of text to apply its focus, thus allowing the model to consolidate more information over more timesteps.

Recursive neural networks
Although similarly named to recurrent neural nets, recursive neural networks work in a fundamentally different way. Popularized by Stanford researcher Richard Socher, these models take a tree-based representation of an input text and create a vectorized representation for each node in the tree. Typically, the sentence’s parse tree is used. As a sentence is read in, it is parsed on the fly and the model generates a sentiment prediction for each element of the tree. This gives a very interpretable result in the sense that a piece of text’s overall sentiment can be broken down by the sentiments of its constituent phrases and their relative weightings. The SPINN model from Stanford is another example of a neural network that takes this approach.

Multi-task learning
Another promising approach that has emerged recently in NLP is that of multi-task learning. Within this paradigm, a single model is trained jointly across multiple tasks with the goal of achieving state-of-the-art accuracy in as many domains as possible. The idea here is that a model’s performance on task x can be bolstered by its knowledge of related tasks y and z, along with their associated data. Being able to access a shared memory and set of weights across tasks allows for new state-of-the-art accuracies to be reached. Two popular MTL models that have achieved high performance on sentiment analysis tasks are the Dynamic Memory Network and the Neural Semantic Encoder.