How does BERT help?

BERT, as we stated earlier, is something of an MVP of NLP. A big part of this comes down to BERT's ability to embed the meaning of words into dense vectors.

We call them dense vectors because every value within the vector has a purpose for being there. This is in contrast to sparse vectors, such as one-hot encoded vectors, where the majority of values are 0.
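To make the contrast concrete, here is a minimal sketch (using NumPy, with an arbitrary 30,000-word vocabulary and a made-up word index, both purely illustrative) comparing a one-hot vector with a dense one:

```python
import numpy as np

# A sparse one-hot vector: a single 1, everything else 0.
# With a 30,000-word vocabulary, only one position carries information.
vocab_size = 30_000
one_hot = np.zeros(vocab_size)
one_hot[1234] = 1  # hypothetical vocabulary index of some word

# A dense vector: all 768 values contribute to the representation
# (random values here, just to show the shape).
dense = np.random.rand(768)

print(one_hot.shape, np.count_nonzero(one_hot))  # (30000,) 1
print(dense.shape, np.count_nonzero(dense))      # (768,) 768
```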

BERT is very good at generating these dense vectors, and each encoder layer (there are several) outputs a set of dense vectors.

For BERT base, each of these is a vector containing 768 values. Those 768 values hold our numerical representation of a single token, which we can use as contextual word embeddings.

Because there is one of these vectors for each token (output by each encoder layer), we are actually looking at a tensor of size 768 by the number of tokens.
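A quick sketch of what this looks like in practice, assuming the Hugging Face transformers library and the pretrained bert-base-uncased checkpoint (which uses 768-dimensional hidden states):

```python
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

inputs = tokenizer("hello world", return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# One hidden-state tensor per layer (the embedding layer plus 12 encoder
# layers for BERT base), each of shape [batch_size, number_of_tokens, 768].
for i, layer_output in enumerate(outputs.hidden_states):
    print(i, layer_output.shape)
```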

We can take these tensors and transform them into semantic representations of the input sequence. We can then apply our similarity metrics to measure the similarity between different sentences.
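Cosine similarity is a common choice of metric here. A minimal sketch, using PyTorch and two placeholder 768-dimensional sentence vectors (in practice these come from pooling BERT's token embeddings, shown further below):

```python
import torch

# Two placeholder 768-dimensional sentence vectors.
vec_a = torch.rand(768)
vec_b = torch.rand(768)

# Cosine similarity: the dot product of the vectors divided by the
# product of their magnitudes, giving a score between -1 and 1.
similarity = torch.nn.functional.cosine_similarity(vec_a, vec_b, dim=0)
print(similarity.item())
```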

The simplest and most commonly extracted tensor is the last_hidden_state tensor, which is conveniently output by the BERT model.
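Continuing the sketch above (again assuming the Hugging Face transformers API), last_hidden_state is available directly on the model output:

```python
# Tokenize with padding/truncation to a fixed length of 512 tokens,
# the maximum sequence length BERT accepts.
inputs = tokenizer(
    "hello world",
    max_length=512,
    padding='max_length',
    truncation=True,
    return_tensors='pt',
)

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state
print(embeddings.shape)  # torch.Size([1, 512, 768])
```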

Of course, this is a fairly large tensor, at 512×768, and we want a single vector to apply our similarity measures to.

To do this, we need to convert our last_hidden_state tensor into a single vector of 768 values.
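One common way to do this is mean pooling: averaging the 512 token vectors into a single 768-value vector, using the attention mask so that padding tokens do not dilute the average. A sketch, continuing from the inputs and embeddings above:

```python
# Expand the attention mask to the embedding dimension: [1, 512, 1].
mask = inputs['attention_mask'].unsqueeze(-1).float()

# Zero out the padding-token embeddings.
masked_embeddings = embeddings * mask

# Sum over the token dimension and divide by the number of real tokens.
summed = masked_embeddings.sum(dim=1)            # [1, 768]
counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens
mean_pooled = summed / counts                    # [1, 768]

print(mean_pooled.shape)  # torch.Size([1, 768])
```

The resulting 768-value vector is the sentence embedding we can feed into the cosine similarity measure shown earlier.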