BERT implementation - 2

Now, here is the second, more technical approach.

Method: Transformers And PyTorch

It is worth noting that this approach does exactly the same thing as the previous one, but one level lower.

  • With this approach, we transform last_hidden_state into the sentence embedding ourselves, using a mean pooling operation.

  • Before we can apply mean pooling, we need to produce last_hidden_state by tokenizing our sentences and passing them through the model; here is the code for it:

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Initialize our model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')

# Tokenize the sentences like before
sent = [
    "Three years later, the coffin was still full of Jello.",
    "The fish dreamed of escaping the fishbowl and into the toilet where he saw his friend go.",
    "The person box was packed with jelly many dozens of months later.",
    "He found a leprechaun in his walnut shell."
]

# Initialize a dictionary to store the tokenized sentences
token = {'input_ids': [], 'attention_mask': []}

for sentence in sent:
    # Encode each sentence (padded/truncated to 128 tokens) and append it to the dictionary
    new_token = tokenizer.encode_plus(sentence, max_length=128,
                                      truncation=True, padding='max_length',
                                      return_tensors='pt')
    token['input_ids'].append(new_token['input_ids'][0])
    token['attention_mask'].append(new_token['attention_mask'][0])

# Reformat each list of tensors into a single tensor of shape [4, 128]
token['input_ids'] = torch.stack(token['input_ids'])
token['attention_mask'] = torch.stack(token['attention_mask'])
```
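To make the role of padding='max_length' and the attention mask concrete, here is a toy, framework-free sketch of what the tokenizer produces for one sentence. The function pad_and_mask, the pad ID of 0, and the token IDs are illustrative assumptions rather than the real tokenizer's behavior; in particular, unlike the real tokenizer, this toy does not re-insert special tokens such as [SEP] after truncation.

```python
# Hypothetical helper: mimics padding='max_length' + truncation=True for one sentence.
# pad_id=0 and the token IDs used below are placeholders, not real vocabulary values.
def pad_and_mask(ids, max_length, pad_id=0):
    """Truncate/pad a list of token IDs to max_length and build its attention mask."""
    ids = ids[:max_length]                  # truncation: drop anything past max_length
    mask = [1] * len(ids)                   # 1 marks a real token
    padding = max_length - len(ids)
    # Append pad tokens, each marked with 0 in the attention mask
    return ids + [pad_id] * padding, mask + [0] * padding

input_ids, attention_mask = pad_and_mask([101, 11, 12, 102], max_length=8)
print(input_ids)       # [101, 11, 12, 102, 0, 0, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The zeros in the attention mask are exactly what the mean pooling step later uses to ignore padding positions.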

```python
# Process the tokens through the model
outputs = model(**token)
outputs.keys()
```

Output: odict_keys(['last_hidden_state', 'pooler_output'])

```python
# The dense vector representations of the text are contained in the outputs' last_hidden_state tensor
embeddings = outputs.last_hidden_state
embeddings.shape
```

Output: torch.Size([4, 128, 768])

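Now that we have our dense vector embeddings (one 768-dimensional vector for each of the 128 token positions in each of the 4 sentences), we need to apply a mean pooling operation to reduce them to a single vector encoding per sentence, i.e., the sentence embedding.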
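Written as a formula, mean pooling with an attention mask looks roughly like this. The notation is our own: $\mathbf{h}_{b,t} \in \mathbb{R}^{D}$ is the last_hidden_state vector of token $t$ in sentence $b$, $m_{b,t} \in \{0, 1\}$ is its attention mask value, $B = 4$, $L = 128$, $D = 768$, and $\varepsilon = 10^{-9}$ plays the role of the clamp in the code.

$$
\mathbf{s}_b = \frac{\sum_{t=1}^{L} m_{b,t}\,\mathbf{h}_{b,t}}{\max\left(\sum_{t=1}^{L} m_{b,t},\ \varepsilon\right)}, \qquad b = 1, \dots, B
$$

The numerator is the sum of the real (non-padding) token vectors, and the denominator is the number of real tokens, so $\mathbf{s}_b$ is their average.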

To do this, we multiply each value in our embeddings tensor by its corresponding attention_mask value (1 for real tokens, 0 for padding), so that the padding tokens are ignored.

To perform this operation, we first need to expand our attention_mask tensor to the same shape as embeddings, starting from the mask itself:

```python
att_mask = token['attention_mask']
att_mask.shape
```

Output: torch.Size([4, 128])

```python
# Add a trailing dimension, then broadcast the mask across all 768 hidden dimensions
mask = att_mask.unsqueeze(-1).expand(embeddings.size()).float()
mask.shape
```

Output: torch.Size([4, 128, 768])

```python
# Zero out the embedding values at padding positions
mask_embeddings = embeddings * mask
mask_embeddings.shape
```

Output: torch.Size([4, 128, 768])

```python
# Then we sum the remaining embeddings along axis 1 (the token axis)
summed = torch.sum(mask_embeddings, 1)
summed.shape
```

Output: torch.Size([4, 768])

```python
# Then count the number of values that receive attention at each position of the tensor;
# clamping at 1e-9 avoids a division by zero if a mask is all zeros
summed_mask = torch.clamp(mask.sum(1), min=1e-9)
summed_mask.shape
```

Output: torch.Size([4, 768])
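The clamp only matters for a sequence with no attended tokens at all. The plain-Python stand-in below, which uses the hypothetical name safe_count and toy values, shows the effect: the count is floored at 1e-9, so an all-padding sequence yields a zero vector instead of raising an error. In the PyTorch code the clamp is applied to every entry of the [4, 768] count tensor, but each entry in a row holds the same token count, so one scalar per sentence captures the idea.

```python
# Plain-Python stand-in for torch.clamp(mask.sum(1), min=1e-9), using toy data.
def safe_count(mask_row, eps=1e-9):
    """Number of attended tokens, floored at eps like torch.clamp(..., min=eps)."""
    return max(sum(mask_row), eps)

summed_vec = [0.0, 0.0]            # masked sum for an all-padding sequence
count = safe_count([0, 0, 0])      # 1e-9 instead of 0
print([v / count for v in summed_vec])  # [0.0, 0.0] rather than ZeroDivisionError
print(safe_count([1, 1, 0]))       # 2
```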

```python
# Divide the summed embeddings by the number of real tokens to get the mean
mean_pooled = summed / summed_mask
mean_pooled
```
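Putting the masking, summing, clamping, and division together, here is a minimal framework-free sketch of masked mean pooling on nested lists. The function name mean_pool and the toy numbers are our own, chosen for illustration; the PyTorch version above performs the same arithmetic on the full [4, 128, 768] tensor.

```python
def mean_pool(embeddings, attention_mask, eps=1e-9):
    """Masked mean pooling over the token axis.

    embeddings:     [batch][seq_len][hidden] nested lists of floats
    attention_mask: [batch][seq_len] nested lists of 0/1
    returns:        [batch][hidden] sentence embeddings
    """
    pooled = []
    for tokens, mask in zip(embeddings, attention_mask):
        hidden = len(tokens[0])
        summed = [0.0] * hidden
        for vec, m in zip(tokens, mask):
            for d in range(hidden):
                summed[d] += vec[d] * m   # padding tokens (m == 0) contribute nothing
        count = max(sum(mask), eps)       # same role as torch.clamp(..., min=1e-9)
        pooled.append([s / count for s in summed])
    return pooled

# One sentence, three positions (the last one is padding), hidden size 2
emb = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
mask = [[1, 1, 0]]
print(mean_pool(emb, mask))   # [[2.0, 3.0]]
```

The padding vector [100.0, 100.0] has no effect on the result, whereas a naive average over all three positions would have been pulled toward it.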

Text similarity

Once we have our dense vectors, we can compute the cosine similarity between each pair, using the same logic as before:

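```python
from sklearn.metrics.pairwise import cosine_similarity

# Let's calculate the cosine similarity for sentence 0.
# Convert from a PyTorch tensor to a NumPy array (detach() removes it from the autograd graph first)
mean_pooled = mean_pooled.detach().numpy()

# Calculate the similarity between sentence 0 and the other three sentences
cosine_similarity(
    [mean_pooled[0]],
    mean_pooled[1:]
)
```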

Output: array([[0.3308891 , 0.721926 , 0.55483633]], dtype=float32)

| Index | Sentence | Similarity |
|---|---|---|
| 1 | "The fish dreamed of escaping the fishbowl and into the toilet where he saw his friend go." | 0.3309 |
| 2 | "The person box was packed with jelly many dozens of months later." | 0.7219 |
| 3 | "He found a leprechaun in his walnut shell." | 0.5548 |

We get roughly the same results as with the previous approach; the only difference is that the cosine similarity for index 3 has shifted from 0.5547 to 0.5548, a negligible variation due to rounding.
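For reference, the score that sklearn's cosine_similarity reports for each pair of rows is the dot product divided by the product of the vector norms. The sketch below computes it by hand on small toy vectors; the name cosine_sim is our own, and unlike sklearn it does not guard against zero-length vectors. Rounding to four decimals, as in the table above, is also how 0.55483633 becomes 0.5548.

```python
import math

def cosine_sim(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0]
others = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print([round(cosine_sim(query, o), 4) for o in others])  # [1.0, 0.0, 0.7071]
```

Because the norms cancel out vector length, only the direction of each sentence embedding affects the score.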