To apply BERT to a task on our own dataset, we can start from pre-trained weights and fine-tune the model. The pre-trained weights for BERT are available in the transformers library and can be loaded with the following code.
from transformers import BertModel
bert = BertModel.from_pretrained('bert-base-uncased')
Here, bert contains the pre-trained model weights for BERT-Base. We also need to use the same tokenizer and token-to-index mapping that the model was pre-trained with. We can load the tokenizer with the code given below.
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize("What's going on?")
Output: ['what', "'", 's', 'going', 'on', '?']
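The tokenizer also exposes the token-to-index mapping itself. As a quick sanity check (not in the original snippet), we can convert the tokens above to vocabulary indices and back:

# Map tokens to vocabulary indices and back, using the same
# mapping the model was pre-trained with.
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)                                   # one vocabulary index per token
print(tokenizer.convert_ids_to_tokens(ids))  # ['what', "'", 's', 'going', 'on', '?']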
Let's fine-tune the pre-trained BERT model for the sentiment classification task. The classifier can be built by simply adding a linear layer on top of the output hidden state of the [CLS] token.
import torch.nn as nn

class BERTSentiment(nn.Module):
    def __init__(self, bert, output_dim):
        super().__init__()
        self.bert = bert
        embedding_dim = bert.config.hidden_size
        self.out = nn.Linear(embedding_dim, output_dim)

    def forward(self, text):
        # text = [batch size, sent len]
        embedded = self.bert(text)[1]  # pooled output for the [CLS] token
        # embedded = [batch size, emb dim]
        output = self.out(embedded)
        # output = [batch size, out dim]
        return output
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

OUTPUT_DIM = 2
model = BERTSentiment(bert, OUTPUT_DIM).to(device)
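As a quick sanity check (not part of the original tutorial), we can push a batch of random token ids through the model and confirm the output shape:

# Fake batch of 2 sequences of 16 token ids each, just to verify shapes.
dummy = torch.randint(0, bert.config.vocab_size, (2, 16)).to(device)
with torch.no_grad():
    print(model(dummy).shape)  # torch.Size([2, 2]) -> [batch size, out dim]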
We can then train the model by defining a loss function and an optimizer.
from transformers import AdamW

optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-6, correct_bias=False)
criterion = nn.CrossEntropyLoss().to(device)
max_grad_norm = 1
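Both loops below call a categorical_accuracy helper that is not defined in the snippet; a minimal sketch could look like this:

def categorical_accuracy(preds, y):
    """Per-batch accuracy: fraction of argmax predictions that match the labels."""
    top_pred = preds.argmax(dim=1)
    correct = (top_pred == y).float()
    return correct.sum() / len(y)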
def train(model, iterator, optimizer, criterion, scheduler):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        optimizer.zero_grad()     # clear gradients first
        torch.cuda.empty_cache()  # release unoccupied cached memory
        text = batch.text
        label = batch.label
        predictions = model(text)
        loss = criterion(predictions, label)
        acc = categorical_accuracy(predictions, label)
        loss.backward()
        # clip gradients after backward, before the optimizer step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        scheduler.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()
    with torch.no_grad():
        for batch in iterator:
            text = batch.text
            label = batch.label
            predictions = model(text)
            loss = criterion(predictions, label)
            acc = categorical_accuracy(predictions, label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
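The training driver below also uses an epoch_time helper for pretty-printing that the snippet does not define; one possible version:

def epoch_time(start_time, end_time):
    """Split an elapsed wall-clock interval into whole minutes and seconds."""
    elapsed = end_time - start_time
    mins = int(elapsed / 60)
    secs = int(elapsed - mins * 60)
    return mins, secs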
We can then use the train() and evaluate() functions to train the model and validate it after each epoch.
import math
from transformers import get_linear_schedule_with_warmup

N_EPOCHS = 3
BATCH_SIZE = 32  # assumed; use the batch size of your data iterators
train_data_len = 25000
warmup_percent = 0.2

total_steps = math.ceil(N_EPOCHS * train_data_len * 1. / BATCH_SIZE)
warmup_steps = int(total_steps * warmup_percent)
# linear warmup followed by linear decay
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=warmup_steps,
                                            num_training_steps=total_steps)
import time

for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion, scheduler)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')
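Once training is done, the fine-tuned model can be used on raw text. The predict_sentiment helper below is a hypothetical sketch, not part of the original tutorial; it assumes the tokenizer and device defined above and that label 1 means positive:

def predict_sentiment(model, tokenizer, sentence):
    # Hypothetical helper: tokenize, add [CLS]/[SEP], and take the argmax class.
    model.eval()
    ids = tokenizer.encode(sentence, add_special_tokens=True)
    tensor = torch.LongTensor(ids).unsqueeze(0).to(device)  # [1, sent len]
    with torch.no_grad():
        prediction = model(tensor)
    return prediction.argmax(dim=1).item()  # 0 = negative, 1 = positive (label order assumed)

print(predict_sentiment(model, tokenizer, "This film is great!"))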