Discuss spaCy in detail?

board-infinity · 29 October 2022 07:58

spaCy is one of the best text analysis libraries. It excels at large-scale information extraction, is among the fastest NLP libraries available, and is a convenient way to prepare text for deep learning. It is generally faster and more accurate than the NLTK tagger and TextBlob.

How to Install:

pip install spacy
python -m spacy download en_core_web_sm

Top Features of spaCy:

Non-destructive tokenization
Named entity recognition
Support for 49+ languages
16 statistical models for 9 languages
Pre-trained word vectors
Part-of-speech tagging
Labeled dependency parsing
Syntax-driven sentence segmentation
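
To make the first feature concrete, here is a minimal sketch of what "non-destructive tokenization" means: the original string can be rebuilt exactly from the tokens, because each token remembers its character offset and trailing whitespace. The sample sentence is made up for illustration, and spacy.blank("en") is used only so the snippet runs without downloading a model (it contains the tokenizer and nothing else).

```python
import spacy

# A blank English pipeline has only the tokenizer, so no model download is needed.
nlp = spacy.blank("en")

# Made-up sample text with irregular spacing
text = "spaCy  keeps   odd spacing, e.g. in U.K. addresses!"
doc = nlp(text)

# token.idx is the character offset of the token in the original string;
# token.whitespace_ is the trailing whitespace (if any) that followed it.
for token in doc:
    print(token.i, token.idx, repr(token.text), repr(token.whitespace_))

# Joining every token with its trailing whitespace gives back the input.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)
```

Because nothing is thrown away, character offsets from spaCy can be mapped straight back onto the source text, which is handy when highlighting extracted entities.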

Import and Load Library:

import spacy

# Requires the model to be installed first:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

POS-Tagging for Reviews:

Part-of-speech (POS) tagging labels each word in a text as a noun, verb, adjective, adverb, and so on.

import spacy

# Load the small English pipeline
# (tokenizer, tagger, parser and NER)
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = ("""My name is Shaurya Uppal.
I enjoy writing articles on GeeksforGeeks; check out
my other articles by going to my profile section.""")

doc = nlp(text)

# Print each token with its coarse-grained POS tag
for token in doc:
    print(token.text, token.pos_)

# Collect only the tokens tagged as verbs
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])
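
The values in token.pos_ are short codes such as VERB or PROPN. If a code is unfamiliar, spacy.explain looks up a human-readable description in spaCy's built-in glossary. A small sketch follows; the particular tags in the list are just an illustrative selection, and no model is needed for the lookup.

```python
import spacy

# A few coarse-grained POS tags of the kind stored in token.pos_
# (an illustrative selection, not the full tag set)
tags = ["NOUN", "PROPN", "VERB", "ADJ", "ADV"]

# spacy.explain returns a short description, or None for an unknown label
descriptions = {tag: spacy.explain(tag) for tag in tags}

for tag, description in descriptions.items():
    print(tag, "->", description)
```

The same function also works for fine-grained tags (token.tag_), dependency labels and entity labels.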