Named Entity Recognition (NER) is a standard NLP problem that involves spotting named entities (people, places, organizations, etc.) in a chunk of text and classifying them into a predefined set of categories. Some of the practical applications of NER include:
- Scanning news articles for the people, organizations and locations reported.
- Providing concise features for search optimization: instead of searching the entire content, one may simply search for the major entities involved.
- Quickly retrieving geographical locations talked about in Twitter posts.
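To make the search use case a little more concrete, here is a minimal sketch of an entity index. It assumes the entities have already been extracted as (text, label) pairs. The document ids, the sample entities and the helper name build_entity_index are all hypothetical and only serve as illustration.

```python
from collections import defaultdict


def build_entity_index(doc_entities):
    """Map each lowercased entity string to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for text, _label in entities:
            index[text.lower()].add(doc_id)
    return index


# Hypothetical, hand-written NER output: document id -> (entity text, label) pairs
doc_entities = {
    "article-1": [("Apple", "ORG"), ("U.K.", "GPE")],
    "article-2": [("Apple", "ORG"), ("Tim Cook", "PERSON")],
}

index = build_entity_index(doc_entities)
print(sorted(index["apple"]))  # ['article-1', 'article-2']
print(sorted(index["u.k."]))   # ['article-1']
```

A query then only has to look up a handful of entity strings rather than scan the full text of every document.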
NER with spaCy
spaCy is widely regarded as one of the fastest NLP frameworks in Python, with a single optimized implementation for each NLP task it supports. It is easy to learn and use, so simple tasks take only a few lines of code. To get started, install spaCy and download the small English model:
    pip install spacy
    python -m spacy download en_core_web_sm
Code for NER using spaCy.
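    import spacy

    # Load the small English pipeline downloaded above
    nlp = spacy.load('en_core_web_sm')

    sentence = "Apple is looking at buying U.K. startup for $1 billion"
    doc = nlp(sentence)

    # Print each entity with its character offsets and label
    for ent in doc.ents:
        print(ent.text, ent.start_char, ent.end_char, ent.label_)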
    Apple 0 5 ORG
    U.K. 27 31 GPE
    $1 billion 44 54 MONEY
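The two numbers printed for each entity are character offsets into the original sentence, so slicing the sentence from the start offset to the end offset gives back the entity text. As a rough illustration of what those offsets are good for, the sketch below tags each entity inline. The helper annotate is hypothetical (it is not part of spaCy), and it assumes the spans do not overlap. The offsets and labels are copied from the output above rather than recomputed, so the snippet runs without a model installed.

```python
def annotate(text, spans):
    """Wrap each (start, end, label) span of text in [..](LABEL) markers."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])                # text before the entity
        pieces.append(f"[{text[start:end]}]({label})")   # the entity itself, tagged
        cursor = end
    pieces.append(text[cursor:])                         # text after the last entity
    return "".join(pieces)


sentence = "Apple is looking at buying U.K. startup for $1 billion"
# Offsets and labels copied from the printed output
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]

annotated = annotate(sentence, spans)
print(annotated)
# [Apple](ORG) is looking at buying [U.K.](GPE) startup for [$1 billion](MONEY)
```

With a real spaCy document, the same list of spans could be built as [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].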