Name DB Generator: Extract Names from Text using NLTK

This python script uses NLTK to extract names from a provided text file and shows them in a searchable format along with the context the name is used in and the sentiment of the sentence.

There are two difficult problems in Computer Science and Life: Cache invalidation and Naming things. I put this script together to help me name my daughter. The results of this script when run on the Bible are viewable here: http://namedb.herokuapp.com/

There are two pieces to this Project:

Part 1: Name Extractor Python Script

The first step is to pull out and store the list of names in the text. We do this with a python script using NLTK’s Vadar library and store the extracted names into an SQLite database.

For my purposes, I used the json formatted bible found here as input text. You can ofcourse point the script to any text you wish.

Some mistakes are seen in the extract of names:

NLTK and Vadar aren’t perfect, but they still did a decent job getting a majority of the names out and it’s a decent place to start for sentiment analysis as well.

View Source for Name Extractor on Github

Part 2: Express UI

The second part is an ExpressJS UI for the database so that the names can be viewed and browsed through easily.

View Source for NameDB UI on GitHub